ipums / hlink

Hierarchical record linkage at scale
Mozilla Public License 2.0
12 stars 2 forks source link

Document the concat_two_cols column mapping transform #125

Closed riley-harper closed 12 months ago

riley-harper commented 1 year ago

This is a transform in hlink/linking/core/transforms.py. It uses pyspark.sql.functions.concat to concatenate two columns, which means that it can handle strings, arrays, binary, and numerics. When we document this, it would be good to look into how it handles each of these different cases and the combinations of them. We're probably most interested in how it works with strings and numbers.