ipums / hlink

Hierarchical record linkage at scale
Mozilla Public License 2.0

Refactor core.transforms.generate_transforms() for ease of reading #148

Closed riley-harper closed 2 months ago

riley-harper commented 2 months ago

Closes #141.

This PR refactors the core.transforms.generate_transforms() function by pulling out several functions defined within this larger function. The previous format with nested functions was difficult to read because generate_transforms() was several hundred lines long and because some of the nested functions closed over variables in generate_transforms(). After the refactor, some of the functions are still pretty long, but the module is much more navigable than before.

I've pulled the nested functions out as private functions in the core.transforms module. This PR should not change the module's public API at all, and there shouldn't be any functionality changes either.
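As a rough illustration of the pattern (this is not the actual hlink code; `transform_nested`, `transform_flat`, and `_add_suffix` are hypothetical names made up for this sketch), a nested helper that closes over a variable can become a private module-level function that takes that variable as an explicit argument:

```python
def transform_nested(values, suffix):
    """Before: the helper is nested and closes over `suffix`."""

    def add_suffix(value):
        # `suffix` comes from the enclosing scope, which is easy to miss
        # when the outer function is several hundred lines long.
        return f"{value}{suffix}"

    return [add_suffix(value) for value in values]


def _add_suffix(value, suffix):
    """After: a private module-level helper with explicit inputs."""
    return f"{value}{suffix}"


def transform_flat(values, suffix):
    """After: the public function delegates to the private helper."""
    return [_add_suffix(value, suffix) for value in values]
```

Both versions return the same result for the same inputs, which is the sense in which a refactor like this should leave behavior unchanged. The leading underscore keeps the helper out of the module's public API by convention.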

During the refactor, I also discovered the skip attribute, which is available to all feature selection transforms. I added some tests for it and added it to the documentation.