move-coop / parsons

A python library of connectors for the progressive community.
Other
255 stars 125 forks source link

Refactor Table initialization to avoid iterating over every row #983

Closed austinweisgrau closed 4 months ago

austinweisgrau commented 5 months ago

The prior implementation for initializing a Parsons table involved a check of the truthiness of a Parsons Table's underlying petl Table, which involves iterating over every row of the table. For very long tables, this can take a very long time. One important reason for the of the petl library is to enable very long tables to be worked with quickly and easily in python, so the existing implementation is a problem for that.

These changes refactor how a parsons Table is initialized slightly to avoid this iteration and more robustly and performatively set & check for the validity of the underlying petl Table.

The result involves one bit of esoteric code (the use of a sentinal as a default argument) in order to allow existing behavior to remain the same, specifically that Table() should work but Table(None) should raise an Error. I think that Table(None) should work equivalently to Table() and would simplify the implementation somewhat, but that would be a breaking change so I will make that in a separate PR to the major-release branch.

See commit messages for additional details about implementation changes and reasoning.

Jason94 commented 4 months ago

What's the relationship between this PR and #984? It looks like that commit removes code that's added here? Could they be consolidated?

austinweisgrau commented 4 months ago

This one does not make any breaking changes, so it can be merged into main without a major release.

Currently Table(None) would raise an error, while Table() is okay. That's true on main and it's true on this branch.

PR #984 is one commit ahead of this branch, it's just one commit that makes a breaking change. It makes Table(None) work the same as Table().