Open talevy42 opened 1 year ago
Thanks for the issue! This does seem strange. A very basic example works as expected:
table = Table([{'x': 1}, {'x': 0}, {'x': 2}])
table.add_column('halfx', lambda r: 2 / r['x'])
table.materialize() # Should raise a ZeroDivisionError
But it does seem like lazily deferring the calculation should be done in a smarter way. In particular things like removing, renaming, etc the column that don't require doing the calculation shouldn't raise the error.
Here is another example that I find counterintuitive. I'd expect that after my except
block, the table would have reverted to the original state it was in, like you suggested. In fact the final print statement errors again:
from parsons import Table
table = Table([{"x": 1}, {"x": 0}, {"x": 2}])
print(table)
try:
table.add_column("two_div_x", lambda r: 2 / r["x"])
table.materialize()
except Exception as e:
print(str(e))
print(table) # This raises an error, even though it seems like it shouldn't
The problem is that petl
creates an onion of transformations, so even when remove_column
runs cut()
the cut only renders a result that stops showing (but apparently, still runs/renders) the "removed" columns.
In order to fix this, we would have to pretty much rewrite at least a LOT of class ETL
to stop the onion-ifying, which might not even be backwards compatible. Even if we tried to be as clinical as possible and only rewrote, say add_column
, remove_column
, fill_column
, and a couple others, subsequent or earlier onion wrappers could lose coherence -- e.g. if I do an add_column
then a move_column
and then a remove_column
, suddenly the move-transformation would fail because we truly removed the column.
I suppose something could 'traverse the onion' and look for all transformations that refer to a specific column, but then we would essentially be implementing insert/delete functions on a linked-list (e.g. traversing we'd see that the MoveFieldView.field relates to a column we're removing, and so then have to 'move' the MoveFieldView.table to the .table
value of the subsequent/higher-up wrapper View.
If we want to be clinical, we could allow fixing only right-after-add_column, and then remove_column
would have some logic that tests if self.table is of type AddFieldView with a .field == column_to_remove
then we just pop self.table to self.table.source.sources (lol, because it's double-wrapped in a StackView())
(we could also consider adding an undo()
command that pops the last transformation after the last materialize() was called)
Detailed Description
When you call
add_column
and pass in a function that may throw an exception on edge case logic, the column is not evaluated until the Table is materialized, utilizing lazy loading. This is fine, but it causes issues if you try to reuse or correct the column's logic, as now any new operation on the Table, including attempts toremove
and re-add
the failing column, throw that exception. Because of the lazy loading, the Parsons table believes it has successfully added the column.Ideally, when adding a particular column throws an exception, Parsons should fail gracefully and simply not add the column, or else remove the column before throwing the exception.
To Reproduce
Your Environment