lisad / phaser

The missing layer for complex data batch integration pipelines
MIT License

Clean up returned tuple where step wrapper passes "check_size" back to phase with results #133

Open lisad opened 4 months ago

lisad commented 4 months ago

I added this return tuple even though I don't like it. The phase is the place that has the context and the previous number of rows of the main dataset, but the step wrapper is the place where the step's extra settings are known, e.g.:

@batch_step(check_size=False)
def my_batch_step(batch, context):
    return batch

When the phase gets this back, it receives 'result, check_size', which tells it whether to check the size or whether the step is expected to change the number of rows.
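To make the current flow concrete, here is a rough, self-contained sketch of the tuple hand-off. It is not phaser's actual code: this `batch_step` is a simplified stand-in, and `run_step` is a hypothetical name for whatever the phase does when it calls a step.

```python
import functools


def batch_step(func=None, *, check_size=True):
    """Simplified stand-in for the decorator (assumption, not phaser's code)."""
    def decorator(step):
        @functools.wraps(step)
        def _step_wrapper(batch, context):
            result = step(batch, context)
            # The wrapper knows the step's settings, so it hands
            # check_size back to the caller alongside the result.
            return result, check_size
        return _step_wrapper
    # Support both @batch_step and @batch_step(check_size=False).
    return decorator if func is None else decorator(func)


def run_step(step, rows, context):
    """Hypothetical phase-side caller: it knows the previous row count."""
    previous_size = len(rows)
    result, check_size = step(rows, context)  # the tuple this issue is about
    if check_size and len(result) != previous_size:
        raise ValueError(
            f"{step.__name__} changed row count from {previous_size} to {len(result)}"
        )
    return result
```

The awkward part is the unpacking line in `run_step`: the phase has to know the wrapper's return convention just to learn one setting.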

Could we remove this return tuple? The phase passes a context into the step function call, which translates to a call to _step_wrapper. If that function passed context into self.postprocess, the postprocess function would have ALMOST all the information it needs. The only thing missing is the previous row count. We would also need the size checks in both batch_step and dataframe_step, but they can be factored out into a shared function, so that's not a big deal.
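One possible shape for this, as a minimal sketch only: it assumes the phase can record the previous row count on the context before calling the step. The `Context` class, its `previous_row_count` attribute, `check_row_count`, and `run_step` are all hypothetical names used for illustration, and the call to `check_row_count` stands in for what self.postprocess would do.

```python
import functools


class Context:
    """Hypothetical context; previous_row_count is an assumed attribute."""
    def __init__(self):
        self.previous_row_count = None


def check_row_count(result, context, check_size):
    """Shared size check that batch_step and dataframe_step could both call."""
    if not check_size or context.previous_row_count is None:
        return
    if len(result) != context.previous_row_count:
        raise ValueError(
            f"Step changed row count from {context.previous_row_count} "
            f"to {len(result)}"
        )


def batch_step(func=None, *, check_size=True):
    """Simplified stand-in for the decorator (assumption, not phaser's code)."""
    def decorator(step):
        @functools.wraps(step)
        def _step_wrapper(batch, context):
            result = step(batch, context)
            # Stands in for self.postprocess(result, context).
            check_row_count(result, context, check_size)
            return result  # plain result, no tuple
        return _step_wrapper
    return decorator if func is None else decorator(func)


def run_step(step, rows, context):
    """Hypothetical phase-side caller: records the row count, then calls."""
    context.previous_row_count = len(rows)
    return step(rows, context)
```

With this arrangement the phase never sees `check_size`; it only has to keep the context's row count up to date before each step.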