lisad / phaser

library for batch-oriented complex data integration pipelines
MIT License

Have a way to suppress drop-row messages by step #123

Closed: lisad closed this 1 month ago

lisad commented 1 month ago

Some steps are supposed to drop rows, and in those cases we shouldn't write 3000 "DROPPED ROW" messages to the logs.

E.g. the Boston bike traffic count pipeline filters out pedestrian data rows. That doesn't need a log message per row at all; at most it needs a summary of how many non-bike rows were filtered out.
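A minimal sketch of the summary idea, in plain Python rather than phaser's own API. The `mode` column, the sample rows, and the `filter_bike_rows` helper are made up for illustration and are not taken from the actual pipeline:

```python
import logging

logger = logging.getLogger("pipeline_sketch")


def filter_bike_rows(rows):
    """Keep bike rows; log one summary line instead of one line per dropped row."""
    kept = [row for row in rows if row.get("mode") == "bike"]
    dropped = len(rows) - len(kept)
    if dropped:
        # A single message, however many rows were filtered out.
        logger.info("Filtered out %d non-bike rows", dropped)
    return kept


# Hypothetical sample data standing in for the traffic count file.
rows = [
    {"mode": "bike", "count": 12},
    {"mode": "pedestrian", "count": 40},
    {"mode": "bike", "count": 7},
]
bike_rows = filter_bike_rows(rows)
```

With 3000 pedestrian rows this would still produce one log line, not 3000.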

It shouldn't matter whether the logic to drop rows is implemented as a row_step, batch_step or dataframe_step; all three can be useful for dropping unneeded rows.

To make this work for all three types, it should be an optional parameter on each step decorator.
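Roughly what an optional parameter on a step decorator could look like. This is only a sketch: `batch_step_sketch` and its `log_drops` parameter are hypothetical names, not phaser's decorators or their real signature.

```python
import functools
import logging

logger = logging.getLogger("step_sketch")


def batch_step_sketch(func=None, *, log_drops=True):
    """Hypothetical stand-in for a step decorator; not phaser's implementation."""

    def decorate(step):
        @functools.wraps(step)
        def wrapper(batch):
            result = step(batch)
            dropped = len(batch) - len(result)
            # Only report shrinkage when the step has not opted out.
            if dropped > 0 and log_drops:
                logger.warning("%s dropped %d rows", step.__name__, dropped)
            return result

        return wrapper

    # Support both @batch_step_sketch and @batch_step_sketch(log_drops=False).
    if func is not None:
        return decorate(func)
    return decorate


@batch_step_sketch(log_drops=False)
def drop_pedestrians(batch):
    # This step drops rows on purpose, so it opts out of the warning.
    return [row for row in batch if row["mode"] != "pedestrian"]


@batch_step_sketch
def keep_all(batch):
    # Default behaviour: any unexpected row loss would be logged.
    return list(batch)
```

Making the parameter keyword-only with a default keeps existing bare `@decorator` usages working while letting intentionally lossy steps opt out.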

lisad commented 1 month ago

Added the "check_size=False" flag to @batch_step and @dataframe_step to solve this.