Some steps are supposed to drop rows and we shouldn't put 3000 "DROPPED ROW" messages in the logs.
E.g. the Boston bike traffic count pipeline filters out pedestrian data rows. That doesn't need a log message per row; at most it needs a summary of how many non-bike rows were filtered out.
It shouldn't matter whether the logic to drop rows is implemented as a row_step, batch_step or dataframe_step - any of them can be useful for dropping unneeded rows.
To make this work for all three types, the option to mark a step as expected to drop rows should be an optional parameter on each step decorator.
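A rough sketch of what the optional parameter could look like, to make the proposal concrete. This is not the existing implementation: the parameter name `drops_rows`, the convention that a row step returns `None` to drop a row, the `pipeline` logger name, the log message wording, and the `mode` field in the usage example are all assumptions made for illustration. Only `row_step` and `batch_step` are sketched here; a `dataframe_step` could presumably take the same keyword and compare row counts before and after, as `batch_step` does below.

```python
import functools
import logging

logger = logging.getLogger("pipeline")  # hypothetical logger name


def row_step(func=None, *, drops_rows=False):
    """Sketch: func takes one row and returns a row, or None to drop it (assumed convention)."""
    if func is None:
        # Called with arguments, e.g. @row_step(drops_rows=True)
        return functools.partial(row_step, drops_rows=drops_rows)

    @functools.wraps(func)
    def run(rows):
        kept, dropped = [], []
        for row in rows:
            result = func(row)
            if result is None:
                dropped.append(row)
            else:
                kept.append(result)
        if dropped and drops_rows:
            # Dropping is expected: one summary line instead of one line per row.
            logger.info("%s: filtered out %d rows", func.__name__, len(dropped))
        else:
            # Dropping is unexpected: keep the per-row message.
            for row in dropped:
                logger.warning("%s: DROPPED ROW %r", func.__name__, row)
        return kept

    return run


def batch_step(func=None, *, drops_rows=False):
    """Sketch: func takes a list of rows and returns a list of rows."""
    if func is None:
        return functools.partial(batch_step, drops_rows=drops_rows)

    @functools.wraps(func)
    def run(rows):
        rows = list(rows)
        result = list(func(rows))
        n_dropped = len(rows) - len(result)
        if n_dropped > 0:
            # Only a count is known here, so both cases log a single line;
            # the flag just decides how loud that line is.
            level = logging.INFO if drops_rows else logging.WARNING
            logger.log(level, "%s: dropped %d rows", func.__name__, n_dropped)
        return result

    return run


# Usage, loosely modelled on the bike count example ("mode" is a made-up field).
@row_step(drops_rows=True)
def keep_bike_rows(row):
    return row if row["mode"] == "bike" else None
```

With `drops_rows=True`, a run that filters out 3000 pedestrian rows would produce a single INFO line. Leaving the default `False` keeps the louder behaviour for steps that aren't supposed to drop anything.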