Closed by lisad 7 months ago
After discussion, we think we should add error policy to the pipeline RUN instructions - both the CLI and when the pipeline is run directly.
We'll remove error policy from phases until we learn what the use cases are and whether it's too annoying to solve those use cases with the existing error policy support (on steps and columns).
How flexible should we be in allowing phases to declare their own error policy?
E.g. the Validator phase could have its error policy be ON_ERROR_DROP_ROW, but the Transformer phase could have its error policy be ON_ERROR_COLLECT or ON_ERROR_STOP_NOW.
We probably do want the CLI to be able to define an error policy. Would this override the error policy for every phase? That would allow somebody to re-run the pipeline somewhere their code had already been deployed, but in a slightly different way (e.g. change to WARN or DROP_ROW so they can get through most of the data, then go back and fix the exceptions).
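To make the override question concrete, here is a minimal sketch of one possible precedence rule: a policy given on the CLI wins over whatever a phase declares, and a pipeline default applies when neither is set. The `--error-policy` flag, the `effective_policy` helper, the `ON_ERROR_WARN` name, and the choice of default are all assumptions for illustration, not existing behavior.

```python
import argparse

# ON_ERROR_DROP_ROW, ON_ERROR_COLLECT and ON_ERROR_STOP_NOW come from the
# discussion above; ON_ERROR_WARN is an assumed name for the "WARN" policy.
POLICIES = [
    "ON_ERROR_WARN",
    "ON_ERROR_DROP_ROW",
    "ON_ERROR_COLLECT",
    "ON_ERROR_STOP_NOW",
]


def build_parser():
    """Hypothetical CLI: the flag name --error-policy is an assumption."""
    parser = argparse.ArgumentParser(prog="run-pipeline")
    parser.add_argument("--error-policy", choices=POLICIES, default=None)
    return parser


def effective_policy(cli_policy, phase_policy, default="ON_ERROR_COLLECT"):
    """Resolve the policy for one phase: CLI first, then phase, then default."""
    if cli_policy is not None:
        return cli_policy
    if phase_policy is not None:
        return phase_policy
    return default


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Example: a Validator phase that declares ON_ERROR_DROP_ROW.
    print(effective_policy(args.error_policy, "ON_ERROR_DROP_ROW"))
```

With this rule, re-running a deployed pipeline with `--error-policy ON_ERROR_DROP_ROW` would change every phase at once, which is the re-run scenario described above. The opposite precedence (phase beats CLI) would only need the first two checks swapped.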
Once we decide how this should work: if phase classes can declare their own error policy where the phase is defined, that will need to be fixed, because right now it's ignored. To make it work, the PhaseBase constructor will need to check self.__class__.error_policy, which will need a default value, and a test would probably be good. If we instead decide that error_policy can exist only at the pipeline level (set via the CLI), then only the CLI work is needed.
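A rough sketch of what the constructor change could look like, assuming policies are plain string constants and that ON_ERROR_COLLECT is the default (both are assumptions). The `PhaseBase` here is a stand-in written for illustration, not the real class, and the `name` and `error_policy` constructor arguments are hypothetical.

```python
# Stand-in constants; the real values and types may differ.
ON_ERROR_COLLECT = "ON_ERROR_COLLECT"
ON_ERROR_DROP_ROW = "ON_ERROR_DROP_ROW"
ON_ERROR_STOP_NOW = "ON_ERROR_STOP_NOW"


class PhaseBase:
    # Class-level default so every subclass has a value to fall back on.
    error_policy = ON_ERROR_COLLECT  # assumed default

    def __init__(self, name=None, error_policy=None):
        self.name = name or self.__class__.__name__
        # An explicit constructor argument wins; otherwise use whatever the
        # phase class declared (or inherited from PhaseBase).
        self.error_policy = error_policy or self.__class__.error_policy


class Validator(PhaseBase):
    error_policy = ON_ERROR_DROP_ROW  # declared where the phase is defined


class Transformer(PhaseBase):
    pass  # inherits the default


def test_phase_error_policy():
    assert Validator().error_policy == ON_ERROR_DROP_ROW
    assert Transformer().error_policy == ON_ERROR_COLLECT
    assert Transformer(error_policy=ON_ERROR_STOP_NOW).error_policy == ON_ERROR_STOP_NOW
```

The test at the end covers the three cases that matter for this change: a class-declared policy, the inherited default, and a per-instance override.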