mattcasters / pentaho-pdi-dataset

Set of PDI plugins to more easily work with data sets. We also want to provide unit testing capabilities through input data sets and golden data sets.
Apache License 2.0

Bug? - Test must be performed on a step AFTER any step that emits a new field. #43

Closed by usbrandon 5 years ago

usbrandon commented 5 years ago

If you look at kettle-qa/unitTests/registeredDefects/PDI-17034_db_checksum/ut_PDI-17034_db_checksum.ktr, I had to perform the test on the Logging step. My expectation was that if I was testing the "Checksum" step, the test would be run against that step's output rows. However, it appears that tests are performed on the input rows of a step, because I got failures when trying to test on the very step doing the work.

I suspect this is not what was intended, but I do not know whether this should be considered a bug or just the way it is. The effect of keeping it this way is that standard operating procedure becomes adding a dummy step after any logic you want to test.

My philosophy is that less is more. I would vote for being able to test the output as opposed to adding boilerplate afterwards.

mattcasters commented 5 years ago

I made the logic simple in the sense that steps with an input data set are replaced by an Injector step and steps with a golden data set are replaced with a Dummy step. The main reason for doing this is to allow steps like Table Output and Text File Output to be tested without actually writing anything. On the input side, obviously, we don't really want to read from the original source.
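
For illustration, here is a minimal sketch of that replacement, assuming the standard Kettle API (`TransMeta`, `StepMeta`, `InjectorMeta`, `DummyTransMeta`); it is not the actual pentaho-pdi-dataset code, which may differ in detail:

```java
// Sketch only: swap a step's implementation at unit-test time while keeping
// the hops intact, assuming the standard Kettle classes named below.
import org.pentaho.di.trans.TransMeta;
import org.pentaho.di.trans.step.StepMeta;
import org.pentaho.di.trans.steps.dummytrans.DummyTransMeta;
import org.pentaho.di.trans.steps.injector.InjectorMeta;

public class UnitTestStepReplacementSketch {

  /** Replace a step carrying a golden data set with a Dummy step. */
  public static void replaceWithDummy(TransMeta transMeta, String stepName) {
    StepMeta stepMeta = transMeta.findStep(stepName);
    if (stepMeta != null) {
      // The hops stay in place; only the step's behavior is neutralized.
      // Because the Dummy passes rows through unchanged, the golden comparison
      // effectively sees the original step's INPUT rows, which is the effect
      // usbrandon observed on the Checksum step.
      stepMeta.setStepMetaInterface(new DummyTransMeta());
    }
  }

  /** Replace a step carrying an input data set with an Injector step. */
  public static void replaceWithInjector(TransMeta transMeta, String stepName) {
    StepMeta stepMeta = transMeta.findStep(stepName);
    if (stepMeta != null) {
      // The Injector feeds the test rows instead of reading the real source.
      stepMeta.setStepMetaInterface(new InjectorMeta());
    }
  }
}
```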
I can see your point but I think that there will always be drawbacks somewhere.
Perhaps adding an option in the unit test could help us out. Replacing the step with a dummy isn't a requirement after all. Making this explicit in the unit test dialog might bring solace. Thoughts?
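
For concreteness, such an option could be a per-step flag on the golden data set location that controls whether the step is replaced with a Dummy. The class and field names below are purely hypothetical and not part of the current plugin:

```java
// Hypothetical sketch of an opt-out flag for step replacement; names are
// illustrative only and do not reflect the existing plugin code.
public class GoldenDataSetLocationSketch {
  private String stepName;
  private String dataSetName;

  // When false, the step is left in place and the golden comparison runs
  // against its actual output rows instead of its (passed-through) input.
  private boolean replaceStepWithDummy = true;

  public boolean isReplaceStepWithDummy() {
    return replaceStepWithDummy;
  }

  public void setReplaceStepWithDummy(boolean replaceStepWithDummy) {
    this.replaceStepWithDummy = replaceStepWithDummy;
  }
}
```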