Test cases around the 1 million data input #2

cribl-jlin commented 2 years ago

I see that you have test cases for other inputs, such as no file, special characters, etc. However, do you have test cases for the default 1 million event input?

wonkas-factory commented 2 years ago

The file was not altered, just renamed to largeOneMillionEventsTest.log: https://raw.githubusercontent.com/wonkas-factory/cribl-splitter/main/inputs/largeOneMillionEventsTest.log

cribl-jlin commented 2 years ago

Sorry, maybe I didn't phrase it correctly. What I meant was: are there more test cases involving the one million events test input?

wonkas-factory commented 2 years ago

@cribl-jlin Oh, I see. During development I did want to add an extra test case that ran the 1M file twice to see if the outputs were the same. They weren't, so I ended up just removing the test. Thinking about it now, I should have asserted the opposite, as a check that the async behavior does introduce randomness into the outputs.

Nevertheless, the basicVerification method does three things:
1) Verifies that the absolute content of the input and the output match
2) Checks that the output files are relatively balanced in size
3) Checks that the number of corrupt lines/packets is less than a threshold

This was all done in one test instead of three since the Docker environment takes about 40 seconds to run.
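For a concrete picture, the three checks could be sketched roughly like this in Python. This is a minimal sketch, not the repository's basicVerification: the function name basic_verification, the order-independent character-count comparison used for check 1, the idea that a corrupt line is any output line not present in the input, and the parameters balance_tolerance and corrupt_threshold (with their defaults) are all assumptions for illustration.

```python
from collections import Counter


def basic_verification(input_text, output_texts, balance_tolerance=0.10, corrupt_threshold=10):
    """Hypothetical sketch of the three checks; not the repository's actual code.

    input_text: full contents of the input log.
    output_texts: list with the contents received by each target node.
    """
    combined = "".join(output_texts)

    # 1) Content match (assumed approach): compare character counts, which
    #    ignores the order in which async chunks reached each target.
    content_match = Counter(input_text) == Counter(combined)

    # 2) Balance: each target's output size must lie within
    #    balance_tolerance (a fraction) of the mean output size.
    sizes = [len(text) for text in output_texts]
    mean = sum(sizes) / len(sizes) if sizes else 0
    balanced = all(abs(size - mean) <= balance_tolerance * mean for size in sizes)

    # 3) Corrupt lines (assumed definition): output lines that never appear
    #    in the input, e.g. a line split across two chunks.
    known_lines = set(input_text.splitlines())
    corrupt_lines = sum(
        1
        for text in output_texts
        for line in text.splitlines()
        if line not in known_lines
    )

    return {
        "content_match": content_match,
        "balanced": balanced,
        "corrupt_lines": corrupt_lines,
        "corrupt_ok": corrupt_lines < corrupt_threshold,
    }
```

Running all three checks in one call mirrors the reasoning above: the expensive part is producing the outputs (the Docker run), so every assertion reuses a single set of results. Note that a character-count comparison catches missing or extra data but not reordering within a line, which is one reason a separate corrupt-line check is still useful.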

Additionally, I mentioned in the readme's assumptions section (point 2) and future considerations section (point 1) that I could possibly try to recreate the log with an appropriate buffer size. I felt that this was more reverse engineering the application than testing it, especially once I found a more efficient way to check the exact input against the output. It really depends on how "Validates if data received on the ‘Target’ nodes are correct" is interpreted, which a developer or product owner could clarify if this were a real project. At the end of the day, though, I time-boxed this assignment and tried to clearly state my assumptions and future considerations.

Let me know if there are any other clarifications needed.

wonkas-factory commented 2 years ago

@cribl-jlin I added one additional test to verify the randomness of the targets' async output, as it was relatively straightforward: https://github.com/wonkas-factory/cribl-splitter/commit/20df2ba6997bc347c248e8c1450df061402117f7
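The idea of asserting the opposite (two runs of the same input should produce different per-target outputs) can be sketched in Python as below. This is not the code from the linked commit: run_splitter is a hypothetical stand-in for one end-to-end Docker run, and it simulates nondeterministic async delivery by sending each line to a randomly chosen target. The function names and the n_targets parameter are assumptions for illustration.

```python
import random


def run_splitter(lines, n_targets, rng):
    """Hypothetical stand-in for one end-to-end run of the splitter.

    Each line goes to a target chosen by rng, which mimics the
    nondeterministic delivery of the real async pipeline.
    """
    outputs = [[] for _ in range(n_targets)]
    for line in lines:
        outputs[rng.randrange(n_targets)].append(line)
    return outputs


def check_async_output_is_random(lines, n_targets=2):
    """Run the same input twice and assert that the per-target outputs
    differ while the overall content stays the same."""
    first = run_splitter(lines, n_targets, random.Random())
    second = run_splitter(lines, n_targets, random.Random())

    # Same content overall, regardless of which target received each line.
    assert sorted(sum(first, [])) == sorted(lines)
    assert sorted(sum(second, [])) == sorted(lines)

    # The opposite of the dropped "outputs are identical" test.
    assert first != second
```

Because the final assertion relies on randomness, a very small input could make it fail by chance. In this simulation, with 1,000 lines and two targets, the chance of two identical runs is 2^-1000, so on a large input such as the 1M file the check is effectively deterministic. Keeping the content assertions alongside it means the test still guards against lost data, not only against identical runs.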
