Closed parkertimmins closed 9 months ago
Hey, this one produces an incorrect output for the 10K keyset test (see create_measurements3.sh). While that's not the official challenge, I'd like to make sure that at least the top entries pass that, so as to make sure they don't cut any corners. Could you take a look? Thx!
Hey, sorry about that! Turns out there was a pretty significant bug. The logic that handles the boundaries between batches uses some padding. It was only being applied at the end of each batch, but should also have been applied at the beginning. As a result, around 200 characters were being skipped between batches, or about 20k characters over all 100 batches used. Due to this bug, my previous results are invalid.
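For anyone hitting a similar boundary bug: below is a minimal Python sketch of one way to split a buffer into batches so that no bytes are skipped between them. It is not the code from this PR; `split_at_newlines` and `covers_exactly` are hypothetical names used only for illustration. Each batch ends just after a newline and the next batch starts exactly where the previous one ended.

```python
def split_at_newlines(data: bytes, n_batches: int) -> list[tuple[int, int]]:
    """Split data into roughly n_batches half-open [start, end) ranges.

    Every range except possibly the last ends just after a newline, and each
    range starts exactly where the previous one ended, so no byte is skipped.
    """
    size = len(data)
    approx = max(1, size // n_batches)  # target batch size in bytes
    ranges = []
    start = 0
    while start < size:
        end = min(size, start + approx)
        if end < size:
            # Extend the batch to just past the next newline, so a row is
            # never split across two batches.
            nl = data.find(b"\n", end - 1)
            end = size if nl == -1 else nl + 1
        ranges.append((start, end))
        start = end  # the next batch begins where this one stopped
    return ranges


def covers_exactly(ranges: list[tuple[int, int]], size: int) -> bool:
    """Check that the ranges tile [0, size) with no gaps and no overlaps."""
    if not ranges:
        return size == 0
    if ranges[0][0] != 0 or ranges[-1][1] != size:
        return False
    return all(prev[1] == cur[0] for prev, cur in zip(ranges, ranges[1:]))
```

A check like `covers_exactly` is cheap to run in a test with a deliberately small batch size, which forces many batch boundaries even on a tiny input file.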
As an aside, it's pretty surprising that this passed the regular tests and the full dataset. Without digging in too deep, I believe it passed the tests because they were small enough that only one batch was used. It got correct results on the full dataset because the errors were rounded away and none of the missing rows were min or max values. On the 10K keyset dataset, there were fewer samples per station, so errors in the average calculation were larger and were not rounded away.
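To illustrate the rounding argument, here is a small Python sketch with made-up numbers (not data from the challenge). Temperatures are stored as integer tenths of a degree, and `rounded_mean_tenths` and `make_station` are hypothetical helpers. One row that is neither the min nor the max is dropped, and the rounded mean is compared for a station with few samples and one with many.

```python
def rounded_mean_tenths(values: list[int]) -> int:
    """Mean of values (in tenths of a degree), rounded half up to a whole tenth.

    Uses integer arithmetic only, so the result is exact.
    """
    n = len(values)
    return (2 * sum(values) + n) // (2 * n)


def make_station(n: int) -> list[int]:
    """Made-up station: n - 2 rows of 12.0, one row of 15.0, one row of 18.0."""
    return [120] * (n - 2) + [150, 180]


def drop_middle_row(values: list[int]) -> list[int]:
    """Simulate the bug: lose the single 15.0 row (neither min nor max)."""
    kept = list(values)
    kept.remove(150)
    return kept


small = make_station(10)
large = make_station(100_000)

# Few samples: the lost row moves the rounded mean (12.9 vs 12.7).
small_changed = rounded_mean_tenths(small) != rounded_mean_tenths(drop_middle_row(small))

# Many samples: the lost row is rounded away (12.0 either way).
large_changed = rounded_mean_tenths(large) != rounded_mean_tenths(drop_middle_row(large))
```

In both cases min and max are unaffected, because the dropped row is neither. Only the mean can reveal the missing row, and only when the station has few enough samples.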
All looking good now, 00:04.800 for the official eval key set.
> As an aside, it's pretty surprising that this passed the regular tests and the full dataset.
Yepp, I think your analysis on why it slipped through the tests is spot on. The test suite could definitely be improved; its current state is a function of how much effort folks could put into it so far, the runtime of the tests, etc. I think it's ok for the time being. Should there be another challenge, I'll definitely invest the time to develop a very tight TCK.
Check List:
[x] Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
[x] All formatting changes by the build are committed
[x] Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
[x] Output matches that of calculate_average_baseline.sh
[x] For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions: line 91 calls a simd string equality function
Execution time: 0m28,277s
Execution time of reference implementation: 4m44,720s
System: Intel i7-5600U CPU @ 2.60GHz, 4 cores, 8GB RAM
Main changes: