Closed abhinav-upadhyay closed 5 months ago
So my GitHub username is abhinav-upadhyay
. But a file name cannot contain -
. The CI job is failing to find the script to run my program. Is there any alternative to get around it?
So my GitHub username is
abhinav-upadhyay
. But a file name cannot contain-
. The CI job is failing to find the script to run my program. Is there any alternative to get around it?
Realized, I can have file names with -
. Just need to use quotes when referring to them on the shell.
Fails with the 10K key set test (see create_measurements_3.sh):
+ timeout -v 300 ./test.sh abhinav-upadhyay measurements_10K_1B.txt
Validating calculate_average_abhinav-upadhyay.sh -- measurements_10K_1B.txt
410c410
< Fort Porta;-21.8;9.3;42.0
---
> Fort Porta;-21.8;8.6;38.0
617c617
< Ki;-17.4;15.5;47.4
---
> Ki;-17.4;17.6;47.4
1020c1020
< Roch;-22.0;14.3;46.6
---
> Roch;-11.6;18.9;48.0
...
Still incorrect results for the 10K keyset. Note that we're after the cut-off time, so you'll have two more changes you can make to this PR (see note at the top of the README). If it's not working or valid then, I'll have to close it unfortunately. Updates should be pushed quickly, so I can evaluate all the pending entries. Thx!
Hey @abhinav-upadhyay, I still believe this does not handle hash collisions correctly. In case of a collision, the actual names of the existing and the incoming entry must be compared. I can't find where this is happening? You can push one more time to this PR before it'll either be merged (if valid) or closed (if invalid). Thx!
Hey @abhinav-upadhyay, I still believe this does not handle hash collisions correctly. In case of a collision, the actual names of the existing and the incoming entry must be compared. I can't find where this is happening? You can push one more time to this PR before it'll either be merged (if valid) or closed (if invalid). Thx!
I am building the hash using all the bytes of the location name (lines 180-198 in readFile method). This hash is stored into a 64 bit long value. In the test cases and even in case of the 10k keys file (which contained some huge names), it is not resulting in any cases where two names have the same hash. Because of this, I have not added any checks for comparing the names themselves .
Understood, but absence of collisions in specific key sets doesn't mean that there cannot be collisions at all. As per the rules, you cannot make any assumptions on specific names:
Q: Can I make assumptions on the names of the weather stations showing up in the data set? A: No, while only a fixed set of station names is used by the data set generator, any solution should work with arbitrary UTF-8 station names (for the sake of simplicity, names are guaranteed to contain no ; or \n characters).
So, in order for the entry to be considered valid, it must fall back to comparing names in case of collisions.
I am going to close this one as I haven't heard back within two days and we are after the cut-off time. Thanks for your efforts nevertheless!
Check List:
[ ] You have run
./mvnw verify
and the project builds successfully[ ] Tests pass (
./test.sh <username>
shows no differences between expected and actual outputs)[ ] All formatting changes by the build are committed
[ ] Your launch script is named
calculate_average_<username>.sh
(make sure to match casing of your GH user name) and is executable[ ] Output matches that of
calculate_average_baseline.sh
[ ] For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number): Line 62 (I did not find any collisions in the 1 billion file or the test case, so removed the handling).
Execution time: 3.70 seconds
Execution time of reference implementation: 3 min 33 seconds