Closed viliam-durina closed 10 months ago
I've added short test cases #277 which royvanrijn
passes. Maybe you could provide a test case to show the problem?
This is also covered in an existing test case: https://github.com/gunnarmorling/1brc/blob/fa1ca65bfd608fb117181de07320d8b568d30f12/src/test/resources/samples/measurements-complex-utf8.txt#L15-L16
My fault, I didn't actually run the code at all. I only assumed that the hash can be calculated differently for the same city name, and that it must lead to incorrect results.
Today I debugged and indeed two entries can be created for the same city in MeasurementRepository
. But it's resolved in the final combine step when the String city
field of MeasurementRepository.Entry
is used as the key for the TreeMap
, and this field is calculated correctly, so the end results are correct.
https://github.com/gunnarmorling/1brc/blob/fa1ca65bfd608fb117181de07320d8b568d30f12/src/main/java/dev/morling/onebrc/CalculateAverage_royvanrijn.java#L145
@royvanrijn
This code in the (currently) best solution will give incorrect results if there happen to be two semicolons in a single
word
. This is possible since a minimal record is 6 bytes, e.g.: