gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0

Use CSV output format #14

Closed: AlexanderYastrebov closed this 8 months ago

AlexanderYastrebov commented 8 months ago

Use a proper CSV (or semicolon-separated, for that matter) output format:

Addis Ababa;33.0;33.0;33.0
Aden;31.1;31.1;31.1
...

instead of the weird not-a-JSON one-liner output.

This would simplify comparing results (e.g. with diff) between multiple implementations, since each station would sit on its own line. It would also make the results parseable: think of importing them into a database, or of a database-based implementation.
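
For illustration, here is a minimal sketch of what the proposed output could look like in Java. It is not code from the repository: the `CsvOutputSketch` class, the `Stats` record, and the `toSemicolonSeparated` helper are hypothetical names, and the one-decimal formatting is an assumption based on the sample lines above.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class CsvOutputSketch {
    // Hypothetical holder for the aggregated values of one station.
    record Stats(double min, double mean, double max) {}

    // Renders one "station;min;mean;max" line per station, sorted by station name.
    static String toSemicolonSeparated(Map<String, Stats> results) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Stats> e : new TreeMap<>(results).entrySet()) {
            Stats s = e.getValue();
            // Locale.ROOT keeps '.' as the decimal separator on every machine.
            sb.append(String.format(Locale.ROOT, "%s;%.1f;%.1f;%.1f",
                    e.getKey(), s.min(), s.mean(), s.max()));
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Stats> results = Map.of(
                "Aden", new Stats(31.1, 31.1, 31.1),
                "Addis Ababa", new Stats(33.0, 33.0, 33.0));
        System.out.print(toSemicolonSeparated(results));
    }
}
```

With one station per line, two implementations' outputs can be compared with a plain line-based diff, and a mismatch points directly at the affected station.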

gunnarmorling commented 8 months ago

I agree that this would have been the better way. I'm hesitant, though, to change the format at this point, because I want to keep the different implementations comparable.

lluismf commented 8 months ago

The weird not-a-JSON one-liner is just the map serialized. Writing CSV output would be an extra cost.
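
To make "just the map serialized" concrete, here is a minimal sketch, assuming the results live in a sorted `java.util.Map` whose values format themselves as `min/mean/max`. The `MapToStringSketch` class and `Stats` record are hypothetical names, not the repository's code; only `TreeMap` and its inherited `toString()` are standard library behavior.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class MapToStringSketch {
    // Hypothetical value type; its toString() yields "min/mean/max".
    record Stats(double min, double mean, double max) {
        @Override
        public String toString() {
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, mean, max);
        }
    }

    // AbstractMap.toString() produces "{key=value, key=value}" on a single line.
    static String render(Map<String, Stats> results) {
        return new TreeMap<>(results).toString();
    }

    public static void main(String[] args) {
        Map<String, Stats> results = Map.of(
                "Aden", new Stats(31.1, 31.1, 31.1),
                "Addis Ababa", new Stats(33.0, 33.0, 33.0));
        // Prints: {Addis Ababa=33.0/33.0/33.0, Aden=31.1/31.1/31.1}
        System.out.println(render(results));
    }
}
```

The one-line output comes for free from `toString()`, which is why it looks JSON-like without being JSON: keys and values are unquoted and joined with `=`.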

AlexanderYastrebov commented 8 months ago

@lluismf You are right, serializing result in a portable format is a game-changer and will severely affect processing performance of 10^9 input rows :+1:

gunnarmorling commented 8 months ago

Hey, let's keep it friendly :)

I agree that a different output format wouldn't make any difference perf-wise in the grand scheme of things and as said, it would have been the better choice. But I don't think it's that much of an issue to justify changing this while the challenge is running.

So I'd keep this as-is for the time being, and consider it a lesson learned for whenever another challenge of this kind is happening. Thanks all!