gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0
6.3k stars 1.88k forks

Use CSV output format #14

Closed AlexanderYastrebov closed 10 months ago

AlexanderYastrebov commented 10 months ago

Use proper CSV (or semicolon-separated for that matter) output format

Addis Ababa;33.0;33.0;33.0
Aden;31.1;31.1;31.1
...

instead of the weird not-a-JSON one-liner output.

This would simplify result comparison (diff) between multiple implementations. It would also make the results parseable: think of importing them into a database, or of a database-based implementation.
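To make the proposal concrete, here is a minimal sketch of what one-line-per-station output could look like in Java. The `Stats` record, the `toLines` helper, and the min/mean/max column order are assumptions made for illustration; none of them come from the challenge code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SemicolonOutput {
    // Hypothetical holder for the aggregated values of one station.
    record Stats(double min, double mean, double max) {}

    // Renders one "name;min;mean;max" line per station, sorted by station name.
    // Locale.ROOT keeps the decimal separator a '.' regardless of system locale.
    static String toLines(Map<String, Stats> results) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Stats> e : new TreeMap<>(results).entrySet()) {
            Stats s = e.getValue();
            sb.append(String.format(Locale.ROOT, "%s;%.1f;%.1f;%.1f\n",
                    e.getKey(), s.min(), s.mean(), s.max()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Stats> results = new HashMap<>();
        results.put("Aden", new Stats(31.1, 31.1, 31.1));
        results.put("Addis Ababa", new Stats(33.0, 33.0, 33.0));
        System.out.print(toLines(results));
    }
}
```

Because every station sits on its own line, a plain `diff` between the outputs of two implementations would point at the exact stations that differ.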

gunnarmorling commented 10 months ago

I agree that this would have been the better way. I'm hesitating though to change the format at this point, to keep different implementations comparable.

lluismf commented 10 months ago

The weird not-a-JSON one-liner is just the map serialized. Writing CSV output would add extra cost.
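For readers unfamiliar with what "the map serialized" means here, the following sketch shows how printing a sorted Java map directly yields a single-line, brace-delimited result. The `Stats` record and its slash-separated `toString` are assumptions for illustration, not code taken from the challenge.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapToStringOutput {
    // Hypothetical holder for one station; toString joins the values with '/'.
    record Stats(double min, double mean, double max) {
        @Override
        public String toString() {
            return min + "/" + mean + "/" + max;
        }
    }

    // A TreeMap iterates in key order, and the inherited AbstractMap.toString
    // renders entries as "{key=value, key=value}" on a single line.
    static String serialize(Map<String, Stats> results) {
        return new TreeMap<>(results).toString();
    }

    public static void main(String[] args) {
        Map<String, Stats> results = new HashMap<>();
        results.put("Aden", new Stats(31.1, 31.1, 31.1));
        results.put("Addis Ababa", new Stats(33.0, 33.0, 33.0));
        // prints {Addis Ababa=33.0/33.0/33.0, Aden=31.1/31.1/31.1}
        System.out.println(serialize(results));
    }
}
```

Under these assumptions the one-liner comes for free from `toString`, whereas a delimited format needs an explicit loop over the entries. Either way the output covers one entry per station, not one per input row.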

AlexanderYastrebov commented 10 months ago

@lluismf You are right, serializing the result in a portable format is a game-changer and will severely affect the processing performance of 10^9 input rows :+1:

gunnarmorling commented 10 months ago

Hey, let's keep it friendly :)

I agree that a different output format wouldn't make any difference perf-wise in the grand scheme of things and as said, it would have been the better choice. But I don't think it's that much of an issue to justify changing this while the challenge is running.

So I'd keep this as-is for the time being, and consider it a lesson learned for whenever another challenge of this kind is happening. Thanks all!