vincentlaucsb / csv-parser

A high-performance, fully-featured CSV parser and serializer for modern C++.
MIT License
901 stars 150 forks source link

Floating point output completely broken? #170

Closed Holt59 closed 3 years ago

Holt59 commented 3 years ago

The following:

#include <iostream>
#include <csv.hpp>

int main(int argc, char* argv[]) {
    auto writer = csv::make_csv_writer(std::cout);
    writer << std::vector<std::string>{"a", "b", "c"};
    writer << std::make_tuple(1, 0.004654, 45);
}

Outputs:

a,b,c
1,0.465,45

Is there a way to fix this quickly? Otherwise, this makes this library completely unusable...

Is there a reason not to use the standard to_string functions in the library?

Holt59 commented 3 years ago

For users having similar issues, you can fix it by overriding all to_string in csv::internals by std::string. If you're using the single include version, this can be done with:

#include <string>

namespace csv::internals {
    using std::to_string;
}
#include <csv .hpp>  // include after!
vincentlaucsb commented 3 years ago

The standard to_string() functions are very slow. I also don't see how this "makes this library completely unusable". Yes this is a bug, but you can retrieve the parsed string and run any conversion function you want on it.

Holt59 commented 3 years ago

The standard to_string() functions are very slow. I also don't see how this "makes this library completely unusable". Yes this is a bug, but you can retrieve the parsed string and run any conversion function you want on it.

Are there any benchmarks? I'm working with kind of large CSV and I did not notice any performance difference when using the fix I provided that switch to std::to_string. I did a quick benchmark and the to_string version is indeed slower.

This does not make the library "completely unusable", but it's a major bug, so it's very hard to justify using it in a concrete project. Also, this is not related to parsing the string but formatting it, I have no issue with CSVReader. I could do all the call to to_string manually but that's not very clean and I don't really see the benefits of using this if I have to do so.

Aside from the bug in this issue, I don't think your to_string version supports very large double (anything whose integral part does not fit a std::size_t). I can understand the "shortcut" to get performance improvements (if it was the reasoning behind it), but this should at least be documented somewhere.

vincentlaucsb commented 3 years ago

Bugs happen, even in "professionally developed" software. If I could only push code if I could guarantee it to be 100% bug free, I would never publish anything. You're asking a lot for software that is essentially provided for free.

vincentlaucsb commented 3 years ago

You do have a point about large doubles--that will be fixed.

vincentlaucsb commented 3 years ago

Should be fixed by #177. Arbitrarily large ints stored in doubles should be outputted correctly as well.

Holt59 commented 3 years ago

I did not mean to be harsh, it's just that this is a hard-to-detect bug that's very impacting - it took me hours to notice that the issue was not in my computation code but in the writer part.

Just to point-out, none of the tests you wrote here https://github.com/vincentlaucsb/csv-parser/blob/master/tests/test_write_csv.cpp#L13-L26 are actually checking the bug above (leading zeros in the decimal part), so maybe it'd be worth adding some (although I think the new code is correct, but better safe than sorry).

vincentlaucsb commented 3 years ago

You're right. I will reopen this so I can add the appropriate unit tests.