vincentlaucsb / csv-parser

A high-performance, fully-featured CSV parser and serializer for modern C++.
MIT License

csv writer is confused about floating point/doubles #146

Closed berthubert closed 3 years ago

berthubert commented 3 years ago

First let me say I am very happy with csv-parser! It is helping me process tens of thousands of bacterial genomes.

This issue is about "to_string() for floating point numbers" in the CSV writer. It appears that this code attempts to write a floating point number in a locale-independent way, but gets it wrong. This was messing up my genome plots.

In short, the code multiplies the non-integral part of a float by 10000 and then calls to_string on the result. However, this breaks for numbers smaller than 0.000001, and it also omits leading zeroes.
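To make the failure mode concrete, here is a minimal sketch of the scheme as described above. It is a hypothetical reconstruction for illustration only, not the library's actual code; the names `naive_to_string` and `demo_naive_failures` are made up, and the sketch only handles non-negative input.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical reconstruction of the scheme described in this report.
// NOT the library's actual code; non-negative inputs only.
std::string naive_to_string(double value) {
    double int_part = 0.0;
    const double frac_part = std::modf(value, &int_part);

    // Scale the fractional part to an integer and truncate it.
    const long long scaled = static_cast<long long>(frac_part * 10000);

    // Joining the two integers loses any zeroes that directly follow
    // the decimal point, because "scaled" is printed without padding.
    return std::to_string(static_cast<long long>(int_part)) + "." +
           std::to_string(scaled);
}

// Two inputs that expose the problems.
void demo_naive_failures() {
    // 0.0625 * 10000 == 625, so the leading zero is lost.
    assert(naive_to_string(1.0625) == "1.625");
    // The scaled fraction truncates to 0, so the value vanishes.
    assert(naive_to_string(0.00001) == "0.0");
}
```

Under these assumptions, 1.0625 is written as 1.625 (a different number), and anything whose scaled fraction is below 1 collapses to zero.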

I have attempted the correct C++17 to_chars solution, but this newfangled stuff does not appear to have been implemented widely or correctly. I replaced the to_string() with this:

    /** to_string() for floating point numbers */
    template<
        typename T,
        csv::enable_if_t<std::is_floating_point<T>::value, int> = 0
    >
    inline std::string to_string(T value) {
        std::string ret = std::to_string(value);
        if (auto pos = ret.find(','); pos != std::string::npos)
            ret[pos] = '.';
        return ret;
    }

This too is awful, but it at least does not emit incorrect data. If you want I can drag up better code, but at least here is an issue already.
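One caveat with the replacement above is that std::to_string() always prints six decimal places, so very small values still lose digits. As a possible alternative, here is a minimal sketch that uses a stream imbued with the classic "C" locale and enough digits to round-trip. The name `to_string_classic` is hypothetical and this is not what the library does; it is just one portable option that does not need std::to_chars.

```cpp
#include <limits>
#include <locale>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper, not part of csv-parser.
// Formats a floating point value with '.' as the decimal separator,
// regardless of the global locale, and with enough significant digits
// (max_digits10) that parsing the text yields the same value again.
template <
    typename T,
    typename std::enable_if<std::is_floating_point<T>::value, int>::type = 0
>
std::string to_string_classic(T value) {
    std::ostringstream out;
    out.imbue(std::locale::classic());                    // always '.'
    out.precision(std::numeric_limits<T>::max_digits10);  // round-trip safe
    out << value;                                         // sign is kept
    return out.str();
}
```

The trade-off is that values which are not exactly representable are printed with all their digits (and small magnitudes switch to exponent notation), so the output is longer than what std::to_string() produces, and a stream is slower than hand-rolled digit arithmetic.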

vincentlaucsb commented 3 years ago

Hello Bert, and thank you for your report. I am sorry for taking several months to respond, but I am glad this library is being put to good use.

I have to admit, when I was coding this method I was obsessed with creating numeric strings with the least CPU cycles possible. Although I know that some users of this library are scientific users, I did not consider the level of precision that might be demanded.

I will definitely consider a solution that would be more appropriate for high precision work and/or allow the user to customize the serialization logic.

vincentlaucsb commented 3 years ago

Hopefully fixed by #179

berthubert commented 3 years ago

thanks!

berthubert commented 3 years ago

Vincent, I updated to the latest version and found that with my gcc 9.3.0, all negative double values now get stored as positive values. This cost me hours of work, since the bug seemed so unlikely. This is version 2.1.3. I tried to look through the change history to see what happened, but I mostly found things about Clang and disabled tests.
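For what it is worth, a regression like this could be caught by a small sign check on whatever function does the serialization. The sketch below is hypothetical: `preserves_sign` is a made-up helper, and it takes the serializer as a callable so that it assumes nothing about the library's API.

```cpp
#include <cmath>
#include <string>

// Hypothetical test helper, not part of csv-parser.
// Returns true if the text produced by "serialize" starts with '-'
// exactly when the input value is negative.
template <typename Serializer>
bool preserves_sign(Serializer serialize, double value) {
    const std::string text = serialize(value);
    const bool text_negative = !text.empty() && text[0] == '-';
    return text_negative == std::signbit(value);
}
```

Calling it with a handful of negative values (including small ones such as -0.000001) for each floating point type would have flagged the behaviour described here.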