Closed GoogleCodeExporter closed 8 years ago
Thanks for the comment, but I deliberately decided not to support this feature.
The main reason was the "\r\n", "\n" and "\n\r" mess. If the string
contains one of these, should some translation be done?
If you say no, then this will lead to hard-to-find bugs, as some people
implicitly expect a translation.
If you say yes, then this will lead to a number of problems. For example, the
translation "\n" -> "\r\n" makes the string longer. However, no additional
space is available in the memory buffer. We would therefore need to copy the
string to do the translation. This is against the
only-pay-for-what-I-use spirit.
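
To illustrate the asymmetry, here is a minimal sketch. It is not code from the
library: the function names `crlf_to_lf_in_place` and `lf_to_crlf_copy` are
made up for this example, and the buffer is assumed to be a plain char array
with no spare capacity. Shrinking "\r\n" to "\n" can reuse the buffer, while
growing "\n" to "\r\n" needs a second string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Shrinking translation "\r\n" -> "\n". It can reuse the same buffer because
// the write position never overtakes the read position.
// Returns the new length of the data in buf.
std::size_t crlf_to_lf_in_place(char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    std::size_t out = 0;
    for (std::size_t in = 0; in < len; ++in) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;  // drop the '\r'; the '\n' is copied in the next step
        buf[out++] = buf[in];
    }
    return out;
}

// Growing translation "\n" -> "\r\n". The result can be up to twice as long,
// so it is written into a newly allocated string. This is the copy that the
// comment above wants to avoid.
std::string lf_to_crlf_copy(const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    std::string result;
    result.reserve(len * 2);
    for (std::size_t i = 0; i < len; ++i) {
        if (buf[i] == '\n' && (i == 0 || buf[i - 1] != '\r'))
            result += '\r';  // only add '\r' if it is not already there
        result += buf[i];
    }
    return result;
}
```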
Original comment by strasser...@gmail.com
on 27 Oct 2014 at 8:48
I think the argument for "no" is a bit lacking. I'm in favor of not doing
anything to them: just give me the data. Newline processing is left up to
the programmer, just like UTF-8 and whatnot. The library is really nice, and
this feature would let it parse every CSV file. It would at least be nice
to have the option to do so; that way it's just another policy, right?
Original comment by n...@astrant.net
on 27 Oct 2014 at 11:00
There are further problems. Suppose for simplicity that all linebreaks are plain
and simple \n. The interpretation of a \n will vary depending on whether it is
between quotes or not. This requires that the method that breaks the file into
lines knows about quotes, because otherwise it cannot do its job.
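
A rough sketch of the difference, not taken from the library: the function
`split_records` is hypothetical, and it assumes the simplest possible quoting
style, where every quote character toggles the "inside quotes" state.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits text into records at '\n'.
// With quote_aware == false, every '\n' ends a record.
// With quote_aware == true, a '\n' between two quote characters is kept as
// part of the cell. A doubled quote ("") toggles the state twice and so has
// no net effect.
std::vector<std::string> split_records(const std::string& text,
                                       bool quote_aware,
                                       char quote = '"') {
    assert(quote != '\n');
    std::vector<std::string> records;
    std::string current;
    bool in_quotes = false;
    for (char c : text) {
        if (quote_aware && c == quote)
            in_quotes = !in_quotes;
        if (c == '\n' && !in_quotes) {
            records.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty())
        records.push_back(current);  // last record without trailing '\n'
    return records;
}
```

For the input `a,"x\ny"\nb,c\n` the naive mode cuts the quoted cell in half,
while the quote-aware mode keeps it in one record. The price is that the
splitter itself now has to know what a quote looks like.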
The current design allows the user to specify arbitrary quoting styles by just
providing a function to unquote a cell. You cannot extract the quoting style
from such a black-box function, and thus the feature cannot be
implemented without breaking the interface and making the library quoting-aware
at the lowest level. It gets especially challenging if you do not want to slow
down the case where no cells can be quoted.
Another problem is how to handle runaway quotes. Now you need to distinguish
a long line from a missing closing quote. This gets non-trivial (but solvable)
if you consider files that are large and that you therefore do not want to
load completely into memory. You cannot just read until the end of the file,
because that implies loading the whole file into the current line.
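
One possible way to solve it, sketched here under assumptions and not how the
library works: cap the record length and treat anything longer as a probable
missing quote. The names `read_record` and `read_all` and the `max_len`
parameter are invented for this example, and quotes are assumed to be plain
`"` toggles.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads one record from the stream. A '\n' inside quotes belongs to the
// record. Returns false when the input is exhausted.
// Throws std::length_error if the record grows beyond max_len characters,
// which is taken as a sign of a missing closing quote, and
// std::runtime_error if the input ends inside a quoted cell.
bool read_record(std::istream& in, std::string& record, std::size_t max_len) {
    assert(max_len > 0);
    record.clear();
    bool in_quotes = false;
    char c;
    while (in.get(c)) {
        if (c == '"')
            in_quotes = !in_quotes;
        if (c == '\n' && !in_quotes)
            return true;
        if (record.size() == max_len)
            throw std::length_error("record too long: missing closing quote?");
        record += c;
    }
    if (in_quotes)
        throw std::runtime_error("unterminated quote at end of input");
    return !record.empty();
}

// Small wrapper for trying the sketch on an in-memory string.
std::vector<std::string> read_all(const std::string& text, std::size_t max_len) {
    std::istringstream in(text);
    std::vector<std::string> records;
    std::string record;
    while (read_record(in, record, max_len))
        records.push_back(record);
    return records;
}
```

The cap bounds memory use, but it is a heuristic: a legitimately long record
and a runaway quote look the same once the limit is hit.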
And what is the gain? The idea of CSV is to have an easy-to-read way of storing
a database table. If you allow linebreaks within cells, then your readability
goes down the drain. IMO, non-escaped linebreaks within strings are a bug that
needs fixing on the user side and not on the library side.
If you have inherited CSV files that happen to have linebreaks in cells then you
can always replace the corresponding linebreaks with some escape character
before feeding the file to the library.
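
Such a preprocessing step could look roughly like this. It is only a sketch:
`escape_quoted_linebreaks` is a made-up helper, the placeholder character is
your choice and must not occur in the data, and quotes are assumed to be plain
toggles. Because one character is replaced by one character, the text does not
grow and no copy is needed.

```cpp
#include <cassert>
#include <string>

// Replaces every '\n' and '\r' that appears between quote characters with
// `placeholder`, in place. A "\r\n" pair inside quotes therefore becomes two
// placeholders. Linebreaks outside quotes are left untouched.
void escape_quoted_linebreaks(std::string& text, char placeholder,
                              char quote = '"') {
    assert(placeholder != '\n' && placeholder != '\r' && placeholder != quote);
    bool in_quotes = false;
    for (char& c : text) {
        if (c == quote)
            in_quotes = !in_quotes;
        else if (in_quotes && (c == '\n' || c == '\r'))
            c = placeholder;
    }
}
```

After parsing, the placeholder can be turned back into a linebreak in the
cells where it matters.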
Thanks for the input, but it was a deliberate decision from the start that this
"feature" is not worth supporting.
Original comment by strasser...@gmail.com
on 29 Oct 2014 at 9:53
Original issue reported on code.google.com by
n...@astrant.net
on 26 Oct 2014 at 5:35