Open GoogleCodeExporter opened 8 years ago
Interestingly I had issues uploading the file into this issue. I had to zip it
first. May not be related.
Original comment by hrov...@gmail.com
on 14 Nov 2010 at 6:36
The file is notable for having old-style Mac line endings: each line other than
the last is terminated with a CR (character 0x0D), rather than the Windows/IBM
CR/LF pair (0x0D 0x0A) or Linux-style LF (0x0A).
Importantly, LF is the character that C++ and Java refer to as \n. Code that
looks only for \n will not see a line terminated only with a CR, the \r
character, as terminated.
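A quick way to confirm which style a file uses is to count the terminators in its raw bytes. This is only a sketch: the function name and the sample bytes are made up for illustration and are not taken from the attached file.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in raw bytes."""
    crlf = data.count(b"\r\n")
    # Subtract the CRLF pairs so each byte is attributed to one style only.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}


# Hypothetical TSV content with old-style Mac (CR-only) line endings.
sample = b"id\tname\r1\talpha\r2\tbeta"
print(count_line_endings(sample))
```

A file that reports only CR terminators is the case described above.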
Python automatically expands \n to \r\n when writing text files on Windows.
When reading, it translates \r and \r\n to \n if the file is opened with the U
("universal") flag set, so Python will automatically correct this if the file
is loaded in "rU" mode (read universal): the \r line endings will be understood
as \n.
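For anyone trying this on Python 3, which this thread predates: recent releases no longer accept the U flag, and universal newline handling is controlled by the newline argument of open() instead (newline=None, the text-mode default, enables it). A minimal sketch, assuming Python 3 and a made-up two-line TSV written to a temporary file:

```python
import os
import tempfile

# Hypothetical sample data: a two-line TSV with CR-only line endings.
cr_only = b"id\tname\r1\talpha\r"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(cr_only)
    path = tmp.name

try:
    # newline=None turns on universal newlines: \r, \r\n and \n all read as \n.
    with open(path, "r", newline=None) as f:
        translated = f.read()
    # newline="" disables translation, so the \r characters come through as-is.
    with open(path, "r", newline="") as f:
        untranslated = f.read()
finally:
    os.remove(path)

print(repr(translated))
print(repr(untranslated))
```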
Java's BufferedReader.readLine() treats \r as an end-of-line character, as do
similar functions in the Java standard library. If using a Scanner with a
custom delimiter such as \n, however, \r must be explicitly included in the
delimiter or it won't be treated as one. I don't know off the top of my head
how Java handles line endings otherwise.
Python only performs line ending conversion if universal (U) is part of the
file mode, so watch out.
C++, naturally, provides no support for foreign line endings, so it is very
easy to shoot yourself in the face in such circumstances.
What scripts are trying to process this so-called blank file? A file with no
line breaks may function similarly to a blank file, since something trying to
interpret it as a TSV will identify only a header row.
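To illustrate the "only a header row" effect, here is a small sketch using invented TSV content. Splitting strictly on \n stands in for a reader that only recognizes LF; str.splitlines() is shown for contrast because it also breaks on \r.

```python
# Hypothetical TSV text with CR-only line endings (not the attached file).
data = "id\tname\r1\talpha\r2\tbeta"

# An LF-only reader finds no line breaks, so the whole file is one "row".
rows_lf_only = data.split("\n")

# A reader that recognizes \r sees the header plus two data rows.
rows_any_ending = data.splitlines()

print(len(rows_lf_only), len(rows_any_ending))
```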
Original comment by anorberg...@gtempaccount.com
on 30 Nov 2010 at 6:10
Yep, bug verified. Perl scripts expect LF characters to end a line, and Excel
for Mac uses CR.
Suggested fix: convert CR to LF when writing text files to the workspace. This
can be done fairly trivially with a FilterInputStream subclass.
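The conversion itself is small. The sketch below shows the idea in Python on a whole in-memory buffer; it is not the Java FilterInputStream version suggested above, and the function name is made up.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so a Windows pair does not turn into two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical input mixing all three line-ending styles.
mixed = b"id\tname\r1\talpha\r\n2\tbeta\n"
print(normalize_newlines(mixed))
```

One thing a streaming filter has to handle that this whole-buffer version does not: a CR/LF pair can be split across two reads, so the filter needs to remember whether the previous byte was a CR.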
Original comment by anorberg...@gtempaccount.com
on 1 Dec 2010 at 1:17
Original issue reported on code.google.com by
hrov...@gmail.com
on 14 Nov 2010 at 6:33