Closed GoogleCodeExporter closed 8 years ago
It seems "oe1.train.arff" contains the old Mac style of denoting end-of-line,
which is '\r', a.k.a. the carriage return, without a trailing newline character,
'\n'. My program always assumes that there is a '\n' at the end of each line,
with or without a preceding '\r', which naturally results in incorrect parsing of
the file when '\r' appears alone.
The funny thing is, this '\r' character without a companion '\n' appears ONLY
once in the file, which to me makes it look like a very non-standard way to
format a file in the first place. This is how I believe the machine sees the
problematic part:
...
@data\r\n
\r
1513,...\r\n
...
So '\r\n' at the end is treated as end-of-line, just as '\n' alone would be,
but a lone '\r' is not. Extending the parser to account for the missing case
is going to take some more time, unfortunately, but I will get it fixed sooner
or later.
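For illustration only, here is a minimal Python sketch of what handling the missing case could look like. It is not the project's actual parser; the function name split_lines is hypothetical. It treats '\r\n', a lone '\n', and a lone '\r' each as a single end-of-line.

```python
def split_lines(text):
    # Split text into lines, accepting "\r\n", "\n", or a lone "\r"
    # as the line terminator. Hypothetical helper, not the real parser.
    lines = []
    start = 0
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\n" or ch == "\r":
            lines.append(text[start:i])
            # Consume "\r\n" as one terminator rather than two.
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1
            start = i + 1
        i += 1
    # Keep a final line that has no terminator.
    if start < n:
        lines.append(text[start:])
    return lines


# The problematic part of the file, as described above.
sample = "@data\r\n\r1513,1\r\n"
print(split_lines(sample))  # ['@data', '', '1513,1']
```

With this approach the lone '\r' yields an empty line between "@data" and the first data row, so the caller would also need to skip empty lines.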
If you want an easy fix to this, you can manually tweak the file and remove the
problematic bit; I did it and it works just fine after that. Cheers!
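If editing the file by hand is inconvenient, the same tweak could be scripted. The following is a rough Python sketch, assuming the only goal is to drop every '\r' that is not followed by '\n'; the helper names and the output file name in the usage note are made up for this example. It would join lines if a file used a lone '\r' as a real line separator, so it only suits cases like this one, where the stray '\r' sits on a line of its own.

```python
import re

# A "\r" that is NOT immediately followed by "\n" (negative lookahead).
LONE_CR = re.compile(rb"\r(?!\n)")


def count_lone_cr(data):
    # Number of stray carriage returns in the raw bytes.
    return len(LONE_CR.findall(data))


def strip_lone_cr(data):
    # Remove stray carriage returns; "\r\n" and "\n" are left untouched.
    return LONE_CR.sub(b"", data)


def fix_file(src_path, dst_path):
    # Read and write in binary mode so Python does not translate newlines.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_lone_cr(data))
```

Usage would be something like fix_file("oe1.train.arff", "oe1.train.fixed.arff"), where the second file name is only a placeholder.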
Original comment by timo.erk...@gmail.com
on 24 Mar 2012 at 10:09
As long as the newline character sequence is either \r\n or \n, no problems
will occur. In the future a more extensive set of newline sequences may be
supported.
Original comment by timo.erk...@gmail.com
on 25 May 2012 at 4:32
Original issue reported on code.google.com by
berni.le...@gmail.com
on 22 Mar 2012 at 6:52