terkkila / rf-ace

Automatically exported from code.google.com/p/rf-ace

some .arff files can't be parsed #22

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
rf-ace-build-predictor-win64.exe -I oe1.train.arff -i class -O all.test.model -R

What is the expected output? What do you see instead?
Reading file 'oe1.train.arff', please wait... datadefs::str2num: ERROR: paramete
1513' could not be read properly. Quitting...
Assertion failed: false, file src\datadefs.cpp, line 168

What version of the product are you using? On what operating system?
WIN7, v1.0.3_win7_x64

Please provide any additional information below.
I attached a very similar file (oe1.test.arff) which works fine. I already 
reduced both files to make it easier to track down the problem. The only 
difference I'm aware of is that the file that fails to load was generated 
through append operations in WEKA.
Greetings,
Berni

Original issue reported on code.google.com by berni.le...@gmail.com on 22 Mar 2012 at 6:52

GoogleCodeExporter commented 9 years ago
It seems "oe1.train.arff" contains an old Mac-style end-of-line marker, 
which is '\r', a.k.a. the carriage return, without a trailing newline character, 
'\n'. My program always assumes that there is a '\n' at the end of each line, 
with or without a preceding '\r', which naturally results in incorrect parsing of 
the file when '\r' appears alone.

The funny thing is, this '\r' character without a companion '\n' appears ONLY 
once in the file, which to me makes it look like a very non-standard way to format a 
file in the first place. This is how I believe the machine sees the problematic 
part:

...
@data\r\n
\r
1513,...\r\n
...

So '\r\n' at the end is treated as end-of-line, just as '\n' alone 
would be, but a lone '\r' is not. Extending the parser to account for 
the missing case is going to take some more time, unfortunately, but I will get 
it fixed sooner or later.
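
For illustration only, here is a minimal C++ sketch of a line reader that accepts all three terminators ('\n', '\r\n', and a lone '\r'). It is not the rf-ace parser; the names readLineAnyEol and splitLines are hypothetical and were chosen just for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of rf-ace): reads one line from `in`,
// treating "\n", "\r\n", or a lone "\r" as the line terminator.
// Returns false once no more data is available.
bool readLineAnyEol(std::istream& in, std::string& line) {
  line.clear();
  char c;
  bool readAny = false;
  while (in.get(c)) {
    readAny = true;
    if (c == '\n') {
      return true;  // Unix-style terminator
    }
    if (c == '\r') {
      // Windows-style "\r\n": consume the '\n' as part of the same terminator.
      if (in.peek() == '\n') {
        in.get(c);
      }
      return true;  // otherwise a lone '\r' (old Mac style)
    }
    line.push_back(c);
  }
  // Stream ended: report a final unterminated line, if any characters were read.
  return readAny;
}

// Hypothetical convenience wrapper: splits a whole buffer into lines.
std::vector<std::string> splitLines(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> lines;
  std::string line;
  while (readLineAnyEol(in, line)) {
    lines.push_back(line);
  }
  return lines;
}
```

With the fragment shown above, splitLines("@data\r\n\r1513,2\r\n") would yield "@data", an empty line, and "1513,2", so the stray '\r' becomes an empty line instead of being glued to the front of the first data row.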

If you want an easy fix to this, you can manually tweak the file and remove the 
problematic bit; I did it and it works just fine after that. Cheers!

Original comment by timo.erk...@gmail.com on 24 Mar 2012 at 10:09

GoogleCodeExporter commented 9 years ago
As long as the newline character sequence is either \r\n or \n, no problems 
will occur. In the future, a more extensive set of newline sequences may be 
supported.
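
To check whether a file falls outside the supported cases before loading it, one option is to count the terminator styles it contains. The following is a rough C++ sketch under that assumption; EolCounts and countEol are hypothetical names, not part of rf-ace.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tally of the line-terminator styles found in a buffer.
struct EolCounts {
  std::size_t lf = 0;    // "\n" not preceded by '\r'
  std::size_t crlf = 0;  // "\r\n"
  std::size_t cr = 0;    // '\r' not followed by '\n' (the unsupported case)
};

// Hypothetical helper: scans `text` once and counts each terminator style.
EolCounts countEol(const std::string& text) {
  EolCounts counts;
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '\r') {
      if (i + 1 < text.size() && text[i + 1] == '\n') {
        ++counts.crlf;
        ++i;  // skip the '\n' that belongs to this "\r\n"
      } else {
        ++counts.cr;
      }
    } else if (text[i] == '\n') {
      ++counts.lf;
    }
  }
  return counts;
}
```

A nonzero cr count means the file contains at least one lone '\r' and would need to be fixed by hand before it can be read.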

Original comment by timo.erk...@gmail.com on 25 May 2012 at 4:32