mlr-org / farff

a faster arff parser

Some OpenML datasets can't be parsed #23

Closed by larskotthoff 8 years ago

larskotthoff commented 8 years ago

I get

Joaquin says:

Alright, after some experiments I found that the problem goes away if I remove the features that have more than 15000 (nominal) values.

Maybe farff raises an internal error when it encounters such features and skips them; the feature count then wouldn't match, which would explain the error we see.

It happens for OpenML datasets 1111, 1112 and 1114.
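As a hypothetical illustration of how very long lines can make a C reader lose count (this is not farff's code, and the function names are made up): `fgets` with a fixed-size buffer of `bufsize` bytes returns at most `bufsize - 1` characters per call. One long line therefore comes back in several pieces, and a loop that assumes one call per line sees more lines than the file contains.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only: counts how many times
 * fgets() succeeds with a buffer of `bufsize` bytes, i.e. how many
 * "lines" a naive reader that calls fgets once per line would see.
 * bufsize must be at least 2. Returns 0 if allocation fails. */
size_t count_fgets_reads(FILE *fp, int bufsize) {
    char *buf = malloc((size_t)bufsize);
    size_t reads = 0;
    if (buf == NULL)
        return 0;
    /* each call stops after bufsize - 1 chars or after a '\n' */
    while (fgets(buf, bufsize, fp) != NULL)
        reads++;
    free(buf);
    return reads;
}

/* Counts the real number of lines by counting '\n' characters. */
size_t count_newlines(FILE *fp) {
    size_t lines = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;
    return lines;
}
```

With `bufsize = 16`, a file holding `short\n` followed by a 40-character line is consumed in four `fgets` calls (one for the short line, then 15 + 15 + 11 characters for the long one), although it has only two lines.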

jakobbossek commented 8 years ago

We do preprocessing in C: there we skip the header lines to move on to the @data section. The line buffer reserved for the skipped lines is too small for the very long lines in the reported ARFF files. Going to fix this.
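A minimal sketch of a header skip that cannot run out of buffer space, assuming a POSIX system (`getline`, `strncasecmp`). `getline` reallocates its buffer to fit each line, so every header line, including an `@attribute` line with tens of thousands of nominal values, is consumed in one piece. The function name `skip_arff_header` is hypothetical; this is not the actual farff fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */
#include <sys/types.h> /* ssize_t */

/* Hypothetical sketch: advance `fp` past the ARFF header so the next
 * read starts at the first line after @data. Returns the number of
 * lines consumed (including the @data line), or -1 if no @data line
 * was found before EOF. */
long skip_arff_header(FILE *fp) {
    char *line = NULL; /* getline() allocates and grows this buffer */
    size_t cap = 0;
    long skipped = 0;

    while (getline(&line, &cap, fp) != -1) {
        skipped++;
        /* ignore leading blanks before looking for the keyword */
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            p++;
        /* ARFF keywords are case-insensitive; require the keyword to
         * end here so that e.g. "@database" does not match */
        if (strncasecmp(p, "@data", 5) == 0 &&
            (p[5] == '\0' || isspace((unsigned char)p[5]))) {
            free(line);
            return skipped;
        }
    }
    free(line);
    return -1;
}
```

The design choice is to let the C library size the buffer instead of guessing a maximum line length. Keeping `fgets` and looping until a chunk ends in `\n` would also work, but every caller then has to handle partial lines correctly.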

jakobbossek commented 8 years ago

Fixed.