larskotthoff closed this issue 8 years ago
We do preprocessing in C. There we skip the header lines to move on to the @data
section.
The line buffer reserved for the skipped lines is too small for the very long lines in the reported ARFF files. Going to fix this.
Fixed.
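For readers following along, here is a minimal sketch of how header lines of arbitrary length could be skipped without a fixed-size line buffer. It is not farff's actual code: the function names `read_long_line`, `starts_with_ci`, and `skip_to_data` are hypothetical, and the sketch assumes the marker line begins with `@data` in any letter case.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one complete line of any length into a heap buffer that grows on demand.
 * *buf and *cap are reused between calls (start with NULL and 0).
 * Returns the line length including the newline, or -1 at EOF or on error. */
static long read_long_line(FILE *fp, char **buf, size_t *cap) {
    assert(fp != NULL && buf != NULL && cap != NULL);
    size_t len = 0;
    if (*buf == NULL) {
        *cap = 128;
        *buf = malloc(*cap);
        if (*buf == NULL) return -1;
    }
    while (fgets(*buf + len, (int)(*cap - len), fp) != NULL) {
        len += strlen(*buf + len);
        if (len > 0 && (*buf)[len - 1] == '\n') return (long)len; /* full line read */
        if (len + 1 < *cap) return (long)len; /* last line without trailing newline */
        /* Buffer is full but the line continues: double the capacity and keep reading. */
        char *tmp = realloc(*buf, *cap * 2);
        if (tmp == NULL) return -1;
        *buf = tmp;
        *cap *= 2;
    }
    return len > 0 ? (long)len : -1;
}

/* Case-insensitive prefix test. */
static int starts_with_ci(const char *s, const char *prefix) {
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) return 0;
    }
    return 1;
}

/* Consume header lines up to and including the "@data" marker line.
 * Returns the number of lines consumed, or -1 if no marker was found. */
static long skip_to_data(FILE *fp) {
    char *buf = NULL;
    size_t cap = 0;
    long lines = 0, found = -1;
    while (read_long_line(fp, &buf, &cap) >= 0) {
        lines++;
        const char *p = buf;
        while (isspace((unsigned char)*p)) p++; /* tolerate leading whitespace */
        if (starts_with_ci(p, "@data")) {
            found = lines;
            break;
        }
    }
    free(buf);
    return found;
}
```

After `skip_to_data(fp)` returns a positive count, the stream is positioned at the first data row. Doubling the buffer keeps the number of reallocations logarithmic in the line length, which matters for attribute declarations that list many thousands of nominal values on one line.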
I get

  2  -- 1 columns 231 columns
  3  -- 1 columns 231 columns
  4  -- 1 columns 231 columns
  5  -- 1 columns 231 columns
... ... ......... ...........
See problems(...) for more details.

Error in `colnames<-`(`*tmp*`, value = header$col.names) :
  'names' attribute [231] must be the same length as the vector [1]
Calls: readARFF -> colnames<-
In addition: Warning message:
Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.
Execution halted

Joaquin says:
Alright, after some experiments I found that the problem goes away if I remove the features that have more than 15000 (nominal) values.
Maybe farff raises an internal error when it encounters such cases and skips them, and hence the feature count won't match, which would explain the error we see.
It happens for 1111, 1112, and 1114.