Open berndbischl opened 9 years ago
NB: I tried VERY hard to write a faster version in R that avoids consuming the levels regexp by regexp. I failed because of quoting hell.
In a valid ARFF file the factor levels are separated by commas and wrapped in curly braces, e.g., {level1, level2, ..., leveln}. Instead of parsing this by hand, regexp by regexp, we could simply split the string by comma and process all matches afterwards, right? I do not see any disadvantage of this approach.
This consumes the levels regexp by regexp in R. It is extremely slow, but only for header parsing. Best to do this in C.
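
For reference, a rough sketch of what a C version could look like. It is a single pass over the level spec that tracks whether it is inside a quoted level, which is exactly the case where a plain split by comma goes wrong (a level like 'mid, high' contains a comma). The function name `split_levels` and its signature are made up for illustration and are not taken from the package. Treating a backslash inside quotes as an escape for the next character is also an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of any existing package.
 * Splits a spec such as {a, 'b,c', "d e"} into separate level strings.
 * Returns the number of levels stored in out, or -1 on malformed input,
 * on allocation failure, or if there are more than max_out levels.
 * On success the caller must free each out[i]. */
int split_levels(const char *spec, char **out, int max_out)
{
    assert(spec != NULL && out != NULL);
    const char *p = spec;
    int n = 0;

    while (isspace((unsigned char)*p)) p++;
    if (*p != '{') return -1;
    p++;

    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (*p == '}' && n == 0) return 0;      /* empty level set */

        /* A level can never be longer than the rest of the input. */
        char *buf = malloc(strlen(p) + 1);
        if (buf == NULL) goto fail;
        size_t len = 0;

        if (*p == '\'' || *p == '"') {
            /* Quoted level: commas and braces inside are literal. */
            char quote = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0') p++;  /* assumed escape rule */
                buf[len++] = *p++;
            }
            if (*p != quote) { free(buf); goto fail; }  /* unterminated */
            p++;
        } else {
            /* Unquoted level: runs up to the next comma or closing brace. */
            while (*p != '\0' && *p != ',' && *p != '}') buf[len++] = *p++;
            while (len > 0 && isspace((unsigned char)buf[len - 1])) len--;
        }
        buf[len] = '\0';

        if (n >= max_out) { free(buf); goto fail; }
        out[n++] = buf;

        while (isspace((unsigned char)*p)) p++;
        if (*p == ',') { p++; continue; }
        if (*p == '}') return n;
        goto fail;                               /* unexpected character */
    }

fail:
    while (n > 0) free(out[--n]);
    return -1;
}
```

Called on `{low, 'mid, high', "a b"}` this should give three levels, with the comma inside the quotes kept as part of the second one. A naive split by comma would produce four pieces there, which is the quoting problem mentioned above.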