mlr-org / farff

A faster ARFF parser

parseFactorLevels is slow #6

Open berndbischl opened 9 years ago

berndbischl commented 9 years ago

This consumes the level list regexp by regexp in R. That is extremely slow, although it only affects header parsing. It would be best to do this in C.
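For comparison, a single left-to-right scan can tokenize the whole level list while visiting each character once, which is the kind of loop a C port would run. The sketch below is written in Python for readability. `parse_nominal_levels` is a hypothetical name, not farff's API, and the quoting rules it applies (levels in single or double quotes, a backslash inside quotes escaping the next character, unquoted levels trimmed) are assumptions about ARFF, not a reading of farff's code.

```python
def parse_nominal_levels(spec: str) -> list[str]:
    """Split an ARFF nominal spec such as "{a, 'b c'}" into its levels.

    Hypothetical helper, not farff's API. Assumes levels may be wrapped in
    single or double quotes, that inside quotes a backslash escapes the next
    character, and that unquoted levels are trimmed of whitespace.
    """
    s = spec.strip()
    if len(s) < 2 or s[0] != "{" or s[-1] != "}":
        raise ValueError("expected a level list enclosed in curly braces")
    s = s[1:-1]
    n = len(s)
    levels: list[str] = []
    if not s.strip():
        return levels  # "{}" declares no levels
    i = 0
    while True:
        while i < n and s[i].isspace():  # skip whitespace before a level
            i += 1
        if i < n and s[i] in "'\"":
            # Quoted level: read up to the matching quote; commas are literal here.
            quote = s[i]
            i += 1
            buf = []
            while i < n and s[i] != quote:
                if s[i] == "\\" and i + 1 < n:
                    i += 1  # keep the escaped character, drop the backslash
                buf.append(s[i])
                i += 1
            if i == n:
                raise ValueError("unterminated quote")
            i += 1  # step past the closing quote
            level = "".join(buf)
            while i < n and s[i].isspace():  # skip whitespace before the separator
                i += 1
        else:
            # Unquoted level: read up to the next comma, then trim.
            start = i
            while i < n and s[i] != ",":
                i += 1
            level = s[start:i].strip()
            if not level:
                raise ValueError("empty level")
        levels.append(level)
        if i == n:
            return levels
        if s[i] != ",":
            raise ValueError(f"expected ',' at offset {i} inside the braces")
        i += 1  # step past the comma


print(parse_nominal_levels("{a, 'b c', \"d,e\", 'it\\'s'}"))
# ['a', 'b c', 'd,e', "it's"]
```

Because a comma only acts as a separator outside quotes, `"d,e"` stays one level, and no regexp is re-applied to the remaining string after each token.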

berndbischl commented 9 years ago

NB: I tried VERY hard to write a faster version in R that avoids consuming the levels regexp by regexp. I failed because of quoting hell.

jakobbossek commented 8 years ago

In a valid ARFF file, the factor levels are comma-separated and enclosed in curly braces, e.g., {level1, level2, ..., leveln}. Instead of parsing this by hand, regexp by regexp, we could simply split the string on commas and post-process the resulting pieces, right? I do not see any disadvantage to this approach.
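A minimal sketch of the split-by-comma idea, written in Python for illustration (`split_levels_naive` is a hypothetical helper, not part of farff). It handles plain and quoted levels, but assuming a quoted level may itself contain a comma, which is presumably part of the "quoting hell" mentioned above, that level gets cut into pieces:

```python
def split_levels_naive(spec: str) -> list[str]:
    # Hypothetical helper: drop the braces, split on every comma, then trim
    # whitespace and surrounding quote characters from each piece.
    inner = spec.strip()[1:-1]
    return [piece.strip().strip("'\"") for piece in inner.split(",")]


print(split_levels_naive("{a, 'b c'}"))  # ['a', 'b c']
print(split_levels_naive("{a, 'b,c'}"))  # ['a', 'b', 'c']: two levels became three
```

Handling the second case correctly requires knowing whether each comma sits inside quotes, so a plain `split` has to be replaced by a quote-aware scan.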