Closed FlorianPargent closed 6 years ago
Just to be clear, it also fails when I change the uploaded dataset by uncommenting the originally intended line: dat$TARGET_B = factor(dat$TARGET_B)
Does this also happen if you use the ARFF reader from RWeka instead of farff? And could you please try another ARFF reader, e.g. the read.arff function from the foreign package?
The data set is so big that it takes too long on my laptop to do a "quick check" here.
And what happens if you upload just a smaller subset of the data? Could you try this out (without deleting the data sets you upload, so that I can check this more quickly)?
41290 is the big version while 41291 is a small version with only 10000 rows. Both fail with readARFF. Will try the other functions next.
Ok sorry, it seems to be a problem with farff, as read.arff in the foreign package also works for the big one.
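For anyone who wants to repeat this kind of reader comparison without downloading the large data set, here is a minimal sketch. It assumes a made-up three-row ARFF file (not the KDD98 data) and simply tries each reader that happens to be installed; the `readers` list and the toy file contents are illustrative choices, not part of the OpenML package.

```r
# Write a tiny ARFF file with a numeric feature and a nominal target.
# The contents are a toy stand-in, not the actual KDD98 data.
arff_lines <- c(
  "@relation toy",
  "@attribute x numeric",
  "@attribute TARGET_B {0,1}",
  "@data",
  "1.5,0",
  "2.5,1",
  "3.5,0"
)
path <- tempfile(fileext = ".arff")
writeLines(arff_lines, path)

# One wrapper per ARFF reader mentioned in this thread.
readers <- list(
  foreign = function(p) foreign::read.arff(p),
  farff   = function(p) farff::readARFF(p),
  RWeka   = function(p) RWeka::read.arff(p)
)

# Try each reader; record the data frame, the error message, or a note
# that the package is not installed.
results <- lapply(names(readers), function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return("not installed")
  }
  tryCatch(readers[[pkg]](path), error = function(e) conditionMessage(e))
})
names(results) <- names(readers)

# Summarise: column classes on success, the message otherwise.
for (pkg in names(results)) {
  res <- results[[pkg]]
  if (is.data.frame(res)) {
    cat(pkg, ":", paste(sapply(res, class), collapse = ", "), "\n")
  } else {
    cat(pkg, ":", res, "\n")
  }
}
```

A reader that fails on the small file but not in the others points to the parser rather than to the data set size.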
Will close this here since using RWeka instead of farff with the OpenML package seems to work. Just reopen if you encounter any other problem or think that I should include the read.arff function from the foreign package as a third arff reader option in the OpenML package. But for now it seems to work if you do this:
setOMLConfig(arff.reader = "RWeka") # you can also set RWeka as default in your config file
d = getOMLDataSet(41291)
Still, it would be great to pin down what exactly causes farff to fail, since farff was designed to behave exactly like RWeka. Could you please open an issue in the farff tracker for this? If I have time and understand the issue, I could maybe make a fix in farff.
Otherwise you have to force Bernd to look at this (or try to fix it yourself and make a PR on farff, which is probably easier than forcing Bernd ;) ).
Just for completeness: with the latest change in farff, this example works now.
I encountered this problem when trying to upload another version of the KDD98 dataset (id = 23513), in which the binary target is correctly coded as factor instead of numeric. Interestingly, downloading the dataset and uploading it again without changes works, but downloading the new dataset does not work as the parsing fails.
I get the following error:
To me, the arff files of both datasets look very similar but I am not an expert on that.
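One way to check how similar two ARFF headers really are is to compare their @attribute lines directly. The following is a minimal sketch, assuming the foreign package is installed and using a made-up three-row data frame in place of the real data; the helper `attribute_lines` is a hypothetical name introduced only for this illustration.

```r
# Toy data frame with a numeric target, plus a copy in which the target
# is recoded as a factor (the same change as dat$TARGET_B = factor(...)).
dat_num <- data.frame(x = c(1.5, 2.5, 3.5), TARGET_B = c(0, 1, 0))
dat_fac <- dat_num
dat_fac$TARGET_B <- factor(dat_fac$TARGET_B)

# Write both versions to temporary ARFF files.
path_num <- tempfile(fileext = ".arff")
path_fac <- tempfile(fileext = ".arff")
foreign::write.arff(dat_num, path_num, relation = "toy")
foreign::write.arff(dat_fac, path_fac, relation = "toy")

# Hypothetical helper: return only the @attribute declarations of a file.
attribute_lines <- function(path) {
  grep("^@attribute", readLines(path), value = TRUE, ignore.case = TRUE)
}

attr_num <- attribute_lines(path_num)
attr_fac <- attribute_lines(path_fac)

# Show the declarations that differ between the two headers.
differs <- attr_num != attr_fac
print(data.frame(numeric_version = attr_num[differs],
                 factor_version  = attr_fac[differs]))
```

In this toy case only the target's declaration changes, from a numeric type to a nominal type with its levels listed in braces, so a parsing difference between the two uploads would most plausibly involve that line or the corresponding data column.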
Also, I downloaded the arff and csv files from the OpenML homepage and tried to read them manually. It works for the csv but not for the arff.
My first intuition was that this might be related to my other issue here https://github.com/mlr-org/farff/issues/37, but I do not really see how...