openml / OpenML

Open Machine Learning
https://openml.org
BSD 3-Clause "New" or "Revised" License

dataset 70,71,73 is invalid ARFF? #216

Closed giuseppec closed 8 years ago

giuseppec commented 9 years ago

The ARFF files with ids 70, 71, 73 (maybe there are some more) seem wrong. Here is the direct link to one of these datasets: http://www.openml.org/data/download/1716/BayesianNetworkGenerator_anneal_small.arff . One line in the header causes an error with all ARFF readers that are available in R. The suspicious line is:

@attribute carbon {'\'B1of3\'','\'B2of3\'','\'B3of3\''}

Is this valid ARFF?

giuseppec commented 9 years ago

PING.

Related issue: e.g. http://www.openml.org/api_splits/get/7529/Task_7529_splits.arff is not valid ARFF (see also https://github.com/openml/website/issues/25).

jakobbossek commented 8 years ago

According to the developer description of the file format, the example seems to be a valid list of nominal values in ARFF. Only the farff package fails here; RWeka::read.arff is able to read this ARFF file.
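To illustrate why the line can be read as valid, here is a minimal Python sketch of how a reader might split such a nominal specification. It is not the code used by farff or RWeka; `parse_nominal_values` is a hypothetical helper, and it assumes that inside a quoted value a backslash simply escapes the next character (other escapes such as `\n` or `\t` are not handled).

```python
def parse_nominal_values(spec: str) -> list[str]:
    """Split an ARFF nominal specification such as {a,'b c'} into its values.

    Assumption: inside quotes, a backslash makes the next character literal.
    """
    body = spec.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("nominal specification must be enclosed in braces")
    body = body[1:-1]

    values = []
    current = []   # characters of the value being read
    quote = None   # active quote character while inside a quoted value
    i = 0
    while i < len(body):
        ch = body[i]
        if quote is not None:
            if ch == "\\" and i + 1 < len(body):
                # Escaped character: keep it literally, skip the backslash.
                current.append(body[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None  # closing quote ends the quoted part
            else:
                current.append(ch)
        elif ch in ("'", '"'):
            quote = ch        # opening quote
        elif ch == ",":
            values.append("".join(current))
            current = []
        elif not ch.isspace():
            current.append(ch)  # unquoted values cannot contain whitespace
        i += 1

    if quote is not None:
        raise ValueError("unterminated quoted value")
    values.append("".join(current))
    return values


line = r"@attribute carbon {'\'B1of3\'','\'B2of3\'','\'B3of3\''}"
print(parse_nominal_values(line[line.index("{"):]))
# prints ["'B1of3'", "'B2of3'", "'B3of3'"]
```

Under these assumptions each level is a quoted string whose content itself starts and ends with an escaped single quote, so the attribute has three levels: 'B1of3', 'B2of3' and 'B3of3', quotes included.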

giuseppec commented 8 years ago

farff now seems to be able to read those datasets.

joaquinvanschoren commented 8 years ago

Is it reading them or just skipping them?

giuseppec commented 8 years ago

It is reading them. For both farff and RWeka, the resulting R data.frame is equivalent (except for did = 73, where I get a Java OutOfMemoryError when using RWeka, but this is not our problem and farff is still able to read it).

joaquinvanschoren commented 8 years ago

Awesome :)
