openml / openml-r

R package to interface with OpenML
http://openml.github.io/openml-r/

removing row identifiers seems broken. #436

Closed pfistfl closed 5 years ago

pfistfl commented 5 years ago

Example: Task 3954 (MagicTelescope). The .xml of the accompanying dataset has:

<oml:row_id_attribute>ID</oml:row_id_attribute>

But converting the task to mlr does not drop the ID column:

library(OpenML)  # getOMLTask(), convertOMLTaskToMlr()
library(mlr)     # getTaskData()

tsk2 = getOMLTask(3954)
colnames(getTaskData(convertOMLTaskToMlr(tsk2)$mlr.task))
[1] "ID"        "fLength."  "fWidth."   "fSize."    "fConc."    "fConc1."   "fAsym."    "fM3Long."  "fM3Trans." "fAlpha."   "fDist."    "class."

pfistfl commented 5 years ago

The reason seems to be that <oml:row_id_attribute>ID</oml:row_id_attribute> is parsed and unioned (in getOMLDataSet.R, line 97) with /oml:data_set_description/oml:ignore_attribute, which is missing from the description.

This then breaks the check in convertOMLDataSetToMlr (line 62):

if (!is.na(desc$ignore.attribute) && ignore.flagged.attributes)

I'm not sure what the intended behaviour is, so I don't know how to fix this without breaking other stuff downstream.
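
To illustrate the suspected failure mode, here is a minimal base-R sketch. It does not use the package's code: the variable names, the element order of the union, and the toy data frame are all assumptions made for illustration. The idea is that unioning a row id name with a missing (NA) ignore attribute yields a vector of length 2 that still contains NA, so a scalar test like `!is.na(x) && flag` no longer behaves as intended (before R 4.3.0 `&&` only looked at the first element; from R 4.3.0 on, a length-2 operand is an error). Dropping the NAs first and testing the length avoids that.

```r
# Hypothetical stand-ins for the parsed description fields
# (assumed names and values, not taken from the package source).
row.id.attribute <- "ID"
ignore.attribute <- NA_character_   # element missing from the XML

# The union keeps the NA, so the result has two elements.
ignore <- union(ignore.attribute, row.id.attribute)
print(ignore)
print(is.na(ignore))   # one TRUE and one FALSE, so no single scalar answer

# Length-safe alternative: remove NAs, then check whether any names remain.
ignore <- ignore[!is.na(ignore)]
ignore.flagged.attributes <- TRUE

# Toy data frame standing in for the dataset (made-up values).
data <- data.frame(
  ID = 1:3,
  fLength = c(28.8, 31.6, 162.1),
  class = c("g", "g", "h")
)

if (length(ignore) > 0 && ignore.flagged.attributes) {
  data <- data[, setdiff(colnames(data), ignore), drop = FALSE]
}
print(colnames(data))
```

Whether row id attributes should be dropped under the same flag as ignored attributes is a design question this sketch does not settle; it only shows a condition that stays well-defined when one of the two fields is missing.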

giuseppec commented 5 years ago

Could you check if this works? Please reopen if not.