openml / OpenML

Data Edit API does not allow multiple ignored attributes #1117

Open PGijsbers opened 3 years ago

When editing the ignore_attribute field through the data edit API with multiple attributes to ignore, only the last one is saved:

import openml

new_id = openml.datasets.fork_dataset(41702)  # new ID was: 43070
openml.datasets.edit_dataset(new_id, ignore_attribute=["instance_id", "repetition", "runstatus"])

I verified that the Python API sends the following XML (indented for readability):

<?xml version="1.0" encoding="utf-8"?>
<oml:data_edit_parameters xmlns:oml="http://openml.org/openml">
  <oml:ignore_attribute>instance_id</oml:ignore_attribute>
  <oml:ignore_attribute>repetition</oml:ignore_attribute>
  <oml:ignore_attribute>runstatus</oml:ignore_attribute>
</oml:data_edit_parameters>

However, the dataset XML retrieved after the edit contains only the last attribute:

<oml:data_set_description>
  <oml:id>43070</oml:id>
  <oml:name>MIP-2016-regression</oml:name>
  <oml:version>2</oml:version>
  <oml:description/>
  <oml:description_version/>
  <oml:format>ARFF</oml:format>
  <oml:upload_date>2019-05-28T13:09:37</oml:upload_date>
  <oml:licence>Public</oml:licence>
  <oml:url>
    https://www.openml.org/data/v1/download/21377444/MIP-2016-regression.arff
  </oml:url>
  <oml:file_id>21377444</oml:file_id>
  <oml:default_target_attribute>PAR10</oml:default_target_attribute>
  <oml:ignore_attribute>runstatus</oml:ignore_attribute>
  <oml:citation>[1] NA</oml:citation>
  <oml:visibility>public</oml:visibility>
  <oml:minio_url>
    http://openml1.win.tue.nl/dataset43070/dataset_43070.pq
  </oml:minio_url>
  <oml:status>active</oml:status>
  <oml:processing_date>2021-07-28 16:48:03</oml:processing_date>
  <oml:md5_checksum>91a74d1d325765e61530ebf8c2ba3263</oml:md5_checksum>
</oml:data_set_description>

Changing the edit call again (e.g. to ["instance_id", "repetition"]) does change the dataset XML, so it's not a matter of waiting for updates. The server consistently saves only the last attribute in the list.
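
For anyone who wants to check the mismatch without hitting the server, here is a minimal offline sketch that compares the ignore_attribute elements in the request against those in the stored description. It uses only the standard library. The helper name ignored_attributes is hypothetical, the stored XML is trimmed to the relevant elements, and I am assuming the server response declares the same oml namespace as the request (the declaration is not shown in the paste above).

```python
import xml.etree.ElementTree as ET

# Namespace taken from the request XML; assumed to be the same in the response.
OML_NS = {"oml": "http://openml.org/openml"}


def ignored_attributes(xml_text):
    """Return the text of every direct-child <oml:ignore_attribute>, in document order."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.findall("oml:ignore_attribute", OML_NS)
        if el.text
    ]


# The XML sent by the Python API (copied from the report).
sent_xml = """<oml:data_edit_parameters xmlns:oml="http://openml.org/openml">
  <oml:ignore_attribute>instance_id</oml:ignore_attribute>
  <oml:ignore_attribute>repetition</oml:ignore_attribute>
  <oml:ignore_attribute>runstatus</oml:ignore_attribute>
</oml:data_edit_parameters>"""

# The stored description, trimmed to the relevant elements.
# The xmlns declaration is an assumption added so the snippet parses.
stored_xml = """<oml:data_set_description xmlns:oml="http://openml.org/openml">
  <oml:id>43070</oml:id>
  <oml:default_target_attribute>PAR10</oml:default_target_attribute>
  <oml:ignore_attribute>runstatus</oml:ignore_attribute>
</oml:data_set_description>"""

sent = ignored_attributes(sent_xml)
stored = ignored_attributes(stored_xml)

# Attributes that were requested but did not survive the edit.
missing = [name for name in sent if name not in stored]
print("sent:   ", sent)
print("stored: ", stored)
print("missing:", missing)
```

With the two documents from this report, missing comes out as instance_id and repetition, which matches the "only the last one is kept" behaviour described above.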