PGijsbers opened this issue 3 years ago
When editing the `ignore_attribute` field with the data edit API and passing multiple attributes to ignore, only the last one is stored:
```python
import openml

new_id = openml.datasets.fork_dataset(41702)  # new ID was: 43070
openml.datasets.edit_dataset(new_id, ignore_attribute=["instance_id", "repetition", "runstatus"])
```
I verified that the Python API sends the following XML (indented for readability):
```xml
<?xml version="1.0" encoding="utf-8"?>
<oml:data_edit_parameters xmlns:oml="http://openml.org/openml">
  <oml:ignore_attribute>instance_id</oml:ignore_attribute>
  <oml:ignore_attribute>repetition</oml:ignore_attribute>
  <oml:ignore_attribute>runstatus</oml:ignore_attribute>
</oml:data_edit_parameters>
```
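As an illustrative sketch (not OpenML's server code), the payload can be parsed with Python's standard `xml.etree.ElementTree`. A namespace-aware `findall` recovers all three values. A hypothetical parser that keeps one value per tag name (`last_value_wins` is an illustrative name) keeps only `runstatus`, which matches the symptom. Whether the server actually handles the request this way is an assumption.

```python
import xml.etree.ElementTree as ET

OML_NS = {"oml": "http://openml.org/openml"}

# Request body sent by the Python client, copied from the issue.
PAYLOAD = """<?xml version="1.0" encoding="utf-8"?>
<oml:data_edit_parameters xmlns:oml="http://openml.org/openml">
  <oml:ignore_attribute>instance_id</oml:ignore_attribute>
  <oml:ignore_attribute>repetition</oml:ignore_attribute>
  <oml:ignore_attribute>runstatus</oml:ignore_attribute>
</oml:data_edit_parameters>"""


def all_values(xml_text, tag):
    """Collect every occurrence of an oml-namespaced tag, in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [el.text for el in root.findall(f"oml:{tag}", OML_NS)]


def last_value_wins(xml_text):
    """HYPOTHETICAL buggy parse: one dict slot per tag, so repeated tags overwrite."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    fields = {}
    for el in root:
        local_name = el.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        fields[local_name] = el.text  # a later duplicate replaces the earlier value
    return fields


print(all_values(PAYLOAD, "ignore_attribute"))
print(last_value_wins(PAYLOAD))
```

The first call lists all three attributes. The second returns a single `ignore_attribute` entry holding `runstatus`.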
However, the dataset XML obtained after the edit lists only the last attribute:
```xml
<oml:data_set_description>
  <oml:id>43070</oml:id>
  <oml:name>MIP-2016-regression</oml:name>
  <oml:version>2</oml:version>
  <oml:description/>
  <oml:description_version/>
  <oml:format>ARFF</oml:format>
  <oml:upload_date>2019-05-28T13:09:37</oml:upload_date>
  <oml:licence>Public</oml:licence>
  <oml:url>https://www.openml.org/data/v1/download/21377444/MIP-2016-regression.arff</oml:url>
  <oml:file_id>21377444</oml:file_id>
  <oml:default_target_attribute>PAR10</oml:default_target_attribute>
  <oml:ignore_attribute>runstatus</oml:ignore_attribute>
  <oml:citation>[1] NA</oml:citation>
  <oml:visibility>public</oml:visibility>
  <oml:minio_url>http://openml1.win.tue.nl/dataset43070/dataset_43070.pq</oml:minio_url>
  <oml:status>active</oml:status>
  <oml:processing_date>2021-07-28 16:48:03</oml:processing_date>
  <oml:md5_checksum>91a74d1d325765e61530ebf8c2ba3263</oml:md5_checksum>
</oml:data_set_description>
```
Changing the edit call again (e.g. to `["instance_id", "repetition"]`) does change the dataset XML, so this is not a matter of waiting for the update to propagate. The server consistently saves only the last value in the list.
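Here is a minimal sketch of a regression check for this bug. It assumes the dataset description XML has already been retrieved as a string. `missing_ignored` is a hypothetical helper, not part of openml-python. It returns the requested attributes that the stored description does not list. `AFTER_EDIT` is trimmed from the description above. It adds an `xmlns:oml` declaration, which the copy in this issue omits, so that the prefixed tags parse.

```python
import xml.etree.ElementTree as ET

OML_NS = {"oml": "http://openml.org/openml"}

# Trimmed copy of the description returned after the edit (xmlns added so it parses).
AFTER_EDIT = """<oml:data_set_description xmlns:oml="http://openml.org/openml">
  <oml:id>43070</oml:id>
  <oml:default_target_attribute>PAR10</oml:default_target_attribute>
  <oml:ignore_attribute>runstatus</oml:ignore_attribute>
</oml:data_set_description>"""


def missing_ignored(requested, description_xml):
    """HYPOTHETICAL helper: requested ignore attributes absent from the description."""
    root = ET.fromstring(description_xml.encode("utf-8"))
    stored = [
        el.text.strip()
        for el in root.findall("oml:ignore_attribute", OML_NS)
        if el.text
    ]
    return [name for name in requested if name not in stored]


print(missing_ignored(["instance_id", "repetition", "runstatus"], AFTER_EDIT))
```

An empty list would mean every requested attribute was saved. Against the description above it reports `instance_id` and `repetition` as missing.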