Open PGijsbers opened 2 years ago
A couple of remarks:
boolean was not available in ARFF, I think.
Also, it would be great if the types string and categorical were distinguishable from looking at the data features (if this information is somehow updated using the new parquet files).
Also, it seems that the parquet URLs from the test server are wrong. By wrong I mean that they point to the parquet URLs of the public server.
Edit: more info
Also it would be great if the types string and categorical were distinguishable from looking at the data features (if this information is somehow updated using the new parquet files)
Do you have an example? We want to see if this issue was with the ARFF file or specifically introduced in the conversion.
Also it seems that the parquet urls from the test server are wrong.
Parquet URLs from the test server have been disabled for now, until we have a separate MinIO instance (or bucket) for the test server.
https://github.com/openml/OpenML/issues/1165 this is kind of a weird bug
We'll look into that, and for the conversion scripts we'll have a closer look to preserve the feature data, or encode it into correct data types where ARFF was previously not expressive enough (e.g., boolean, 8-bit integers).
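To make the idea of encoding into narrower types concrete, here is a minimal sketch of how a converter might pick a storage type for a column that ARFF could only declare as NUMERIC. It is not the actual conversion script: the function name `infer_storage_type`, the type-name strings, and the rule that a column containing only 0 and 1 becomes a boolean are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def infer_storage_type(values: Iterable[Optional[float]]) -> str:
    """Suggest the narrowest type name for an ARFF NUMERIC column.

    Hypothetical helper, not part of any OpenML code base.
    None stands for a missing value ("?" in ARFF) and is ignored.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to infer from, so keep a general floating-point type.
        return "float64"
    if not all(float(v).is_integer() for v in present):
        # At least one fractional (or NaN/inf) value: must stay floating point.
        return "float64"
    distinct = {int(v) for v in present}
    if distinct <= {0, 1}:
        # Assumption: a 0/1-only column is treated as boolean.
        return "bool"
    if min(distinct) >= 0 and max(distinct) <= 255:
        # All values fit in an unsigned 8-bit integer.
        return "uint8"
    return "int64"
```

For example, under these assumed rules `infer_storage_type([0, 17, 255])` suggests `uint8`, while a column holding 256 or a negative value falls back to `int64`. Whether 0/1 columns should really become booleans is a policy decision for the migration, not something this sketch settles.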
This was a confusion from my side, sorry!
The following will be changed for the conversion script:
uint8.
Additionally, feature meta-data needs to be updated:
This is a centralised discussion about the server-side changes made (or being made) to the datasets in their conversion from ARFF to Parquet. Related ongoing discussions that reference the server state of different datasets:
Let's keep the relevant information about the migration as it relates to server data in this thread. This is not for connector-specific discussions (for example, how openml-python handles this). @joaquinvanschoren @prabhant @sebffischer