Open MichaelMMeskhi opened 3 years ago
Could you please give some details for which dataset(s) this happens?
Please see the attached list of dids that I found so far. They all seem to come from one source: FOREX trading data.
did 41764
name FOREX_gbpusd-day-High
version 1
uploader 1
status active
format arff
MajorityClassSize 937
MaxNominalAttDistinctValues 2
MinorityClassSize 897
NumberOfClasses 2
NumberOfFeatures 12
NumberOfInstances 1834
NumberOfInstancesWithMissingValues 0
NumberOfMissingValues 0
NumberOfNumericFeatures 11
Hey, the issue here is that this dataset contains fields of type 'date', which are not supported by the ARFF parser in Python. There's an open PR to support that (https://github.com/renatopp/liac-arff/pull/67), but it's gone stale. We'd be happy if you'd like to pick that up.
@mfeurer I will look into it and try to see what I can do about it. Thanks for the feedback!
Hi all, is there any progress on this issue?
Yes/No.
Yes: Since 0.12.0 the get_dataset call should no longer raise the error, because the data is not actually loaded anymore with that call. This means you get access to the dataset object and metadata.
No: The ARFF parser still does not support the data type in the ARFF file. As soon as you actually try to load the data (e.g. with OpenMLDataset.get_data()), the same error is thrown.
To me it makes the most sense to wait until the dataset is available in parquet format, since that should hopefully work without issues (and if not, it's worthwhile to improve the parquet support).
Hi, I'm a master's student under supervision of @joaquinvanschoren.
As a temporary fix, you could convert the timestamps to a Unix timestamp format and hint to ARFF that it is a numeric type.
I've made some quick adjustments to the decode_arff function that do exactly that. Check it out at: https://github.com/chclam/openml-python/commit/136c27940b3cb9974e8272bae67007b4e1be5dc8
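For anyone who wants to try the same idea outside the library, here is a minimal sketch of the two steps involved: declaring the date attribute as NUMERIC in the ARFF header, and turning each date value into a Unix timestamp. This is not the code from the commit linked above. The helper names date_attribute_to_numeric and to_unix_timestamp are hypothetical, and the sketch assumes unquoted attribute names, a fixed "%Y-%m-%d %H:%M:%S" value format, and UTC timestamps, none of which I have checked against the FOREX files.

```python
import re
from datetime import datetime, timezone

# Matches an ARFF attribute declaration whose type is "date", with an
# optional format string after it. Assumes the attribute name has no spaces.
DATE_ATTR = re.compile(r"^(@attribute\s+\S+\s+)date\b.*$", re.IGNORECASE)


def date_attribute_to_numeric(header_line: str) -> str:
    """Rewrite '@ATTRIBUTE name DATE ...' as '@ATTRIBUTE name NUMERIC'.

    Lines that do not declare a date attribute are returned unchanged.
    """
    return DATE_ATTR.sub(r"\1NUMERIC", header_line)


def to_unix_timestamp(value: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> float:
    """Parse a date string (assumed to be UTC) into seconds since the epoch."""
    # ARFF date values are often quoted, so drop surrounding quotes first.
    parsed = datetime.strptime(value.strip("'\""), fmt)
    return parsed.replace(tzinfo=timezone.utc).timestamp()


if __name__ == "__main__":
    print(date_attribute_to_numeric('@ATTRIBUTE Timestamp DATE "yyyy-MM-dd HH:mm:ss"'))
    print(to_unix_timestamp("'1970-01-02 00:00:00'"))
```

The ARFF format string uses Java-style patterns (e.g. yyyy-MM-dd), so a real fix would have to translate that pattern into a strptime format rather than hard-code one as done here.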
Description
When using dataset = openml.datasets.get_dataset(did), a Bad @ATTRIBUTE error is thrown.
Steps/Code to Reproduce
Expected Results
No errors thrown.
Actual Results
Versions