Open ArlindKadra opened 6 years ago
@janvanrijn
Thanks for reporting.
I updated several of the XSD schemas, including the one you mentioned, so this should be fixed. Let me know if any other problems pop up.
Fast-forward
.../pages/api_new/v1/xsd/openml.data.features.xsd | 20 ++++-
.../pages/api_new/v1/xsd/openml.data.qualities.xsd | 9 ++-
.../pages/api_new/v1/xsd/openml.data.upload.xsd | 17 +++--
.../v1/xsd/openml.implementation.upload.xsd | 30 ++++++--
.../pages/api_new/v1/xsd/openml.run.trace.xsd | 9 ++-
.../pages/api_new/v1/xsd/openml.run.upload.xsd | 2 +-
.../api_new/v1/xsd/openml.task.types.search.xsd | 87 ++++++++++++----------
Hey @janvanrijn, unfortunately it is still not working for the dataset that I am trying to upload. https://github.com/openml/OpenML/blob/7a1e4cfb96d58c5d20b4438e1c1102024dfd442b/openml_OS/views/pages/api_new/v1/xsd/openml.data.upload.xsd#L24
I think the license element validation is failing because the value contains spaces, so maybe we should add \s to the regex pattern.
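To illustrate the whitespace problem, here is a minimal Python sketch. Both patterns are hypothetical stand-ins, not the actual pattern from openml.data.upload.xsd, and Python's `re` module only approximates XSD regex semantics. XSD patterns must match the whole value, which `re.fullmatch` mimics.

```python
import re

# Hypothetical patterns for illustration only; the real pattern lives in
# openml.data.upload.xsd and may differ.
NO_SPACE = re.compile(r"[a-zA-Z0-9_\-\.]+")
WITH_SPACE = re.compile(r"[a-zA-Z0-9_\-\.\s]+")


def is_valid(pattern: re.Pattern, value: str) -> bool:
    """Return True only if the whole value matches, as an XSD pattern facet requires."""
    return pattern.fullmatch(value) is not None


license_value = "Public Domain"  # example value containing a space
print(is_valid(NO_SPACE, license_value))    # False: the space is not in the class
print(is_valid(WITH_SPACE, license_value))  # True: \s admits the space
```

Note that `\s` also admits tabs and newlines. If only plain spaces should be allowed, adding a literal space to the character class would be the stricter choice.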
The description element validation is also failing. https://github.com/openml/OpenML/blob/7a1e4cfb96d58c5d20b4438e1c1102024dfd442b/openml_OS/views/pages/api_new/v1/xsd/openml.data.upload.xsd#L17
The description contains these characters:
=, :, -, ^, /, "
Maybe we should use a different encoding?
We can make the license field basic Latin 64. Do you think the description problem can be fixed with a different encoding?
For the license field, we can do whatever you think is best. As for the description problem, on second thought I need to look into it more, since those characters might already be contained in the allowed set.
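A quick way to check this, assuming "the set" means the Unicode Basic Latin block (U+0000 to U+007F) mentioned above for the license field. That is an assumption about the schema, not something confirmed in this thread. The helper name `in_basic_latin` is made up for this sketch.

```python
# Characters reported in the failing description.
reported = ["=", ":", "-", "^", "/", '"']


def in_basic_latin(ch: str) -> bool:
    """True if ch lies in the Unicode Basic Latin block (U+0000 to U+007F)."""
    return ord(ch) <= 0x7F


for ch in reported:
    print(f"{ch!r} U+{ord(ch):04X} basic_latin={in_basic_latin(ch)}")
```

All six characters are plain ASCII, so under that assumption the character set alone would not explain the failure, and some other restriction on the element is the more likely cause.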
Should be better now?
Hey @janvanrijn, the license element is OK now. However, the description is failing because it is longer than the maximum length of 1024. The description contains 5023 characters, many of them whitespace; the number of non-whitespace characters is 3504. How should we deal with this one? Should we just increase the limit? PS: it's the BreastCancer dataset from scikit-learn.
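For reference, counts like these can be reproduced with a few lines of Python. This is a rough sketch: `description_stats` is a hypothetical helper, the sample text is made up, and `MAX_LENGTH` is the 1024 limit quoted above.

```python
MAX_LENGTH = 1024  # limit quoted in this thread before it was extended


def description_stats(text: str) -> dict:
    """Count total and non-whitespace characters and compare against the limit."""
    total = len(text)
    non_whitespace = sum(1 for ch in text if not ch.isspace())
    return {
        "total": total,
        "non_whitespace": non_whitespace,
        "within_limit": total <= MAX_LENGTH,
    }


# Made-up sample text; substitute the real dataset description here.
sample = "Breast cancer dataset.\n\n    569 instances,   30 attributes.\n"
print(description_stats(sample))
```

If the element is typed as `xs:string`, whitespace is preserved during validation, so every space and newline would count toward the length limit. Stripping or collapsing whitespace before upload would shrink the value, but at 3504 non-whitespace characters this description would still exceed 1024.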
I extended it.
Should the ( and ) be preceded by \, since they are metacharacters used for grouping? I am getting a validation error while trying to upload a dataset from scikit-learn. For example, the license value 'BSD (from Scikit-learn)' does not pass.
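One point that may help narrow this down: in most regex dialects, parentheses only act as grouping metacharacters outside a character class. Inside square brackets they are ordinary characters, so escaping them is optional there. The sketch below shows this with Python's `re`, which is not the XSD engine. The three patterns are hypothetical, not the one from the schema, so the behaviour should still be verified against the actual validator.

```python
import re

value = "BSD (from Scikit-learn)"

# Hypothetical patterns for illustration; the schema's real pattern may differ.
PATTERNS = {
    "no parentheses in class": r"[a-zA-Z0-9 \-]+",
    "unescaped ( ) in class": r"[a-zA-Z0-9 \-()]+",
    "escaped \\( \\) in class": r"[a-zA-Z0-9 \-\(\)]+",
}


def matches(pattern: str, text: str) -> bool:
    """Whole-string match, mimicking how an XSD pattern facet is applied."""
    return re.fullmatch(pattern, text) is not None


for name, pattern in PATTERNS.items():
    print(f"{name}: {matches(pattern, value)}")
```

In this sketch the first pattern rejects the value and the other two accept it. If the schema's pattern is a character class that lacks ( and ), adding them would be enough, with or without the backslash.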