openml / openml-r

R package to interface with OpenML
http://openml.github.io/openml-r/

Getting some errors with getOMLDataSet and uploadOMLTask #452

Closed thengl closed 2 years ago

thengl commented 4 years ago

I have been getting errors while trying to download a dataset and upload a task. Here is the code:

> library(OpenML)
> setOMLConfig(apikey = "...")
> x = getOMLDataSet(data.id = 42332)
Downloading from 'http://www.openml.org/api/v1/data/42332' to '/data/RTMP/RtmpLRYbou/cache/datasets/42332/description.xml'.
Downloading from 'https://www.openml.org/data/v1/download/21800122/SoilKsatDB.arff' to '/data/RTMP/RtmpLRYbou/cache/datasets/42332/dataset.arff'
Loading required package: readr
Warning: 13072 parsing failures.
row col  expected     actual                                     file
  2  -- 1 columns 38 columns '/data/RTMP/RtmpLRYbou/filec21f4fd245e0'
  3  -- 1 columns 38 columns '/data/RTMP/RtmpLRYbou/filec21f4fd245e0'
  4  -- 1 columns 38 columns '/data/RTMP/RtmpLRYbou/filec21f4fd245e0'
  5  -- 1 columns 38 columns '/data/RTMP/RtmpLRYbou/filec21f4fd245e0'
  6  -- 1 columns 38 columns '/data/RTMP/RtmpLRYbou/filec21f4fd245e0'
... ... ......... .......... ........................................
See problems(...) for more details.

Error in names(x) <- value : 
  'names' attribute [38] must be the same length as the vector [1]
In addition: Warning message:
Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two. 
> uploadOMLTask("Supervised Regression", data.id = 42332, "ksat_lab", estimation.procedure="10-fold Crossvalidation", evaluation.measure="correlation_coefficient", description="ksat_lab regression", confirm.upload=FALSE)
Uploading task to server.
Uploading to 'http://www.openml.org/api/v1/task'.
Error in doHTTRCall(method, url = url, query = list(api_key = conf$apikey),  : 
  ERROR (code = 613) in server response: Problem validating uploaded description file
  XML does not correspond to XSD schema. Error Element '{http://openml.org/openml}evaluation_measures': This element is not expected. Expected is one of ( {http://openml.org/openml}input, {http://openml.org/openml}tag ).
 on line 6 column 0.

My session info:

> devtools::session_info()
Session info -------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 system   x86_64, linux-gnu           
 ui       RStudio (1.2.1335)          
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       Europe/Berlin               
 date     2020-03-19  
Packages -----------------------------------------------------------------------------------
 farff           1.1       2020-03-18 Github (mlr-org/farff@15d6a60)     
 OpenML        * 1.10      2020-03-19 Github (openml/openml-r@04fa150)   
 openxlsx        4.1.4     2019-12-06 CRAN (R 3.5.1)                     

I also tried creating new tasks through the web interface, also without success.

ledell commented 2 years ago

I ran the code above and got the same error (also using 1.10, latest version on CRAN), so this is not fixed yet. I am getting similar errors when trying to query a task:

> t <- getOMLTask(task.id = 359993)
Task '359993' file 'task.xml' found in cache.
Task '359993' file 'datasplits.arff' found in cache.
Data '42734' file 'description.xml' found in cache.
Data '42734' file 'dataset.arff' found in cache.
Error in names(x) <- value :
  'names' attribute [20] must be the same length as the vector [1]
In addition: Warning message:

sebffischer commented 2 years ago

The problem is the farff ARFF parser. Switching to the (slower) RWeka reader works fine. Alternatively, you can use the mlr3oml package.

library(OpenML)
setOMLConfig(arff.reader = "RWeka")
#> OpenML configuration:
#>   server           : http://www.openml.org/api/v1
#>   cachedir         : /tmp/RtmpmEaof6/cache
#>   verbosity        : 1
#>   arff.reader      : RWeka
#>   confirm.upload   : TRUE
#>   apikey           :
task = getOMLTask(359993)
#> Downloading from 'http://www.openml.org/api/v1/task/359993' to '/tmp/RtmpmEaof6/cache/tasks/359993/task.xml'.
#> Downloading from 'https://www.openml.org/api_splits/get/359993/Task_359993_splits.arff' to '/tmp/RtmpmEaof6/cache/tasks/359993/datasplits.arff'
#> Downloading from 'http://www.openml.org/api/v1/data/42734' to '/tmp/RtmpmEaof6/cache/datasets/42734/description.xml'.
#> Downloading from 'https://www.openml.org/data/v1/download/22044770/okcupid-stem.arff' to '/tmp/RtmpmEaof6/cache/datasets/42734/dataset.arff'

Created on 2022-02-23 by the reprex package (v2.0.1)
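
If you want to keep farff as the default and only fall back to RWeka when parsing fails, something like the following might work. This is a minimal sketch, not part of the OpenML package: `get_with_reader_fallback`, `fetch` and `set_reader` are hypothetical names introduced here, and the defaults assume that `getOMLDataSet()` and `setOMLConfig(arff.reader = ...)` behave as in the transcripts above. RWeka has only been shown to work for task 359993 in this thread, so it may or may not help with data.id 42332.

```r
# Hypothetical helper, not part of the OpenML package.
# Tries each ARFF reader in order and returns the first result that loads.
get_with_reader_fallback <- function(
    data.id,
    readers = c("farff", "RWeka"),
    # Assumed defaults: thin wrappers around the OpenML calls used in this thread.
    fetch = function(id) OpenML::getOMLDataSet(data.id = id),
    set_reader = function(reader) OpenML::setOMLConfig(arff.reader = reader)) {
  for (reader in readers) {
    # Switch the configured ARFF reader before each attempt.
    set_reader(reader)
    result <- tryCatch(
      fetch(data.id),
      error = function(e) {
        # Report the failure and signal "try the next reader" with NULL.
        message("Reader '", reader, "' failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(result)) {
      return(result)
    }
  }
  stop("No ARFF reader could load data.id = ", data.id)
}

# Usage (needs the OpenML and RWeka packages plus network access):
# x <- get_with_reader_fallback(42332)
```

Note that the helper leaves `arff.reader` set to whichever reader was tried last, so later calls in the same session will use that reader too.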