openml / openml-r

R package to interface with OpenML
http://openml.github.io/openml-r/

get OML Dataset with no target feature #207

Closed rgmantovani closed 8 years ago

rgmantovani commented 8 years ago

Hi guys,

Is there any way to get a dataset with no target feature? Or should I add a "default" target, @joaquinvanschoren? There are at least 6 OML datasets without a target feature specified, even though the corresponding task has one.

ids = c(1468, 1484, 1566, 1479, 1514, 1515)

I took a look at the tasks, and all of them have more than 100 features.

> getOMLDataSet(did = 1484)
Error in getOMLDataSet(did = 1484) : 
  Assertion on 'target.features' failed: Must have length >= 1, but has length 0
> getOMLDataSet(did = 1566)
Error in getOMLDataSet(did = 1566) : 
  Assertion on 'target.features' failed: Must have length >= 1, but has length 0
> getOMLDataSet(did = 1479)
Error in getOMLDataSet(did = 1479) : 
  Assertion on 'target.features' failed: Must have length >= 1, but has length 0
> getOMLDataSet(did = 1514)
Error in getOMLDataSet(did = 1514) : 
  Assertion on 'target.features' failed: Must have length >= 1, but has length 0
> getOMLDataSet(did = 1515)
Error in getOMLDataSet(did = 1515) : 
  Assertion on 'target.features' failed: Must have length >= 1, but has length 0
> getOMLDataSet(did = 1468)
Error in getOMLDataSet(did = 1468) : 
  Assertion on 'target.features' failed: Must have length >= 1, but has length 0
> getOMLDataSet(did = 6)

Data Set "letter" :: (Version = 1, OpenML ID = 6)
  Default Target Attribute: class
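
To check all six IDs in one pass without the first error aborting the script, the calls above can be wrapped in `tryCatch`. This is only a minimal sketch: `tryFetchAll`, `summariseFetch` and the `fetch` argument are hypothetical names introduced here for illustration. By default `fetch` simply forwards to the `getOMLDataSet(did = ...)` call shown above, which needs the OpenML package and network access.

```r
# Hypothetical helper (not part of the OpenML package): call `fetch` for each
# dataset ID and record the error message instead of stopping at the first failure.
tryFetchAll = function(ids, fetch = function(did) OpenML::getOMLDataSet(did = did)) {
  res = lapply(ids, function(did) {
    tryCatch(
      list(did = did, ok = TRUE, msg = NA_character_, data = fetch(did)),
      error = function(e) list(did = did, ok = FALSE, msg = conditionMessage(e), data = NULL)
    )
  })
  names(res) = as.character(ids)
  res
}

# Hypothetical helper: one row per ID, with success flag and error message.
summariseFetch = function(res) {
  data.frame(
    did = vapply(res, function(r) as.numeric(r$did), numeric(1), USE.NAMES = FALSE),
    ok = vapply(res, function(r) r$ok, logical(1), USE.NAMES = FALSE),
    msg = vapply(res, function(r) r$msg, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Usage (commented out because it needs OpenML and a network connection):
# ids = c(1468, 1484, 1566, 1479, 1514, 1515)
# summariseFetch(tryFetchAll(ids))
```

Passing the fetch function as an argument keeps the helper independent of the package, so it can be tried with a stub before pointing it at the server.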

My sessionInfo():

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] OpenML_1.0       mlr_2.8          ParamHelpers_1.7 BBmisc_1.9      
[5] gridExtra_2.2.1  reshape2_1.4.1   ggplot2_2.1.0   

loaded via a namespace (and not attached):
 [1] parallelMap_1.3    Rcpp_0.12.3        plyr_1.8.3         tools_3.2.3       
 [5] RWekajars_3.7.12-1 digest_0.6.9       memoise_1.0.0      gtable_0.2.0      
 [9] checkmate_1.7.3    shiny_0.13.1       DBI_0.3.1          curl_0.9.6        
[13] parallel_3.2.3     rJava_0.9-8        stringr_1.0.0      dplyr_0.4.3       
[17] httr_1.1.0         xml2_0.1.2         ggvis_0.4.2        grid_3.2.3        
[21] data.table_1.9.6   R6_2.1.2           XML_3.98-1.4       survival_2.38-3   
[25] RWeka_0.4-24       magrittr_1.5       backports_1.0.1    scales_0.4.0      
[29] htmltools_0.3      splines_3.2.3      assertthat_0.1     mime_0.4          
[33] xtable_1.8-2       colorspace_1.2-6   httpuv_1.3.3       stringi_1.0-1     
[37] munsell_0.4.3      chron_2.3-47      

Thanks.

giuseppec commented 8 years ago

Hm, good point. I don't see any reason to throw an error and refuse to load the dataset. Why is it forbidden to get the data when there is no default target defined? Wouldn't a warning be sufficient instead of an error? Let's wait and see what @joaquinvanschoren and @berndbischl think.

berndbischl commented 8 years ago

That is just an error in the assertion. The dataset should of course be downloadable to R with 0 target features.

Please fix this and add a unit test.
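
The message "Must have length >= 1, but has length 0" suggests that the check on `target.features` enforces a minimum length of one. A minimal sketch of a relaxed check, together with the kind of unit test requested, could look like the following. It is written in base R so that it runs on its own; `checkTargetFeatures` is a hypothetical stand-in, not the package's actual code, and the real fix would presumably adjust the existing assertion inside `getOMLDataSet` instead.

```r
# Hypothetical stand-in for the check on 'target.features' (not the actual
# OpenML package code): accept a character vector of any length, including 0.
checkTargetFeatures = function(target.features) {
  if (!is.character(target.features) || anyNA(target.features)) {
    stop("Assertion on 'target.features' failed: Must be a character vector without missing values")
  }
  invisible(target.features)
}

# Unit-test style checks, using base R only.
stopifnot(identical(checkTargetFeatures(character(0)), character(0)))  # no target: allowed
stopifnot(identical(checkTargetFeatures("class"), "class"))            # one target: allowed
stopifnot(inherits(try(checkTargetFeatures(1L), silent = TRUE), "try-error"))  # wrong type: rejected
```

With checkmate, which appears in the session info above, the equivalent would likely be an `assertCharacter` call with `min.len = 0L`; whether that matches the shape of the package's existing assertion is an assumption.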

joaquinvanschoren commented 8 years ago

I also agree that a dataset without a default target should be downloadable. This is actually very common for clustering datasets.

On Thu, 24 Mar 2016 at 21:28, Bernd Bischl wrote:

That is just an error in the assertion. The dataset should of course be downloadable to R with 0 target features.

Pls fix this and add a unit test
