FlorianPargent opened this issue 6 years ago
Unfortunately, the last fix does not seem to be enough:
> dat3 = readARFF("datafile1.arff")
Parse with reader=readr : datafile1.arff
Loading required package: readr
Warning: 1 parsing failure.
# A tibble: 1 x 5
    row col   expected  actual      file
  <int> <chr> <chr>     <chr>       <chr>
1     1 NA    1 columns 759 columns '/var/folders/t5/8s0vv3w545v7x5j0_pqtc8wr0000gp/T//Rtmpcmkgef/file475535af75d5'
header: 114.905000; preproc: 0.504000; data: 0.845000; postproc: 0.096000; total: 116.350000
Warning messages:
1: Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.
2: In rbind(names(probs), probs_f) :
number of columns of result is not a multiple of vector length (arg 2)
> all.equal(dat1, dat3)
[1] "Attributes: < Names: 1 string mismatch >"
[2] "Attributes: < Length mismatch: comparison on first 2 components >"
[3] "Attributes: < Component 2: Modes: numeric, list >"
[4] "Attributes: < Component 2: Lengths: 2000000, 5 >"
[5] "Attributes: < Component 2: names for current but not for target >"
[6] "Attributes: < Component 2: Attributes: < target is NULL, current is list > >"
[7] "Attributes: < Component 2: target is numeric, current is tbl_df >"
[8] "Component “huge_factor”: Lengths: 2000000, 2000002"
[9] "Component “huge_factor”: Lengths (2000000, 2000002) differ (string compare on first 2000000)"
[10] "Component “huge_factor”: 'is.NA' value mismatch: 2 in current 0 in target"
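Most of the attribute differences reported above seem to be noise from readr metadata attached to the result rather than data differences. A sketch with stand-in two-row vectors (hypothetical values, not the original 2,000,000-row data) shows how base R's `all.equal` can be restricted to the values themselves:

```r
# Stand-in frames mimicking the shapes above; dat3 carries two extra
# leading NA rows, as in the head() output further down.
dat1 = data.frame(huge_factor = c("a", "b"), stringsAsFactors = FALSE)
dat3 = data.frame(huge_factor = c(NA, NA, "a", "b"), stringsAsFactors = FALSE)

# check.attributes = FALSE suppresses the attribute comparisons, so only
# the length/value mismatch from the extra rows is reported.
all.equal(dat1, dat3, check.attributes = FALSE)
```

This isolates the two extra rows as the substantive difference between the two reads.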
Now a data frame is returned, but the row counts do not match: two empty rows appear to be added at the beginning of the data frame:
> dim(dat1)
[1] 2000000 1
> dim(dat3)
[1] 2000002 1
>
> head(dat1)
huge_factor
1 6GwiqtKZwCEVtO4wpTeqK58HKKsgMc
2 9jc6lV3by0tkHv8UUBtv1p30baKu6z
3 rpF65yg5DY3sHk5mnRbWKVHR03lA3S
4 8uZpJsDm7WI13zFYoUD6obcLeG0I1Z
5 KZti0i9paE3iB0umaC46x1pN3GPzQ7
6 7xfDZa1ug3we4cKNmE5p6JwUZwdmSg
>
> head(dat3)
huge_factor
1 <NA>
2 <NA>
3 6GwiqtKZwCEVtO4wpTeqK58HKKsgMc
4 9jc6lV3by0tkHv8UUBtv1p30baKu6z
5 rpF65yg5DY3sHk5mnRbWKVHR03lA3S
6 8uZpJsDm7WI13zFYoUD6obcLeG0I1Z
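Until this is fixed, one temporary workaround is to strip the leading all-NA rows after reading. This is only a sketch under the assumption (suggested by the `head()` output above) that the spurious rows are exactly the leading ones and contain only NA; `drop_leading_na_rows` is a hypothetical helper, not part of farff:

```r
# Hypothetical helper: drop the run of all-NA rows at the top of a data frame.
drop_leading_na_rows = function(df) {
  all_na = rowSums(!is.na(df)) == 0  # TRUE where every column is NA
  lead = cumprod(all_na) == 1        # TRUE only for the leading all-NA run
  df[!lead, , drop = FALSE]
}

# Stand-in for dat3 with its two spurious leading rows:
dat3 = data.frame(
  huge_factor = c(NA, NA,
                  "6GwiqtKZwCEVtO4wpTeqK58HKKsgMc",
                  "9jc6lV3by0tkHv8UUBtv1p30baKu6z"),
  stringsAsFactors = FALSE)
nrow(drop_leading_na_rows(dat3))  # 2
```

Note that `cumprod` keeps the drop confined to the leading run, so genuine all-NA rows later in the data are preserved.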
> devtools::session_info()
Session info ----------------------------------------------------------------------------------------------------------------------------
setting value
version R version 3.5.1 (2018-07-02)
system x86_64, darwin15.6.0
ui RStudio (1.1.456)
language (EN)
collate de_DE.UTF-8
tz Europe/Berlin
date 2018-11-20
Packages --------------------------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 CRAN (R 3.5.0)
backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
base * 3.5.1 2018-07-05 local
BBmisc 1.11 2018-11-07 Github (berndbischl/BBmisc@a5a4e45)
checkmate 1.8.5 2017-10-24 CRAN (R 3.5.0)
cli 1.0.1 2018-09-25 CRAN (R 3.5.0)
compiler 3.5.1 2018-07-05 local
crayon 1.3.4 2017-09-16 CRAN (R 3.5.0)
data.table 1.11.8 2018-09-30 CRAN (R 3.5.0)
datasets * 3.5.1 2018-07-05 local
devtools 1.13.6 2018-06-27 CRAN (R 3.5.0)
digest 0.6.18 2018-10-10 CRAN (R 3.5.0)
fansi 0.4.0 2018-10-05 CRAN (R 3.5.0)
farff * 1.0 2018-11-20 Github (mlr-org/farff@8221efb)
graphics * 3.5.1 2018-07-05 local
grDevices * 3.5.1 2018-07-05 local
hms 0.4.2 2018-03-10 CRAN (R 3.5.0)
memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
methods * 3.5.1 2018-07-05 local
pillar 1.3.0 2018-07-14 CRAN (R 3.5.0)
pkgconfig 2.0.2 2018-08-16 CRAN (R 3.5.0)
R6 2.3.0 2018-10-04 CRAN (R 3.5.0)
Rcpp 0.12.19 2018-10-01 CRAN (R 3.5.0)
readr * 1.1.1 2017-05-16 CRAN (R 3.5.0)
rlang 0.3.0.1 2018-10-25 cran (@0.3.0.1)
rstudioapi 0.8 2018-10-02 CRAN (R 3.5.0)
stats * 3.5.1 2018-07-05 local
stringi * 1.2.4 2018-07-20 CRAN (R 3.5.0)
tibble 1.4.2 2018-01-22 CRAN (R 3.5.0)
tools 3.5.1 2018-07-05 local
utf8 1.1.4 2018-05-24 CRAN (R 3.5.0)
utils * 3.5.1 2018-07-05 local
withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
yaml 2.2.0 2018-07-25 CRAN (R 3.5.0)
Side note: this leads to hard-to-debug errors when working with OpenML, because a dataset can be uploaded with the R interface without error, but the subsequent download then fails (or, in one case I encountered, appears to get caught in an infinite loop).