harrelfe / Hmisc

Harrell Miscellaneous

asNumericMatrix/matrix2dataFrame not respecting integer columns #74

Closed AndreMikulec closed 6 years ago

AndreMikulec commented 7 years ago

asNumericMatrix converts integer columns to numeric (double) and records only ischar = FALSE for them. matrix2dataFrame looks only at ischar and the "factor" class, so it does not restore the original integer columns; they are left as numeric.

Is it by design that integer columns are not restored, or is this an oversight?

Note: from reading the code, it seems that logical (boolean) columns would not be restored either; they would also be left as numeric.
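The type loss can be illustrated without Hmisc. The following is a minimal base-R sketch using `data.matrix` on a made-up three-column data frame (the data and column names are assumptions for illustration, not part of the original report). A matrix has a single storage mode, so once a double column is present, the integer and logical columns are coerced to double too:

```r
# Toy data frame with one column of each storage mode (hypothetical data).
df <- data.frame(
  dbl = c(0.5, 1.5, 2.5),
  int = c(1L, 2L, 3L),
  lgl = c(TRUE, FALSE, TRUE)
)

# Per-column storage modes before conversion.
modes_before <- vapply(df, storage.mode, character(1))
print(modes_before)

# A matrix holds a single storage mode, so mixing a double column with
# integer and logical columns coerces everything to double.
m <- data.matrix(df)

print(storage.mode(m))          # storage mode of the whole matrix
print(is.integer(m[, "int"]))   # is the former integer column still integer?
print(is.logical(m[, "lgl"]))   # is the former logical column still logical?
```

Unless the original per-column storage modes are saved somewhere alongside the matrix, nothing in the matrix itself says which columns were integer or logical.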

Below, see column 'price' (integer).

> library(Hmisc)
> data("diamonds",package="ggplot2")
> str(diamonds)
Classes 'tbl_df', 'tbl' and 'data.frame':       53940 obs. of  10 variables:
 $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int  326 326 327 334 335 336 336 337 337 338 ...  *** INTEGER ***
 $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
> str(Hmisc::asNumericMatrix(diamonds))
 num [1:53940, 1:10] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:53940] "1" "2" "3" "4" ...
  ..$ : chr [1:10] "carat" "cut" "color" "clarity" ...
 - attr(*, "origAttributes")=List of 10
  ..$ carat  :List of 1
  .. ..$ ischar: logi FALSE
  ..$ cut    :List of 3
  .. ..$ class : chr [1:2] "ordered" "factor"
  .. ..$ levels: chr [1:5] "Fair" "Good" "Very Good" "Premium" ...
  .. ..$ ischar: logi FALSE
  ..$ color  :List of 3
  .. ..$ class : chr [1:2] "ordered" "factor"
  .. ..$ levels: chr [1:7] "D" "E" "F" "G" ...
  .. ..$ ischar: logi FALSE
  ..$ clarity:List of 3
  .. ..$ class : chr [1:2] "ordered" "factor"
  .. ..$ levels: chr [1:8] "I1" "SI2" "SI1" "VS2" ...
  .. ..$ ischar: logi FALSE
  ..$ depth  :List of 1
  .. ..$ ischar: logi FALSE
  ..$ table  :List of 1
  .. ..$ ischar: logi FALSE
  ..$ price  :List of 1
  .. ..$ ischar: logi FALSE    *** INTEGER ***
  ..$ x      :List of 1
  .. ..$ ischar: logi FALSE
  ..$ y      :List of 1
  .. ..$ ischar: logi FALSE
  ..$ z      :List of 1
  .. ..$ ischar: logi FALSE
> str(Hmisc::matrix2dataFrame(Hmisc::asNumericMatrix(diamonds)))
'data.frame':   53940 obs. of  10 variables:
 $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
 $ price  : num  326 326 327 334 335 336 336 337 337 338 ...  *** SHOULD THIS BE INTEGER? ***
 $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
> attr(Hmisc::asNumericMatrix(diamonds), "origAttributes")$price
$ischar
[1] FALSE
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Hmisc_4.0-3     ggplot2_2.2.1   Formula_1.2-1   survival_2.41-3
[5] lattice_0.20-35

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11        knitr_1.16          magrittr_1.5
 [4] cluster_2.0.6       splines_3.4.0       munsell_0.4.3
 [7] colorspace_1.3-2    rlang_0.1.1         stringr_1.2.0
[10] plyr_1.8.4          tools_3.4.0         nnet_7.3-12
[13] grid_3.4.0          data.table_1.10.4   htmlTable_1.9
[16] checkmate_1.8.2     gtable_0.2.0        latticeExtra_0.6-28
[19] htmltools_0.3.6     digest_0.6.12       lazyeval_0.2.0
[22] tibble_1.3.3        Matrix_1.2-10       gridExtra_2.2.1
[25] RColorBrewer_1.1-2  base64enc_0.1-3     htmlwidgets_0.8
[28] acepack_1.4.1       rpart_4.1-11        stringi_1.1.5
[31] compiler_3.4.0      scales_0.4.1        backports_1.1.0
[34] foreign_0.8-68
>
harrelfe commented 6 years ago

Sorry for the very late response. This is fixed for the upcoming release to CRAN. All storage.modes will be respected.
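The fix itself is not shown in this thread. As a rough illustration of what respecting each column's storage mode on a round trip could look like, here is a base-R sketch. The helper names `to_numeric_matrix` and `from_numeric_matrix` and the `origModes` attribute are hypothetical; this is not Hmisc's implementation, and it only handles double, integer, and logical columns (factors and character columns are out of scope).

```r
# Hypothetical helpers for illustration; NOT Hmisc's implementation.

# Convert a data frame to a numeric matrix, remembering each column's
# original storage mode in an attribute (attribute name is an assumption).
to_numeric_matrix <- function(df) {
  modes <- vapply(df, storage.mode, character(1))
  m <- data.matrix(df)
  attr(m, "origModes") <- modes
  m
}

# Rebuild a data frame, restoring each column to its recorded storage mode.
from_numeric_matrix <- function(m) {
  modes <- attr(m, "origModes")
  cols <- lapply(seq_len(ncol(m)), function(j) {
    col <- unname(m[, j])
    storage.mode(col) <- modes[[j]]  # e.g. double -> integer or logical
    col
  })
  names(cols) <- colnames(m)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Usage on a toy data frame (hypothetical data).
df <- data.frame(dbl = c(0.5, 1.5), int = c(1L, 2L), lgl = c(TRUE, FALSE))
restored <- from_numeric_matrix(to_numeric_matrix(df))
str(restored)
```

With a released Hmisc version that includes the fix, the equivalent check on the original example would be to confirm that `price` comes back as `int` from `matrix2dataFrame(asNumericMatrix(diamonds))`.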