jacobbien / hierNet

A Lasso for Hierarchical Interactions
4 stars 3 forks source link

varimp working code #1

Closed jamesdalg closed 4 years ago

jamesdalg commented 4 years ago

I have posted this on stackoverflow, but I realize that there is no extant example code for the variable importance measures. When you wrote varimp, was there a test case you used? I'm not able to get it to work for my data, or for the example given in the Rd file:

> set.seed(12)
> x=matrix(rnorm(100*10),ncol=10)
> x=scale(x,TRUE,TRUE)
> y=x[,1]+2*x[,2]+ x[,1]*x[,2]+3*rnorm(100)
> newx=matrix(rnorm(100*10),ncol=10)
> fit=hierNet(x,y,lam=50)
GG converged in 31 iterations.
> yhat=predict(fit,newx)
> 
> fit=hierNet.path(x,y)
i,lam= 1 129.72
GG converged in 1 iterations.
i,lam= 2 101.8
GG converged in 20 iterations.
i,lam= 3 79.89
GG converged in 26 iterations.
i,lam= 4 62.69
GG converged in 26 iterations.
i,lam= 5 49.2
GG converged in 26 iterations.
i,lam= 6 38.61
GG converged in 48 iterations.
i,lam= 7 30.3
GG converged in 57 iterations.
i,lam= 8 23.78
GG converged in 62 iterations.
i,lam= 9 18.66
GG converged in 55 iterations.
i,lam= 10 14.64
GG converged in 63 iterations.
i,lam= 11 11.49
GG converged in 60 iterations.
i,lam= 12 9.02
GG converged in 119 iterations.
i,lam= 13 7.08
GG converged in 122 iterations.
i,lam= 14 5.55
GG converged in 139 iterations.
i,lam= 15 4.36
GG converged in 147 iterations.
i,lam= 16 3.42
GG converged in 200 iterations.
i,lam= 17 2.68
GG converged in 374 iterations.
i,lam= 18 2.11
GG converged in 291 iterations.
i,lam= 19 1.65
GG converged in 602 iterations.
i,lam= 20 1.3
GG converged in 657 iterations.
> yhat=predict(fit,newx)
> hierNet.varimp(fit,x,y)
1Error in fit$th[-j, -j] : incorrect number of dimensions
jamesdalg commented 4 years ago
> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hierNet_1.9                                        igraph_1.2.5                                       topGO_2.40.0                                      
 [4] SparseM_1.78                                       GO.db_3.11.4                                       graph_1.66.0                                      
 [7] ComplexHeatmap_2.4.2                               EnhancedVolcano_1.6.0                              ggrepel_0.8.2                                     
[10] gtools_3.8.2                                       fastmatch_1.1-0                                    STRINGdb_1.26.0                                   
[13] msigdbr_7.1.1                                      clusterProfiler_3.16.0                             palasso_0.0.7                                     
[16] dplyr_1.0.0                                        bigstatsr_1.2.3                                    RSQLite_2.2.0                                     
[19] TCGAbiolinks_2.16.0                                magrittr_1.5                                       RPMM_1.25                                         
[22] cluster_2.1.0                                      wateRmelon_1.32.0                                  illuminaio_0.30.0                                 
[25] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 ROC_1.64.0                                         lumi_2.40.0                                       
[28] methylumi_2.34.0                                   FDb.InfiniumMethylation.hg19_2.2.0                 org.Hs.eg.db_3.11.4                               
[31] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2            GenomicFeatures_1.40.0                             AnnotationDbi_1.50.0                              
[34] ggplot2_3.3.1                                      reshape2_1.4.4                                     scales_1.1.1                                      
[37] limma_3.44.1                                       DMRcate_2.2.1                                      minfi_1.34.0                                      
[40] bumphunter_1.30.0                                  locfit_1.5-9.4                                     iterators_1.0.12                                  
[43] foreach_1.5.0                                      Biostrings_2.56.0                                  XVector_0.28.0                                    
[46] SummarizedExperiment_1.18.1                        DelayedArray_0.14.0                                matrixStats_0.56.0                                
[49] Biobase_2.48.0                                     GenomicRanges_1.40.0                               GenomeInfoDb_1.24.0                               
[52] IRanges_2.22.2                                     S4Vectors_0.26.1                                   BiocGenerics_0.34.0                               

loaded via a namespace (and not attached):
  [1] graphlayouts_0.7.0            lattice_0.20-41               haven_2.3.1                   bigassertr_0.1.3              vctrs_0.3.0                  
  [6] mgcv_1.8-31                   beanplot_1.2                  DSS_2.36.0                    blob_1.2.1                    survival_3.1-12              
 [11] prodlim_2019.11.13            pathview_1.28.0               later_1.0.0                   DBI_1.1.0                     R.utils_2.9.2                
 [16] rappdirs_0.3.1                selectr_0.4-2                 gsubfn_0.7                    jpeg_0.1-8.1                  zlibbioc_1.34.0              
 [21] htmlwidgets_1.5.1             mvtnorm_1.1-0                 GlobalOptions_0.1.1           inline_0.3.15                 markdown_1.1                 
 [26] tidygraph_1.2.0               Rcpp_1.0.4.6                  readr_1.3.1                   KernSmooth_2.23-17            DT_0.13                      
 [31] promises_1.1.0                gdata_2.18.0                  Hmisc_4.4-0                   ShortRead_1.46.0              mnormt_2.0.0                 
 [36] digest_0.6.25                 png_0.1-7                     nor1mix_1.3-0                 scatterpie_0.1.4              cowplot_1.0.0                
 [41] DOSE_3.14.0                   nleqslv_3.3.2                 ggraph_2.0.3                  pkgconfig_2.0.3               DelayedMatrixStats_1.10.0    
 [46] dygraphs_1.1.1.6              gower_0.2.1                   rstan_2.19.3                  circlize_0.4.9                GetoptLong_0.1.8             
 [51] xfun_0.14                     zoo_1.8-8                     tidyselect_1.1.0              matrixTests_0.1.9             purrr_0.3.4                  
 [56] EDASeq_2.22.0                 viridisLite_0.3.0             rtracklayer_1.48.0            pkgbuild_1.0.8                rlang_0.4.6                  
 [61] glue_1.4.1                    ensembldb_2.12.1              RColorBrewer_1.1-2            RcppZiggurat_0.1.5            europepmc_0.4                
 [66] stringr_1.4.0                 lava_1.6.7                    sva_3.36.0                    ggsignif_0.6.0                recipes_0.1.12               
 [71] threejs_0.3.3                 labeling_0.3                  httpuv_1.5.3.1                class_7.3-17                  preprocessCore_1.50.0        
 [76] shinystan_2.5.0               DO.db_2.9                     annotate_1.66.0               bsseq_1.24.1                  jsonlite_1.6.1               
 [81] tmvnsim_1.0-2                 bit_1.1-15.2                  mime_0.9                      gplots_3.0.3                  gridExtra_2.3                
 [86] Rsamtools_2.4.0               parsetools_0.1.3              bigparallelr_0.2.3            stringi_1.4.6                 processx_3.4.2               
 [91] glinternet_1.0.10             quadprog_1.5-8                bitops_1.0-6                  cli_2.0.2                     sqldf_0.4-11                 
 [96] tidyr_1.1.0                   brms_2.13.0                   data.table_1.12.8             rsconnect_0.8.16              KEGGgraph_1.48.0             
[101] survminer_0.4.7               rstudioapi_0.11               GenomicAlignments_1.24.0      nlme_3.1-148                  qvalue_2.20.0                
[106] VariantAnnotation_1.34.0      ggthemes_4.2.0                miniUI_0.1.1.1                gridGraphics_0.5-0            survMisc_0.5.5               
[111] R.oo_1.23.0                   dbplyr_1.4.4                  readxl_1.3.1                  lifecycle_0.2.0               timeDate_3043.102            
[116] ExperimentHub_1.14.0          munsell_0.5.0                 cellranger_1.1.0              R.methodsS3_1.8.0             hwriter_1.3.2                
[121] caTools_1.18.0                codetools_0.2-16              coda_0.19-3                   bayesplot_1.7.2               triebeard_0.3.0              
[126] htmlTable_1.13.3              missMethyl_1.22.0             rstantools_2.1.0              proto_1.0.0                   xtable_1.8-4                 
[131] BiocManager_1.30.10           StanHeaders_2.21.0-3          abind_1.4-5                   farver_2.0.3                  km.ci_0.5-2                  
[136] AnnotationHub_2.20.0          askpass_1.1                   biovizBase_1.36.0             GEOquery_2.56.0               shinythemes_1.1.2            
[141] tibble_3.0.1                  dichromat_2.0-0               Matrix_1.2-18                 ellipsis_0.3.1                prettyunits_1.1.1            
[146] lubridate_1.7.8               ggridges_0.5.2                mclust_5.4.6                  multtest_2.44.0               fgsea_1.14.0                 
[151] shinyjs_1.1                   testthat_2.3.2                htmltools_0.4.0               BiocFileCache_1.12.0          hash_2.2.6.1                 
[156] yaml_2.2.1                    loo_2.2.0                     utf8_1.1.4                    interactiveDisplayBase_1.26.2 XML_3.99-0.3                 
[161] ModelMetrics_1.2.2.2          ggpubr_0.3.0                  foreign_0.8-80                withr_2.2.0                   BiocParallel_1.22.0          
[166] aroma.light_3.18.0            bit64_0.9-7                   rngtools_1.5                  plotmo_3.5.7                  doRNG_1.8.2                  
[171] affyio_1.58.0                 ProtGenerics_1.20.0           GOSemSim_2.14.0               memoise_1.1.0                 forcats_0.5.0                
[176] rio_0.5.16                    geneplotter_1.66.0            permute_0.9-5                 callr_3.4.3                   caret_6.0-86                 
[181] ps_1.3.3                      curl_4.3                      postlogic_0.1.0.1             fansi_0.4.1                   urltools_1.7.3               
[186] xts_0.12-0                    acepack_1.4.1                 edgeR_3.30.1                  checkmate_2.0.0               nFactors_2.4.1               
[191] Gviz_1.32.0                   TeachingDemos_2.12            rjson_0.2.20                  openxlsx_4.1.5                rstatix_0.5.0                
[196] clue_0.3-57                   tools_4.0.0                   RCurl_1.98-1.2                car_3.0-8                     ggplotify_0.0.5              
[201] xml2_1.3.2                    httr_1.4.1                    assertthat_0.2.1              R6_2.4.1                      Rhdf5lib_1.10.0              
[206] AnnotationFilter_1.12.0       nnet_7.3-14                   flock_0.7                     progress_1.2.2                genefilter_1.70.0            
[211] Brobdingnag_1.2-6             KEGGREST_1.28.0               shape_1.4.4                   statmod_1.4.34                BiocVersion_3.11.1           
[216] HDF5Array_1.16.0              rhdf5_2.32.0                  splines_4.0.0                 carData_3.0-4                 colorspace_1.4-1             
[221] generics_0.0.2                base64enc_0.1-3               chron_2.3-55                  pkgcond_0.1.0                 pillar_1.4.4                 
[226] tweenr_1.0.1                  Rgraphviz_2.32.0              affy_1.66.0                   rvcheck_0.1.8                 GenomeInfoDbData_1.2.3       
[231] plyr_1.8.6                    gtable_0.3.0                  rvest_0.3.5                   zip_2.0.4                     psych_1.9.12.31              
[236] colourpicker_1.0              knitr_1.28                    latticeExtra_0.6-29           biomaRt_2.44.0                fastmap_1.0.1                
[241] crosstalk_1.1.0.1             doParallel_1.0.15             testextra_0.1.0.1             Rfast_1.9.9                   broom_0.5.6                  
[246] openssl_1.4.1                 BSgenome_1.56.0               backports_1.1.7               plotrix_3.7-8                 ipred_0.9-9                  
[251] base64_2.0                    enrichplot_1.8.1              ggforce_0.3.1                 hms_0.5.3                     scrime_1.3.5                 
[256] DESeq_1.39.0                  shiny_1.4.0.2                 KMsurv_0.1-5                  polyclip_1.10-0               siggenes_1.62.0              
[261] purrrogress_0.1.1             lazyeval_0.2.2                Formula_1.2-3                 crayon_1.3.4                  MASS_7.3-51.6                
[266] downloader_0.4                pROC_1.16.2                   viridis_0.5.1                 reshape_0.8.8                 bridgesampling_1.0-0         
[271] rpart_4.1-15                  compiler_4.0.0
jacobbien commented 4 years ago

Thanks for your question. This was actually an experimental function (as mentioned in the documentation) and in fact it was not discussed in our paper. To avoid confusion, I've made it so the function is no longer exposed by the package to users.

However, I am guessing you are wanting to know which main effects and interactions have been selected. Here's an example showing how you can do this.

# generate some data:
set.seed(123)
x <- matrix(rnorm(100 * 10), ncol = 10)
x <- scale(x, TRUE, TRUE)
y <- x[, 1] + 2 * x[, 2] + x[, 1] * x[, 2] + 3 * rnorm(100)

# get hierNet fit:
fit <- hierNet(x, y, lam = 80)
GG converged in 47 iterations.

# see which main effects were selected:
which(fit$bp - fit$bn != 0)

# see which interactions were selected:
which(fit$th != 0, arr.ind = TRUE)

Note: In the above, I've just taken lambda to be a fixed value, but in practice you would use cross validation to select lambda. I hope this helps!

jamesdalg commented 4 years ago

I was more hoping for a ranking of variables or variable pairs in the model, but perhaps one could reshape2::melt the interaction coefficient matrix into a 3 column dataframe and sort it by the edge interaction coefficient value (and filtering out the zero weights, as you suggest above). Just a suggestion, but I think that would just take a single line, something like: fit$th[fit$th != 0] %>% reshape2::melt() %>% dplyr::arrange(-value)