perishky / ewaff

Artistic License 2.0
3 stars 0 forks source link

Design matrix is supposed to contain also the dependent variable? #3

Open lorenzoFabbri opened 2 years ago

lorenzoFabbri commented 2 years ago

I tried to reproduce the example in the tutorial but was not able to since I cannot create the methylation matrix. Using my data I keep encountering the following message error:

Error in design[, "methylation"] : incorrect number of dimensions

I was able to track it down to the following line: https://github.com/perishky/ewaff/blob/80d181416d5eeb2fabb1b0daf522598d06f1aa3e/R/fit-loop.r#L3.

Why is the design matrix supposed to contain also information on the dependent variable? In your tutorial data indeed does not contain it. Thank you.

EDIT: I used trace to really understand the code and apparently there is a problem when doing design[, "methylation"] <- methylation[i, ]. Is it possible that it should instead be design[, "methylation"] <- t(methylation[i, ])?

perishky commented 2 years ago

I haven't been able to reproduce the error you're getting. Can you tell me a bit more about your inputs to ewaff.sites()?
e.g.

lorenzoFabbri commented 2 years ago
methylation.no.outliers <- ewaff::ewaff.handle.outliers(methylation = meth.time, 
                                                              method = "iqr")[[1]]
colnames(methylation.no.outliers) <- paste0("s", 
                                            1:dim(methylation.no.outliers)[2])
cpg.sites <- rownames(methylation.no.outliers)
methylation.no.outliers <- as.data.frame(methylation.no.outliers)
rownames(methylation.no.outliers) <- cpg.sites
gc()

formula.tmp <- paste("methylation~", exposure, "+", 
                     paste(colnames(metadata), collapse = "+"))
res.tmp <- ewaff::ewaff.sites(formula = formula.tmp, 
                              variable.of.interest = exposure, 
                              methylation = methylation.no.outliers, 
                              data = x, 
                              method = "rlm")

In my case methylation.no.outliers has dimension 300,000 by 140 and x has dimension 140 by 7.

perishky commented 2 years ago

I see that formula.tmp is defined using 'metadata' but 'x' is passed as 'data'. Are they the same? Could you send the output of print("formula.tmp"), print(x[1:3,]), print(exposure)?

lorenzoFabbri commented 2 years ago

[1] "methylation~ log.mep_e + e3_sex.x+cohort.x+season+age_sample_years.x+zBMI+hs_dift_mealblood_imp"

   log.mep_e e3_sex.x cohort.x season age_sample_years.x       zBMI hs_dift_mealblood_imp
1  0.8512623   /      / spring           / -0.3376235              2.166667
2  0.3855461   /      / winter           /  0.1117272              1.600000
3 -0.6155476     /      / winter           / -0.1173215              1.416667

[1] "log.mep_e"

perishky commented 2 years ago

Still not able to reproduce this error. Would it be possible for you to compile a small dataset that you could share with me that generates that error?

lorenzoFabbri commented 2 years ago

Sorry for the late reply, @perishky ...

N <- 150 # Number of subjects
M <- 1000 # Number of sites
rnd <- matrix(rnorm(N*M, mean=0, sd=1), N, M) # Random Methylation matrix
colnames(rnd) <- paste0("cg", 1:dim(rnd)[2])
meth.time <- as.matrix(rnd) %>% t()
colnames(meth.time) <- paste0("s", 1:dim(meth.time)[2])

rnd <- matrix( rnorm(N, mean=0, sd=1), N, 1) # Random vector for independent variable
colnames(rnd) <- "exp.2"

methylation.no.outliers <- ewaff::ewaff.handle.outliers(methylation = meth.time,  method = "iqr")[[1]]

exposure <- "exp.2"
formula.tmp <- paste("methylation ~ ", exposure)

res.tmp <- ewaff::ewaff.sites(formula = formula.tmp, 
                               variable.of.interest = exposure, 
                               methylation = methylation.no.outliers, 
                               data = rnd, 
                               method = "rlm")

This leads to the following error message:

Error in $<-.data.frame(*tmp*, p.adjust, value = numeric(0)) : replacement has 0 rows, data has 1000

If, on the other hand, I modify the function fit.loop the following way design[, "methylation"] <- t(methylation[i, ]) I get no error messages:

$table
                estimate         se             z     p.value   n p.adjust
cg13869341 -2.263605e-02 0.08172225 -0.2769876551 0.781789585 150        1
cg24669183 -1.128351e-02 0.07458727 -0.1512793608 0.879755350 150        1
cg15560884 -2.026091e-02 0.08747098 -0.2316301075 0.816825321 150        1
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] igraph_1.2.11        sqldf_0.4-11         RSQLite_2.2.9        gsubfn_0.7           proto_1.0.0          data.table_1.14.2    gtsummary_1.5.2     
 [8] qgraph_1.9           Matrix_1.4-0         GeneNet_1.2.16       fdrtool_1.2.17       longitudinal_1.1.13  corpcor_1.6.10       tidygraph_1.2.0     
[15] ggraph_2.0.5         psych_2.1.9          gridExtra_2.3        ggrepel_0.9.1        ggpubr_0.4.0         knitr_1.37           tidyselect_1.1.1    
[22] imputeMissings_0.0.3 VIM_6.1.1            colorspace_2.0-2     labelled_2.9.0       robust_0.7-0         fit.models_0.64      fastDummies_1.6.3   
[29] mixOmics_6.17.28     lattice_0.20-45      MASS_7.3-55          forcats_0.5.1        stringr_1.4.0        dplyr_1.0.7          purrr_0.3.4         
[36] readr_2.1.2          tidyr_1.2.0          tibble_3.1.6         ggplot2_3.3.5        tidyverse_1.3.1     

loaded via a namespace (and not attached):
  [1] vcd_1.4-9                   Hmisc_4.6-0                 class_7.3-20                Rsamtools_2.8.0             lmtest_0.9-39              
  [6] laeken_0.5.2                smacof_2.1-3                foreach_1.5.1               glmnet_4.1-3                crayon_1.4.2               
 [11] rhdf5filters_1.4.0          nlme_3.1-155                backports_1.4.1             reprex_2.0.1                ellipse_0.4.2              
 [16] huge_1.3.5                  rlang_1.0.0                 XVector_0.34.0              readxl_1.3.1                nloptr_2.0.0               
 [21] limma_3.50.0                minfi_1.38.0                filelock_1.0.2              dplR_1.7.2                  BiocParallel_1.28.3        
 [26] rjson_0.2.21                bit64_4.0.5                 glue_1.6.1                  rngtools_1.5.2              AnnotationDbi_1.56.2       
 [31] BiocGenerics_0.40.0         ewaff_0.0.2                 base64url_1.4               NetworkToolbox_1.4.2        tcltk_4.1.2                
 [36] haven_2.4.3                 SummarizedExperiment_1.22.0 XML_3.99-0.8                zoo_1.8-9                   GenomicAlignments_1.28.0   
 [41] chron_2.3-56                nnls_1.4                    xtable_1.8-4                magrittr_2.0.2              evaluate_0.14              
 [46] cli_3.1.1                   zlibbioc_1.40.0             rstudioapi_0.13             doRNG_1.8.2                 sp_1.4-6                   
 [51] rpart_4.1.16                wordcloud_2.6               RJSONIO_1.3-1.6             polynom_1.4-0               xfun_0.29                  
 [56] askpass_1.1                 weights_1.0.4               multtest_2.48.0             cluster_2.1.2               pbdZMQ_0.3-6               
 [61] KEGGREST_1.34.0             base64_2.0                  scrime_1.3.5                IsingSampler_0.2.1          Biostrings_2.62.0          
 [66] png_0.1-7                   reshape_0.8.8               withr_2.4.3                 bitops_1.0-7                ggforce_0.3.3              
 [71] ranger_0.13.1               plyr_1.8.6                  cellranger_1.1.0            pcaPP_1.9-74                e1071_1.7-9                
 [76] pillar_1.7.0                bumphunter_1.34.0           cachem_1.0.6                GenomicFeatures_1.44.2      broom.helpers_1.6.0        
 [81] fs_1.5.2                    DelayedMatrixStats_1.14.3   vctrs_0.3.8                 pbivnorm_0.6.0              ellipsis_0.3.2             
 [86] generics_0.1.2              tools_4.1.2                 foreign_0.8-82              munsell_0.5.0               tweenr_1.0.2               
 [91] proxy_0.4-26                DelayedArray_0.18.0         fastmap_1.1.0               compiler_4.1.2              abind_1.4-5                
 [96] rtracklayer_1.52.1          beanplot_1.2                gt_0.3.1                    GenomeInfoDbData_1.2.7      snow_0.4-4                 
[101] utf8_1.2.2                  BiocFileCache_2.0.0         jsonlite_1.7.3              scales_1.1.1                graph_1.70.0               
[106] pbapply_1.5-0               carData_3.0-5               sparseMatrixStats_1.4.2     genefilter_1.76.0           car_3.0-12                 
[111] doParallel_1.0.16           latticeExtra_0.6-29         R.utils_2.11.0              checkmate_2.0.0             nor1mix_1.3-0              
[116] rARPACK_0.11-0              siggenes_1.66.0             uchardet_1.1.0              Biobase_2.54.0              HDF5Array_1.20.0           
[121] survival_3.2-13             yaml_2.2.2                  plotrix_3.8-2               htmltools_0.5.2             memoise_2.0.1              
[126] networktools_1.4.0          lavaan_0.6-10               BiocIO_1.2.0                candisc_0.8-6               IsingFit_0.3.1             
[131] locfit_1.5-9.4              graphlayouts_0.8.0          IRanges_2.28.0              quadprog_1.5-8              viridisLite_0.4.0          
[136] digest_0.6.29               rrcov_1.6-1                 assertthat_0.2.1            rappdirs_0.3.3              repr_1.1.4                 
[141] bayestestR_0.11.5           mgm_1.2-12                  blob_1.2.2                  S4Vectors_0.32.3            R.oo_1.24.0                
[146] RCy3_2.12.4                 preprocessCore_1.54.0       splines_4.1.2               Formula_1.2-4               labeling_0.4.2             
[151] Rhdf5lib_1.14.2             illuminaio_0.34.0           RCurl_1.98-1.5              broom_0.7.12                hms_1.1.1                  
[156] modelr_0.1.8                rhdf5_2.36.0                base64enc_0.1-3             mnormt_2.0.2                GenomicRanges_1.46.1       
[161] shape_1.4.6                 tmvnsim_1.0-2               bootnet_1.5                 nnet_7.3-17                 GEOquery_2.60.0            
[166] Rcpp_1.0.8                  mclust_5.4.9                mvtnorm_1.1-3               fansi_1.0.2                 tzdb_0.2.0                 
[171] IRdisplay_1.1               R6_2.5.1                    lifecycle_1.0.1             signal_0.7-7                datawizard_0.2.3           
[176] curl_4.3.2                  ggsignif_0.6.3              minqa_1.2.4                 gdata_2.18.0                robustbase_0.93-9          
[181] glasso_1.11                 RColorBrewer_1.1-2          iterators_1.0.13            htmlwidgets_1.5.4           polyclip_1.10-0            
[186] biomaRt_2.48.3              rvest_1.0.2                 openssl_1.4.6               insight_0.15.0              htmlTable_2.4.0            
[191] codetools_0.2-18            matrixStats_0.61.0          lubridate_1.8.0             randomForest_4.6-14         gtools_3.9.2               
[196] prettyunits_1.1.1           dbplyr_2.1.1                RSpectra_0.16-0             R.methodsS3_1.8.1           GenomeInfoDb_1.30.1        
[201] correlation_0.7.1           gtable_0.3.0                DBI_1.1.2                   stats4_4.1.2                httr_1.4.2                 
[206] eigenmodel_1.11             stringi_1.7.6               vroom_1.5.7                 progress_1.2.2              uuid_1.0-3                 
[211] reshape2_1.4.4              farver_2.1.0                annotate_1.72.0             viridis_0.6.2               mice_3.14.0                
[216] xml2_1.3.3                  IRkernel_1.3                boot_1.3-28                 heplots_1.3-9               lme4_1.1-27.1              
[221] restfulr_0.0.13             DEoptimR_1.0-10             bit_4.0.4                   jpeg_0.1-9                  MatrixGenerics_1.4.3
lorenzoFabbri commented 2 years ago

Up @perishky :)