Open lorenzoFabbri opened 2 years ago
I haven't been able to reproduce the error you're getting. Can you tell me a bit more about your inputs to ewaff.sites()?
e.g.
What does the call look like? sites.ret <- ewaff.sites(methylation ~ variable + continuous + categorical, variable.of.interest="variable", methylation=methylation, data=data, generate.confounders="sva", random.subset=1, n.confounders=1, method="glm")
What are the dimensions of the 'methylation' matrix and 'data'?
methylation.no.outliers <- ewaff::ewaff.handle.outliers(methylation = meth.time,
method = "iqr")[[1]]
colnames(methylation.no.outliers) <- paste0("s",
1:dim(methylation.no.outliers)[2])
cpg.sites <- rownames(methylation.no.outliers)
methylation.no.outliers <- as.data.frame(methylation.no.outliers)
rownames(methylation.no.outliers) <- cpg.sites
gc()
formula.tmp <- paste("methylation~", exposure, "+",
paste(colnames(metadata), collapse = "+"))
res.tmp <- ewaff::ewaff.sites(formula = formula.tmp,
variable.of.interest = exposure,
methylation = methylation.no.outliers,
data = x,
method = "rlm")
In my case methylation.no.outliers
has dimension 300,000 by 140 and x
has dimension 140 by 7.
I see that formula.tmp is defined using 'metadata' but 'x' is passed as 'data'. Are they the same? Could you send the output of print("formula.tmp"), print(x[1:3,]), print(exposure)?
[1] "methylation~ log.mep_e + e3_sex.x+cohort.x+season+age_sample_years.x+zBMI+hs_dift_mealblood_imp"
log.mep_e e3_sex.x cohort.x season age_sample_years.x zBMI hs_dift_mealblood_imp
1 0.8512623 / / spring / -0.3376235 2.166667
2 0.3855461 / / winter / 0.1117272 1.600000
3 -0.6155476 / / winter / -0.1173215 1.416667
[1] "log.mep_e"
Still not able to reproduce this error. Would it be possible for you to compile a small dataset that you could share with me that generates that error?
Sorry for the late reply, @perishky ...
N <- 150 # Number of subjects
M <- 1000 # Number of sites
rnd <- matrix(rnorm(N*M, mean=0, sd=1), N, M) # Random Methylation matrix
colnames(rnd) <- paste0("cg", 1:dim(rnd)[2])
meth.time <- as.matrix(rnd) %>% t()
colnames(meth.time) <- paste0("s", 1:dim(meth.time)[2])
rnd <- matrix( rnorm(N, mean=0, sd=1), N, 1) # Random vector for independent variable
colnames(rnd) <- "exp.2"
methylation.no.outliers <- ewaff::ewaff.handle.outliers(methylation = meth.time, method = "iqr")[[1]]
exposure <- "exp.2"
formula.tmp <- paste("methylation ~ ", exposure)
res.tmp <- ewaff::ewaff.sites(formula = formula.tmp,
variable.of.interest = exposure,
methylation = methylation.no.outliers,
data = rnd,
method = "rlm")
This leads to the following error message:
Error in
$<-.data.frame
(*tmp*
, p.adjust, value = numeric(0)) : replacement has 0 rows, data has 1000
If, on the other hand, I modify the function fit.loop
the following way
design[, "methylation"] <- t(methylation[i, ])
I get no error messages:
$table
estimate se z p.value n p.adjust
cg13869341 -2.263605e-02 0.08172225 -0.2769876551 0.781789585 150 1
cg24669183 -1.128351e-02 0.07458727 -0.1512793608 0.879755350 150 1
cg15560884 -2.026091e-02 0.08747098 -0.2316301075 0.816825321 150 1
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] igraph_1.2.11 sqldf_0.4-11 RSQLite_2.2.9 gsubfn_0.7 proto_1.0.0 data.table_1.14.2 gtsummary_1.5.2
[8] qgraph_1.9 Matrix_1.4-0 GeneNet_1.2.16 fdrtool_1.2.17 longitudinal_1.1.13 corpcor_1.6.10 tidygraph_1.2.0
[15] ggraph_2.0.5 psych_2.1.9 gridExtra_2.3 ggrepel_0.9.1 ggpubr_0.4.0 knitr_1.37 tidyselect_1.1.1
[22] imputeMissings_0.0.3 VIM_6.1.1 colorspace_2.0-2 labelled_2.9.0 robust_0.7-0 fit.models_0.64 fastDummies_1.6.3
[29] mixOmics_6.17.28 lattice_0.20-45 MASS_7.3-55 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[36] readr_2.1.2 tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] vcd_1.4-9 Hmisc_4.6-0 class_7.3-20 Rsamtools_2.8.0 lmtest_0.9-39
[6] laeken_0.5.2 smacof_2.1-3 foreach_1.5.1 glmnet_4.1-3 crayon_1.4.2
[11] rhdf5filters_1.4.0 nlme_3.1-155 backports_1.4.1 reprex_2.0.1 ellipse_0.4.2
[16] huge_1.3.5 rlang_1.0.0 XVector_0.34.0 readxl_1.3.1 nloptr_2.0.0
[21] limma_3.50.0 minfi_1.38.0 filelock_1.0.2 dplR_1.7.2 BiocParallel_1.28.3
[26] rjson_0.2.21 bit64_4.0.5 glue_1.6.1 rngtools_1.5.2 AnnotationDbi_1.56.2
[31] BiocGenerics_0.40.0 ewaff_0.0.2 base64url_1.4 NetworkToolbox_1.4.2 tcltk_4.1.2
[36] haven_2.4.3 SummarizedExperiment_1.22.0 XML_3.99-0.8 zoo_1.8-9 GenomicAlignments_1.28.0
[41] chron_2.3-56 nnls_1.4 xtable_1.8-4 magrittr_2.0.2 evaluate_0.14
[46] cli_3.1.1 zlibbioc_1.40.0 rstudioapi_0.13 doRNG_1.8.2 sp_1.4-6
[51] rpart_4.1.16 wordcloud_2.6 RJSONIO_1.3-1.6 polynom_1.4-0 xfun_0.29
[56] askpass_1.1 weights_1.0.4 multtest_2.48.0 cluster_2.1.2 pbdZMQ_0.3-6
[61] KEGGREST_1.34.0 base64_2.0 scrime_1.3.5 IsingSampler_0.2.1 Biostrings_2.62.0
[66] png_0.1-7 reshape_0.8.8 withr_2.4.3 bitops_1.0-7 ggforce_0.3.3
[71] ranger_0.13.1 plyr_1.8.6 cellranger_1.1.0 pcaPP_1.9-74 e1071_1.7-9
[76] pillar_1.7.0 bumphunter_1.34.0 cachem_1.0.6 GenomicFeatures_1.44.2 broom.helpers_1.6.0
[81] fs_1.5.2 DelayedMatrixStats_1.14.3 vctrs_0.3.8 pbivnorm_0.6.0 ellipsis_0.3.2
[86] generics_0.1.2 tools_4.1.2 foreign_0.8-82 munsell_0.5.0 tweenr_1.0.2
[91] proxy_0.4-26 DelayedArray_0.18.0 fastmap_1.1.0 compiler_4.1.2 abind_1.4-5
[96] rtracklayer_1.52.1 beanplot_1.2 gt_0.3.1 GenomeInfoDbData_1.2.7 snow_0.4-4
[101] utf8_1.2.2 BiocFileCache_2.0.0 jsonlite_1.7.3 scales_1.1.1 graph_1.70.0
[106] pbapply_1.5-0 carData_3.0-5 sparseMatrixStats_1.4.2 genefilter_1.76.0 car_3.0-12
[111] doParallel_1.0.16 latticeExtra_0.6-29 R.utils_2.11.0 checkmate_2.0.0 nor1mix_1.3-0
[116] rARPACK_0.11-0 siggenes_1.66.0 uchardet_1.1.0 Biobase_2.54.0 HDF5Array_1.20.0
[121] survival_3.2-13 yaml_2.2.2 plotrix_3.8-2 htmltools_0.5.2 memoise_2.0.1
[126] networktools_1.4.0 lavaan_0.6-10 BiocIO_1.2.0 candisc_0.8-6 IsingFit_0.3.1
[131] locfit_1.5-9.4 graphlayouts_0.8.0 IRanges_2.28.0 quadprog_1.5-8 viridisLite_0.4.0
[136] digest_0.6.29 rrcov_1.6-1 assertthat_0.2.1 rappdirs_0.3.3 repr_1.1.4
[141] bayestestR_0.11.5 mgm_1.2-12 blob_1.2.2 S4Vectors_0.32.3 R.oo_1.24.0
[146] RCy3_2.12.4 preprocessCore_1.54.0 splines_4.1.2 Formula_1.2-4 labeling_0.4.2
[151] Rhdf5lib_1.14.2 illuminaio_0.34.0 RCurl_1.98-1.5 broom_0.7.12 hms_1.1.1
[156] modelr_0.1.8 rhdf5_2.36.0 base64enc_0.1-3 mnormt_2.0.2 GenomicRanges_1.46.1
[161] shape_1.4.6 tmvnsim_1.0-2 bootnet_1.5 nnet_7.3-17 GEOquery_2.60.0
[166] Rcpp_1.0.8 mclust_5.4.9 mvtnorm_1.1-3 fansi_1.0.2 tzdb_0.2.0
[171] IRdisplay_1.1 R6_2.5.1 lifecycle_1.0.1 signal_0.7-7 datawizard_0.2.3
[176] curl_4.3.2 ggsignif_0.6.3 minqa_1.2.4 gdata_2.18.0 robustbase_0.93-9
[181] glasso_1.11 RColorBrewer_1.1-2 iterators_1.0.13 htmlwidgets_1.5.4 polyclip_1.10-0
[186] biomaRt_2.48.3 rvest_1.0.2 openssl_1.4.6 insight_0.15.0 htmlTable_2.4.0
[191] codetools_0.2-18 matrixStats_0.61.0 lubridate_1.8.0 randomForest_4.6-14 gtools_3.9.2
[196] prettyunits_1.1.1 dbplyr_2.1.1 RSpectra_0.16-0 R.methodsS3_1.8.1 GenomeInfoDb_1.30.1
[201] correlation_0.7.1 gtable_0.3.0 DBI_1.1.2 stats4_4.1.2 httr_1.4.2
[206] eigenmodel_1.11 stringi_1.7.6 vroom_1.5.7 progress_1.2.2 uuid_1.0-3
[211] reshape2_1.4.4 farver_2.1.0 annotate_1.72.0 viridis_0.6.2 mice_3.14.0
[216] xml2_1.3.3 IRkernel_1.3 boot_1.3-28 heplots_1.3-9 lme4_1.1-27.1
[221] restfulr_0.0.13 DEoptimR_1.0-10 bit_4.0.4 jpeg_0.1-9 MatrixGenerics_1.4.3
Up @perishky :)
I tried to reproduce the example in the tutorial but was not able to since I cannot create the
methylation
matrix. Using my data I keep encountering the following message error:I was able to track it down to the following line: https://github.com/perishky/ewaff/blob/80d181416d5eeb2fabb1b0daf522598d06f1aa3e/R/fit-loop.r#L3.
Why is the design matrix supposed to contain also information on the dependent variable? In your tutorial
data
indeed does not contain it. Thank you.EDIT: I used
trace
to really understand the code and apparently there is a problem when doingdesign[, "methylation"] <- methylation[i, ]
. Is it possible that it should instead bedesign[, "methylation"] <- t(methylation[i, ])
?