vahuynh / dynGENIE3

Semi-parametric approach for the inference of gene regulatory networks from time series of expression data

NAs introduced during link.list production #7

Closed iamcorrinne closed 11 months ago

iamcorrinne commented 1 year ago

I have a 53306 x 14 matrix (genes x timepoints) that I am analyzing with dynGENIE3; 3638 of those genes are treated as candidate regulators. The dynGENIE3 function runs to completion, but get.link.list then fails with an error:

> link.list <- get.link.list(res$weight.matrix, threshold=0.005)
Error in if (n > 0) c(NA_integer_, -n) else integer() :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In rep.fac * nx : NAs produced by integer overflow
2: In .set_row_names(as.integer(prod(d))) :
  NAs introduced by coercion to integer range

This error persists even if I make the threshold more stringent or change report.max. Can you please advise?

Thanks, Corrinne

This is my session info:

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 9.0 (Plow)

Matrix products: default
BLAS/LAPACK: /opt/rit/el9/20230413/app/linux-rhel9-x86_64_v3/gcc-11.2.1/openblas-0.3.21-zniqbxhjyx3vl653otz7fkmwqvp7pzds/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] lubridate_1.9.2   forcats_1.0.0     stringr_1.5.0     dplyr_1.1.1
 [5] purrr_1.0.1       readr_2.1.4       tidyr_1.3.0       tibble_3.2.1
 [9] ggplot2_3.4.1     tidyverse_2.0.0   doRNG_1.8.6       rngtools_1.5.2
[13] doParallel_1.0.17 iterators_1.0.14  foreach_1.5.2     reshape2_1.4.4

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10      compiler_4.2.2   pillar_1.9.0     plyr_1.8.8
 [5] tools_4.2.2      digest_0.6.31    timechange_0.2.0 lifecycle_1.0.3
 [9] gtable_0.3.3     pkgconfig_2.0.3  rlang_1.1.0      cli_3.6.1
[13] withr_2.5.0      hms_1.1.3        generics_0.1.3   vctrs_0.6.2
[17] grid_4.2.2       tidyselect_1.2.0 glue_1.6.2       R6_2.5.1
[21] fansi_1.0.4      tzdb_0.3.0       magrittr_2.0.3   scales_1.2.1
[25] codetools_0.2-19 colorspace_2.1-0 utf8_1.2.3       stringi_1.7.12
[29] munsell_0.5.0

vahuynh commented 1 year ago

Hi,

Did you inspect the values in res$weight.matrix to check whether there are any NaN or otherwise unusual values?

iamcorrinne commented 1 year ago

Yes, and I don't see anything obviously wrong. There are no NA or infinite values, and the error is unchanged:

> sum(is.na(res$weight.matrix))
[1] 0
> sum(is.infinite(res$weight.matrix))
[1] 0
> sum(is.finite(res$weight.matrix))
[1] 2841529636
> length(res$weight.matrix)
[1] 2841529636
> link.list <- get.link.list(res$weight.matrix, threshold=0.005)
Error in if (n > 0) c(NA_integer_, -n) else integer() :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In rep.fac * nx : NAs produced by integer overflow
2: In .set_row_names(as.integer(prod(d))) :
  NAs introduced by coercion to integer range

vahuynh commented 1 year ago

Hi,

Since the warnings report an integer overflow, I suspect the weight matrix is simply too big.
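To make the size argument concrete, here is a minimal sketch of the arithmetic in base R. It uses only the numbers reported in this thread; the variable names are illustrative and are not part of dynGENIE3.

```r
# Illustrative names; the figures come from the issue above.
n.genes      <- 53306   # rows and columns of the full weight matrix
n.regulators <- 3638    # candidate regulators

# Doubles are used here, so the product itself does not overflow.
n.full <- n.genes * n.genes
n.full > .Machine$integer.max        # TRUE: more entries than a 32-bit integer can hold

# Coercing the count to integer yields NA (with the same warning as in the issue).
full.as.int <- suppressWarnings(as.integer(n.full))
is.na(full.as.int)                   # TRUE

# A condition on that NA fails like the reported error does.
err <- tryCatch(if (full.as.int > 0) "ok", error = function(e) conditionMessage(e))
print(err)

# Restricted to the regulator rows, the entry count fits comfortably.
n.reduced <- n.regulators * n.genes
n.reduced <= .Machine$integer.max    # TRUE
```

The full matrix has 2841529636 entries, which matches the length(res$weight.matrix) output above and exceeds the largest R integer (2147483647). The warning text ("rep.fac * nx", ".set_row_names") appears to come from base R code that enumerates every row/column combination, which would explain why changing threshold or report.max makes no difference.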

Could you first reduce your weight matrix to the rows corresponding to the 3638 candidate regulators? The other rows contain only zeros, so they don't matter:

res$weight.matrix <- res$weight.matrix[input.genes,]

where 'input.genes' is a character vector containing the names of the regulators.

Then try to call get.link.list with this reduced matrix to see if the problem still occurs. I modified the get.link.list function so that it can handle non-square matrices.
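The row reduction can be tried on a toy matrix first. The sketch below is an assumption-laden illustration in base R: the gene names, the 6 x 6 size, and the random weights are made up, and it does not call dynGENIE3 or get.link.list. It only checks that dropping the non-regulator rows loses no information.

```r
set.seed(1)

# Hypothetical toy data: 6 genes, 2 of which are candidate regulators.
genes       <- paste0("g", 1:6)
input.genes <- c("g1", "g4")

# Rows are regulators, columns are targets; non-regulator rows stay at zero.
w <- matrix(0, nrow = 6, ncol = 6, dimnames = list(genes, genes))
w[input.genes, ] <- runif(length(input.genes) * ncol(w))

# Sanity check before subsetting: every dropped row is all zeros.
dropped <- setdiff(genes, input.genes)
all(w[dropped, ] == 0)

# Keep only the regulator rows; drop = FALSE keeps a matrix even for one regulator.
w.reduced <- w[input.genes, , drop = FALSE]
dim(w.reduced)

# No weight is lost by the reduction.
isTRUE(all.equal(sum(w.reduced), sum(w)))
```

On the real data, running the same all-zero check on the non-regulator rows before overwriting res$weight.matrix is a cheap way to confirm that the reduction is safe.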

iamcorrinne commented 12 months ago

That worked! Thanks so much.