waldronlab / CNVRanger

Functionality for CNV analysis
https://bioconductor.org/packages/CNVRanger
7 stars 7 forks source link

cnvEQTL error in .getStates(), no CNV region satisfying min.samples threshold #43

Open renatoliguori88 opened 10 months ago

renatoliguori88 commented 10 months ago

Hi everyone, I'm pretty new to CNV ranger, however I was able to follow the vignette for most of the steps until the chapter 7 (CNV-expression association analysis).

I tried many times with my data but was always facing the same error:

Restricting analysis to 24 intersecting samples
Preprocessing RNA-seq data ...
Summarizing per-sample CN state in each CNV region
Excluding 4881 cnvrs not satisfying min.samples threshold
Error in .getStates(cnvrs, calls, multi.calls, min.samples, verbose = verbose) : 
  No CNV region satisfying min.samples threshold

I also tried to create a subset for testing since my dataset is pretty huge, restricting the analysis to chromosome 1 and 2 with the command:

sel.cnvrs <- subset(cnvrs, seqnames %in% paste0("chr", 1:2))

GRanges object with 4881 ranges and 3 metadata columns:
         seqnames              ranges strand |      freq        type    pvalue
            <Rle>           <IRanges>  <Rle> | <numeric> <character> <numeric>
     [1]     chr1     3073253-3074322      * |         3        both  0.911548
     [2]     chr1     3102016-3102125      * |         3        both  0.911548
     [3]     chr1     3205901-3671498      * |         3        both  0.753688
     [4]     chr1     3680155-3681788      * |         3        both  0.911548
     [5]     chr1     3752010-3754360      * |         3        both  0.911548
     ...      ...                 ...    ... .       ...         ...       ...
  [4877]     chr2 181724042-181724942      * |         5        gain  0.265666
  [4878]     chr2 181763332-181827797      * |         5        gain  0.265666
  [4879]     chr2 181837854-181857461      * |         5        gain  0.265666
  [4880]     chr2 181864337-181870830      * |         5        gain  0.265666
  [4881]     chr2 181896823-181898712      * |         5        gain  0.265666
  -------
  seqinfo: 21 sequences from an unspecified genome; no seqlengths

and also reducing the number of min.samples when running the command:

res <- cnvEQTL(sel.cnvrs, grl, rse, window = "1Mbp", verbose = TRUE, min.samples = 2)

Just for better understanding i will print here my objects that i put inside the cnvEQTL function:

grl


GRangesList object of length 24:
$KPC438
GRanges object with 55235 ranges and 5 metadata columns:
seqnames            ranges strand |      Mean          Gene             GeneID     state   Phenotype
<Rle>         <IRanges>  <Rle> | <numeric>   <character>        <character> <numeric> <character>
[1]     chr1   3073253-3074322      * |   -0.1428 4933401J01Rik ENSMUSG00000102693         2  epithelial
[2]     chr1   3102016-3102125      * |   -0.1428       Gm26206 ENSMUSG00000064842         2  epithelial
[3]     chr1   3205901-3671498      * |   -0.1428          Xkr4 ENSMUSG00000051951         2  epithelial
[4]     chr1   3252757-3253236      * |   -0.1428       Gm18956 ENSMUSG00000102851         2  epithelial
[5]     chr1   3365731-3368549      * |   -0.1428       Gm37180 ENSMUSG00000103377         2  epithelial
...      ...               ...    ... .       ...           ...                ...       ...         ...
[55231]     chrY 90603501-90605864      * |   -1.4738       Gm28300 ENSMUSG00000099619         1  epithelial
[55232]     chrY 90665346-90667625      * |   -1.4738       Gm28301 ENSMUSG00000099399         1  epithelial
[55233]     chrY 90752427-90755467      * |   -1.4738       Gm21860 ENSMUSG00000095366         1  epithelial
[55234]     chrY 90753057-90763485      * |   -1.4738      Mid1-ps1 ENSMUSG00000095134         1  epithelial
[55235]     chrY 90784738-90816465      * |   -1.4738       Gm47283 ENSMUSG00000096768         1  epithelial
-------
seqinfo: 21 sequences from an unspecified genome; no seqlengths
<23 more elements> ``` > rse ``` class: RangedSummarizedExperiment dim: 2963 24 metadata(0): assays(1): rcounts rownames(2963): ENSMUSG00000051951 ENSMUSG00000025900 ... ENSMUSG00000038628 ENSMUSG00000098505 rowData names(0): colnames(24): KPC438 KPC479B ... KPCZ532 KPCZ746 colData names(0): ``` However the result never changed and I'm always getting the same error. Before sharing any data i would like to know if any of you ever got the same error as me Thank you so much in advance Sessioninfo: ``` R version 4.3.1 (2023-06-16) Platform: x86_64-conda-linux-gnu (64-bit) Running under: Ubuntu 20.04.6 LTS Matrix products: default BLAS/LAPACK: /data/research/restools/miniconda/envs/R.4.3.1/lib/libopenblasp-r0.3.24.so; LAPACK version 3.11.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C time zone: Etc/UTC tzcode source: system (glibc) attached base packages: [1] grid stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] ComplexHeatmap_2.16.0 EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.24.1 [4] AnnotationFilter_1.24.0 GenomicFeatures_1.52.2 AnnotationDbi_1.62.2 [7] Gviz_1.44.2 SummarizedExperiment_1.30.2 Biobase_2.60.0 [10] MatrixGenerics_1.12.3 matrixStats_1.0.0 BSgenome.Mmusculus.UCSC.mm10_1.4.3 [13] BSgenome_1.68.0 rtracklayer_1.60.1 Biostrings_2.68.1 [16] XVector_0.40.0 regioneR_1.32.0 AnnotationHub_3.8.0 [19] BiocFileCache_2.8.0 dbplyr_2.3.4 tibble_3.2.1 [22] CNVRanger_1.16.5 RaggedExperiment_1.24.2 GenomicRanges_1.52.1 [25] GenomeInfoDb_1.36.4 IRanges_2.34.1 S4Vectors_0.38.2 [28] BiocGenerics_0.46.0 loaded via a namespace (and not attached): [1] RColorBrewer_1.1-3 shape_1.4.6 rstudioapi_0.15.0 magrittr_2.0.3 [5] rmarkdown_2.25 GlobalOptions_0.1.2 BiocIO_1.10.0 zlibbioc_1.46.0 [9] vctrs_0.6.4 memoise_2.0.1 Rsamtools_2.16.0 RCurl_1.98-1.12 [13] base64enc_0.1-3 htmltools_0.5.6.1 S4Arrays_1.0.6 progress_1.2.2 [17] curl_5.1.0 Formula_1.2-5 htmlwidgets_1.6.2 cachem_1.0.8 [21] GenomicAlignments_1.36.0 iterators_1.0.14 mime_0.12 lifecycle_1.0.3 [25] pkgconfig_2.0.3 Matrix_1.6-1.1 R6_2.5.1 fastmap_1.1.1 [29] clue_0.3-65 GenomeInfoDbData_1.2.10 shiny_1.7.5.1 digest_0.6.33 [33] colorspace_2.1-0 Hmisc_5.1-1 RSQLite_2.3.1 filelock_1.0.2 [37] fansi_1.0.5 httr_1.4.7 abind_1.4-5 compiler_4.3.1 [41] doParallel_1.0.17 bit64_4.0.5 htmlTable_2.4.1 backports_1.4.1 [45] BiocParallel_1.34.2 DBI_1.1.3 biomaRt_2.56.1 rappdirs_0.3.3 [49] DelayedArray_0.26.7 rjson_0.2.21 tools_4.3.1 foreign_0.8-85 [53] interactiveDisplayBase_1.38.0 httpuv_1.6.12 nnet_7.3-19 glue_1.6.2 [57] restfulr_0.0.15 promises_1.2.1 checkmate_2.2.0 cluster_2.1.4 [61] generics_0.1.3 gtable_0.3.4 data.table_1.14.8 hms_1.1.3 [65] xml2_1.3.5 utf8_1.2.4 foreach_1.5.2 BiocVersion_3.17.1 [69] pillar_1.9.0 stringr_1.5.0 limma_3.56.2 later_1.3.1 [73] circlize_0.4.15 dplyr_1.1.3 lattice_0.22-5 deldir_1.0-9 [77] bit_4.0.5 biovizBase_1.48.0 tidyselect_1.2.0 locfit_1.5-9.8 [81] knitr_1.44 gridExtra_2.3 ProtGenerics_1.32.0 edgeR_3.42.4 [85] xfun_0.40 stringi_1.7.12 lazyeval_0.2.2 yaml_2.3.7 [89] evaluate_0.22 codetools_0.2-19 interp_1.1-4 BiocManager_1.30.22 [93] cli_3.6.1 rpart_4.1.21 xtable_1.8-4 munsell_0.5.0 [97] dichromat_2.0-0.1 Rcpp_1.0.11 png_0.1-8 XML_3.99-0.14 [101] parallel_4.3.1 ellipsis_0.3.2 ggplot2_3.4.4 blob_1.2.4 [105] prettyunits_1.2.0 jpeg_0.1-10 latticeExtra_0.6-30 bitops_1.0-7 [109] VariantAnnotation_1.46.0 scales_1.2.1 crayon_1.5.2 GetoptLong_1.0.5 [113] rlang_1.1.1 KEGGREST_1.40.1 ```
lgeistlinger commented 10 months ago

It seems that none of your CNV regions harbors:

1) calls from samples representing more than one CN state, and 2) calls from at least min.samples samples that deviate from the 2n diploid state

You might want to check the output of the following commands:

library(CNVRanger)
multi.calls <- CNVRanger:::.largest
calls <- as(grl, "RaggedExperiment")
cnv.states <- RaggedExperiment::qreduceAssay(calls, query = sel.cnvrs, simplifyReduce = multi.calls, background = 2)
tab <- apply(cnv.states, 1, table)

tab then lists for each CNV region how many samples are present in which CN state.

You then want to check:

isTestable <- function(states)
{   
        cond1 <- length(states) > 1 
        cond2 <- any(states[names(states) != "2"] >= min.samples)
        cond1 && cond2
}   
ind <- vapply(tab, isTestable, logical(1))
table(ind)

and the result of table(ind) will presumably show that none of the regions satisfies the conditions outlined above.

renatoliguori88 commented 10 months ago

@lgeistlinger Thank you so much for your answer. I did perform the test as you suggested and as pre-announced I got as results of table(ind)

ind
FALSE 
37456 

This leaves me a bit wondering about the data I received. I'm analysing some Mouse Pancreatic cell line (some are Ctrl and some are KO for a specific gene) Let's say in a dataset of 24 samples i have 12 Ctrl and 12 KO, how could be possible that none of the CNV region harbors (at least between the samples in one of the two groups). Since I'm really new on CNV analysis (I mostly work with scRNA-seq) I would appreciate some suggestions. Would make sense to analyse the two groups separately? all the Ctrl in one run and all the KO in another one ? Is there any other way to investigate CNV that explain variation in gene expression levels? At the end one of the main question we would like to address is that if the DE genes, obtained from the RNA-seq data, might be influenced from the CNV/CNA state of the samples.

Thanks again

Best,

Renato