Closed sara-sajjadi closed 12 months ago
On the face of it, I'm not sure what's happening, but I wonder if there is a region where there are too few CpGs and thus the 'interpolation' error. Could you try re-running on just a subset of the CpGs (e.g., just those on a single chromosome, perhaps a small chromosome like chr22 if your data are human) and reporting back.
Thanks @PeteHaitch ! It works alright with only chr22 data. I tried running the BSmooth.tstat with local.correct = FALSE on the whole dataset, it worked and here is what I found:
> summary(is.na(BS.tstat@stats))
rawSds tstat.sd group2.means group1.means tstat
Mode :logical Mode :logical Mode :logical Mode :logical Mode :logical
FALSE:24297269 FALSE:24297269 FALSE:24297763 FALSE:24297783 FALSE:24297269
TRUE :1525 TRUE :1525 TRUE :1031 TRUE :1011 TRUE :1525
Is there a way to fix this without having to subset the original file?
How many CpGs per chromosome do you have?
Something like table(seqnames(bs.fit))
will tell you.
I'm guessing there's a chromosome with very few CpGs and this is causing issues when the smoothing is being run during local.correct
(although I'm not super familiar with that code)
Looking at table(seqnames(BS.tstat@gr), is.na(BS.tstat@stats[, "rawSds"]))
etc. may also be informative to figure out where the NA
values are occurring.
Yes, that's right, chromosomes with very few CpGs have NAs! Thanks a lot for the help.
This issue keeps popping up. We should implement a pre-check on the number of CpGs per cluster (in this case: sequence contig or chromosome).
Best, Kasper
On Tue, May 30, 2023 at 5:05 AM Sara @.***> wrote:
Yes, that's right, chromosomes with very few CpGs have NAs! Thanks a lot for the help.
— Reply to this email directly, view it on GitHub https://github.com/hansenlab/bsseq/issues/120#issuecomment-1568059865, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABF2DHZZEDGSI32QPUOS4R3XIWZ3ZANCNFSM6AAAAAAYQ74MWY . You are receiving this because you are subscribed to this thread.Message ID: @.***>
-- Best, Kasper
sorry to bother you all @kasperdanielhansen @sara-sajjadi @PeteHaitch
Que - but how did you remove non-NA values
or how did you solve it?
i tried filtering out regions with less than 2x coverage in each group but that didn't work
could you please provide some insights? thanks so much in advance!!!
############
#filter out regions where 2 or more samples have less than 2x coverage
############
BS.cov <- getCoverage(bismark_bsseq)
keepLoci.ex <- which(rowSums(BS.cov[, bismark_bsseq.fit$Type == "Tumor"] >= 2) >= 2 &
rowSums(BS.cov[, bismark_bsseq.fit$Type == "Normal"] >= 2) >= 2)
length(keepLoci.ex)
bismark_bsseq.fit <- bismark_bsseq.fit[keepLoci.ex,]
> bismark_bsseq.fit
An object of type 'BSseq' with
51670328 methylation loci
4 samples
has been smoothed with
BSmooth (ns = 70, h = 1000, maxGap = 100000000)
Some assays are HDF5Array-backed
##############
#do tstats
###############
> bismark_bsseq.tstat <- BSmooth.tstat(bismark_bsseq.fit,
+ group1 = c("SA123T_1","SA123T_2"),
+ group2 = c("SA123N_1","SA123N_2"),
+ estimate.var = "group2",
+ local.correct = TRUE,
+ verbose = TRUE)
[BSmooth.tstat] preprocessing ... done in 27.2 sec
[BSmooth.tstat] computing stats within groups ... done in 23.4 sec
[BSmooth.tstat] computing stats across groups ... Error in h(simpleError(msg, call)) :
error in evaluating the argument 'args' in selecting a method for function 'do.call': need at least two non-NA values to interpolate
############
####check counts of chromosome level methylation, looks ok
############
> table(seqnames(bismark_bsseq.fit))
1 2 3 4 5 6 7 8 9 10
4249685 4046504 3068293 2720217 2734683 2726634 2868261 2442494 2175831 2519427
11 12 13 14 15 16 17 18 19 20
2395826 2422185 1451962 1592770 1539470 2032957 2150986 1258984 1901428 1375109
21 22 X Y MT
703589 1050461 2235845 5854 873
So this is caused by the local.correct=TRUE setting in your call to BSmooth.tstat, which fits a very large bandwidth smoother. Whether or not you need this, depends on your application area, but I would not be surprised to this fail on the MT chromosome. Just to show that it works on the regular chromosomes, I would remove Y and MT and try again. (In general, you want to think about DNAm on the sex chromosomes due to X inactivation).
Best, Kasper
On Thu, Jun 8, 2023 at 2:05 PM Samuel Terkper Ahuno < @.***> wrote:
sorry to bother you all @kasperdanielhansen https://github.com/kasperdanielhansen @sara-sajjadi https://github.com/sara-sajjadi @PeteHaitch https://github.com/PeteHaitch Que - but how did you remove non-NA values or how did you solve it? i tried filtering out regions with less than 2x coverage in each group but that didn't work could you please provide some insights? thanks so much in advance!!!
############
filter out regions where 2 or more samples have less than 2x coverage
############ BS.cov <- getCoverage(bismark_bsseq) keepLoci.ex <- which(rowSums(BS.cov[, bismark_bsseq.fit$Type == "Tumor"] >= 2) >= 2 & rowSums(BS.cov[, bismark_bsseq.fit$Type == "Normal"] >= 2) >= 2) length(keepLoci.ex) bismark_bsseq.fit <- bismark_bsseq.fit[keepLoci.ex,]
bismark_bsseq.fit An object of type 'BSseq' with 51670328 methylation loci 4 samples has been smoothed with BSmooth (ns = 70, h = 1000, maxGap = 100000000) Some assays are HDF5Array-backed
##############
do tstats
###############
bismark_bsseq.tstat <- BSmooth.tstat(bismark_bsseq.fit,
- group1 = c("SA123T_1","SA123T_2"),
- group2 = c("SA123N_1","SA123N_2"),
- estimate.var = "group2",
- local.correct = TRUE,
- verbose = TRUE) [BSmooth.tstat] preprocessing ... done in 27.2 sec [BSmooth.tstat] computing stats within groups ... done in 23.4 sec [BSmooth.tstat] computing stats across groups ... Error in h(simpleError(msg, call)) : error in evaluating the argument 'args' in selecting a method for function 'do.call': need at least two non-NA values to interpolate
############
check counts of chromosome level methylation, looks ok
############
table(seqnames(bismark_bsseq.fit))
1 2 3 4 5 6 7 8 9 10
4249685 4046504 3068293 2720217 2734683 2726634 2868261 2442494 2175831 2519427 11 12 13 14 15 16 17 18 19 20 2395826 2422185 1451962 1592770 1539470 2032957 2150986 1258984 1901428 1375109 21 22 X Y MT 703589 1050461 2235845 5854 873
— Reply to this email directly, view it on GitHub https://github.com/hansenlab/bsseq/issues/120#issuecomment-1583110931, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABF2DH3H36KFK4RPB242SZ3XKIH7PANCNFSM6AAAAAAYQ74MWY . You are receiving this because you were mentioned.Message ID: @.***>
-- Best, Kasper
thanks! I'll give it a try! -S
Hi, I'm getting the following error while computing t-statistics with BSmooth.tstat : Error in h(simpleError(msg, call)) : error in evaluating the argument 'args' in selecting a method for function 'do.call': need at least two non-NA values to interpolate. There are no NAs in my dataset.
Can you please help with troubleshooting this?
> bs.fit An object of type 'BSseq' with 24298794 methylation loci 11 samples has been smoothed with BSmooth (ns = 70, h = 1000, maxGap = 100000000) All assays are in-memory > summary(is.na(getCoverage(bs.fit, type = "M"))) 1001 1002 1003 1004 1005 1006 1007 1008 Mode :logical Mode :logical Mode :logical Mode :logical Mode :logical Mode :logical Mode :logical Mode :logical FALSE:24298794 FALSE:24298794 FALSE:24298794 FALSE:24298794 FALSE:24298794 FALSE:24298794 FALSE:24298794 FALSE:24298794 1009 1010 1011 Mode :logical Mode :logical Mode :logical FALSE:24298794 FALSE:24298794 FALSE:24298794 > BS.tstat <- BSmooth.tstat(bs.fit, + group1 = c("1002", "1003", "1005", "1006", "1007", "1008"), + group2 = c("1001", "1004", "1009", "1010", "1011"), + estimate.var = "same", + local.correct = TRUE, + mc.cores = 1, + verbose = TRUE) [BSmooth.tstat] preprocessing ... done in 23.9 sec [BSmooth.tstat] computing stats within groups ... done in 24.6 sec [BSmooth.tstat] computing stats across groups ... Error in h(simpleError(msg, call)) : error in evaluating the argument 'args' in selecting a method for function 'do.call': need at least two non-NA values to interpolate > sessionInfo() R version 4.3.0 (2023-04-21 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19045) Matrix products: default locale: [1] LC_COLLATE=English_India.utf8 LC_CTYPE=English_India.utf8 LC_MONETARY=English_India.utf8 LC_NUMERIC=C [5] LC_TIME=English_India.utf8 time zone: Asia/Calcutta tzcode source: internal attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] readxl_1.4.2 BiocParallel_1.34.2 bsseq_1.36.0 SummarizedExperiment_1.30.1 [5] Biobase_2.60.0 MatrixGenerics_1.12.0 matrixStats_0.63.0 GenomicRanges_1.52.0 [9] GenomeInfoDb_1.36.0 IRanges_2.34.0 S4Vectors_0.38.1 BiocGenerics_0.46.0 loaded via a namespace (and not attached): [1] rjson_0.2.21 rhdf5_2.44.0 lattice_0.21-8 rhdf5filters_1.12.1 vctrs_0.6.2 [6] tools_4.3.0 bitops_1.0-7 parallel_4.3.0 tibble_3.2.1 fansi_1.0.4 [11] pkgconfig_2.0.3 R.oo_1.25.0 Matrix_1.5-4 data.table_1.14.8 BSgenome_1.68.0 [16] sparseMatrixStats_1.12.0 lifecycle_1.0.3 GenomeInfoDbData_1.2.10 compiler_4.3.0 Rsamtools_2.16.0 [21] Biostrings_2.68.1 munsell_0.5.0 codetools_0.2-19 permute_0.9-7 snow_0.4-4 [26] RCurl_1.98-1.12 yaml_2.3.7 pillar_1.9.0 crayon_1.5.2 R.utils_2.12.2 [31] DelayedArray_0.26.3 limma_3.56.1 gtools_3.9.4 locfit_1.5-9.7 restfulr_0.0.15 [36] grid_4.3.0 colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3 S4Arrays_1.0.4 [41] XML_3.99-0.14 utf8_1.2.3 DelayedMatrixStats_1.22.0 scales_1.2.1 XVector_0.40.0 [46] cellranger_1.1.0 R.methodsS3_1.8.2 HDF5Array_1.28.1 BiocIO_1.10.0 rtracklayer_1.60.0 [51] rlang_1.1.1 Rcpp_1.0.10 glue_1.6.2 BiocManager_1.30.20 rstudioapi_0.14 [56] R6_2.5.1 Rhdf5lib_1.22.0 GenomicAlignments_1.36.0 zlibbioc_1.46.0 > BiocManager::valid() 'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for details. Replacement repositories: CRAN: http://cran.rstudio.com/ [1] TRUE
Hello, could you tell me how to combine your 11 samples together and their original format ?
@DengEr-1993 please post as a new issue with a reproducible example.
Hi, I'm getting the following error while computing t-statistics with BSmooth.tstat : Error in h(simpleError(msg, call)) : error in evaluating the argument 'args' in selecting a method for function 'do.call': need at least two non-NA values to interpolate. There are no NAs in my dataset.
Can you please help with troubleshooting this?