mskilab-org / fragCounter

GC and mappability corrected fragment coverage for paired end whole genome sequencing
MIT License
7 stars 11 forks source link

data.table breaks fragCounter #16

Open jamesdalg opened 2 years ago

jamesdalg commented 2 years ago

I am also having this issue as https://github.com/mskilab/fragCounter/issues/7. The author may have closed it, but it's still an issue impeding many from using your great software!

Would changing the package version of data.table or other dependencies help? I'm willing to try anything.

mean cov: 39.3 per bin, estimated tot fragments: 616 million fragments, processing 93033.34 fragments/second
time elapsed: 1.84 hours, estimated time remaining: 0 hours , estimated total time 1.84 hours
Finished computing coverage, and making GRanges
Finished acquiring coverage
Loaded GC and mappability
length cov is 15685849, length gc is 240650, length map is 240650
Synced coverage, GC, and mappability
Modified gc / mappability correction
Converting to data.table
Grouping intervals
Defining 500 fold collapsed ranges
Presegmenting at  NA  bp scale
Error in `[.data.table`(cov.dt, , list(chr = seqnames[1], start = min(start),  :
  'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(get(paste("lev", numlevs, sep = ""))) should work. This is for efficiency so data.table can detect which columns are needed.
Calls: fragCounter -> multicoco -> seg2gr -> is -> [ -> [.data.table
In addition: Warning messages:
1: replacing previous import ‘GenomicRanges::shift’ by ‘data.table::shift’ when loading ‘bamUtils’
2: replacing previous import ‘GenomicAlignments::second’ by ‘data.table::second’ when loading ‘bamUtils’
3: replacing previous import ‘GenomicAlignments::last’ by ‘data.table::last’ when loading ‘bamUtils’
4: replacing previous import ‘GenomicAlignments::first’ by ‘data.table::first’ when loading ‘bamUtils’
5: closing unused connection 3 (../PAMHYN_Tumor.realigned.md.bam)
6: In max(width(cov)) : no non-missing arguments to max; returning -Inf
7: In cat("Presegmenting at ", as.integer(WID * base^(numlevs)), " bp scale \n") :
  NAs introduced by coercion to integer range

Here's the sessionInfo() output within R:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/local/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] fragCounter_0.0.0.9000

loaded via a namespace (and not attached):
 [1] DNAcopy_1.66.0              rstudioapi_0.13
 [3] XVector_0.32.0              GenomicRanges_1.44.0
 [5] BiocGenerics_0.38.0         zlibbioc_1.38.0
 [7] GenomicAlignments_1.28.0    IRanges_2.26.0
 [9] BiocParallel_1.26.2         lattice_0.20-45
[11] rjson_0.2.20                skidb_0.1
[13] GenomeInfoDb_1.28.4         tools_4.1.0
[15] grid_4.1.0                  SummarizedExperiment_1.22.0
[17] parallel_4.1.0              Biobase_2.52.0
[19] data.table_1.14.2           matrixStats_0.61.0
[21] yaml_2.2.1                  crayon_1.4.1
[23] BiocIO_1.2.0                Matrix_1.3-4
[25] GenomeInfoDbData_1.2.6      rtracklayer_1.52.1
[27] restfulr_0.0.13             gUtils_0.2.0
[29] S4Vectors_0.30.2            bitops_1.0-7
[31] RCurl_1.98-1.5              DelayedArray_0.18.0
[33] compiler_4.1.0              MatrixGenerics_1.4.3
[35] Biostrings_2.60.2           Rsamtools_2.8.0
[37] stats4_4.1.0                XML_3.99-0.8

Here's the output from the sample data:

bash-4.2$ frag -b /spin1/home/linux/dalgleishjl/R/4.1/library/fragCounter/extdata/chr21.bam -d /spin1/home/linux/dalgleishjl/R/4.1/library/fragCounter/extdata/gcMAP21/ -w 200

███████╗██████╗  █████╗  ██████╗  ██████╗ ██████╗ ██╗   ██╗███╗   ██╗████████╗███████╗██████╗
██╔════╝██╔══██╗██╔══██╗██╔════╝ ██╔════╝██╔═══██╗██║   ██║████╗  ██║╚══██╔══╝██╔════╝██╔══██╗
█████╗  ██████╔╝███████║██║  ███╗██║     ██║   ██║██║   ██║██╔██╗ ██║   ██║   █████╗  ██████╔╝
██╔══╝  ██╔══██╗██╔══██║██║   ██║██║     ██║   ██║██║   ██║██║╚██╗██║   ██║   ██╔══╝  ██╔══██╗
██║     ██║  ██║██║  ██║╚██████╔╝╚██████╗╚██████╔╝╚██████╔╝██║ ╚████║   ██║   ███████╗██║  ██║
╚═╝     ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═════╝  ╚═════╝ ╚═╝  ╚═══╝   ╚═╝   ╚══════╝╚═╝  ╚═╝

Calling samtools view  -f 0x02 -F 0x10 /spin1/home/linux/dalgleishjl/R/4.1/library/fragCounter/extdata/chr21.bam -q 1 | cut -f "3,4,9"
Starting fragment count on /spin1/home/linux/dalgleishjl/R/4.1/library/fragCounter/extdata/chr21.bam with bin size 200 and min mapQ 1 and   insert size limit 10000 with midpoint set to TRUE
Finished computing coverage, and making GRanges
Finished acquiring coverage
Loaded GC and mappability
length cov is 15509063, length gc is 240650, length map is 240650
Synced coverage, GC, and mappability
Modified gc / mappability correction
Converting to data.table
Grouping intervals
Defining 500 fold collapsed ranges
Presegmenting at  100000  bp scale
Aggregating coverage within levels
Mono scale correction
Correcting coverage at individual scales
Correcting coverage at  200 bp scale, with 234722 intervals
Quantile filtering response and covariates
Response min quantile: 0 max quantile: 0
Nominated 0 of 234722 data points for loess fitting
Converting to GRanges
Made GRanges
Warning messages:
1: replacing previous import ‘GenomicRanges::shift’ by ‘data.table::shift’ when loading ‘bamUtils’
2: replacing previous import ‘GenomicAlignments::second’ by ‘data.table::second’ when loading ‘bamUtils’
3: replacing previous import ‘GenomicAlignments::last’ by ‘data.table::last’ when loading ‘bamUtils’
4: replacing previous import ‘GenomicAlignments::first’ by ‘data.table::first’ when loading ‘bamUtils’
5: closing unused connection 3 (/spin1/home/linux/dalgleishjl/R/4.1/library/fragCounter/extdata/chr21.bam)
6: In value[[3L]](cond) : DNACopy error moving on without segmenting
7: In FUN(as.data.frame(grs[[i]]), fields, seg = seg) :
  Not enough samples for loess fitting - check to see if missing or truncated data?

Best, James Dalgleish GCB/CCR/NCI/NIH Bethesda, MD

roseisgold commented 1 year ago

Hi!! I got the exact same error. Did you figure out a way to get around this? Thanks :)

tanasa commented 1 year ago

I am running fragCounter from the provided docker container, and I am getting almost an identical error (below). Any help, insights, suggestions would be extremely welcome. Thanks so much.

(jabba) root@8ab30b14810e:/home# frag \
> -w 200 \
> -b $FILE \
> -d ./gcmap.add.chr.hg38.chr21.rds/

███████╗██████╗  █████╗  ██████╗  ██████╗ ██████╗ ██╗   ██╗███╗   ██╗████████╗███████╗██████╗
██╔════╝██╔══██╗██╔══██╗██╔════╝ ██╔════╝██╔═══██╗██║   ██║████╗  ██║╚══██╔══╝██╔════╝██╔══██╗
█████╗  ██████╔╝███████║██║  ███╗██║     ██║   ██║██║   ██║██╔██╗ ██║   ██║   █████╗  ██████╔╝
██╔══╝  ██╔══██╗██╔══██║██║   ██║██║     ██║   ██║██║   ██║██║╚██╗██║   ██║   ██╔══╝  ██╔══██╗
██║     ██║  ██║██║  ██║╚██████╔╝╚██████╗╚██████╔╝╚██████╔╝██║ ╚████║   ██║   ███████╗██║  ██║
╚═╝     ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═════╝  ╚═════╝ ╚═╝  ╚═══╝   ╚═╝   ╚══════╝╚═╝  ╚═╝

Calling samtools view  -f 0x02 -F 0x10 EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam -q 1 | cut -f "3,4,9"
Starting fragment count on EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam with bin size 200 and min mapQ 1 and   insert size limit 10000 with midpoint set to TRUE
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 1 num fragments processed 1e+06
mean cov: 0.1 per bin, estimated tot fragments: 1.15 million fragments, processing 13223.39 fragments/second
time elapsed: 0.02 hours, estimated time remaining: 0 hours , estimated total time 0.02 hours
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 2 num fragments processed 2e+06
mean cov: 0.1 per bin, estimated tot fragments: 2.3 million fragments, processing 21926.43 fragments/second
time elapsed: 0.03 hours, estimated time remaining: 0 hours , estimated total time 0.03 hours
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 3 num fragments processed 3e+06
mean cov: 0.2 per bin, estimated tot fragments: 3.45 million fragments, processing 28520.47 fragments/second
time elapsed: 0.03 hours, estimated time remaining: 0 hours , estimated total time 0.03 hours
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 4 num fragments processed 4e+06
mean cov: 0.3 per bin, estimated tot fragments: 4.59 million fragments, processing 33402.85 fragments/second
time elapsed: 0.03 hours, estimated time remaining: 0 hours , estimated total time 0.04 hours
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 5 num fragments processed 5e+06
mean cov: 0.4 per bin, estimated tot fragments: 5.72 million fragments, processing 37222.87 fragments/second
time elapsed: 0.04 hours, estimated time remaining: 0.01 hours , estimated total time 0.04 hours
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 6 num fragments processed 6e+06
mean cov: 0.4 per bin, estimated tot fragments: 6.86 million fragments, processing 40169.78 fragments/second
time elapsed: 0.04 hours, estimated time remaining: 0.01 hours , estimated total time 0.05 hours
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 7 num fragments processed 7e+06
mean cov: 0.5 per bin, estimated tot fragments: 7.99 million fragments, processing 42764.8 fragments/second
time elapsed: 0.05 hours, estimated time remaining: 0.01 hours , estimated total time 0.05 hours
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 8 num fragments processed 8e+06
mean cov: 0.6 per bin, estimated tot fragments: 9.13 million fragments, processing 45021.47 fragments/second
time elapsed: 0.05 hours, estimated time remaining: 0.01 hours , estimated total time 0.06 hours
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 9 num fragments processed 9e+06
mean cov: 0.6 per bin, estimated tot fragments: 10.26 million fragments, processing 46759.96 fragments/second
time elapsed: 0.05 hours, estimated time remaining: 0.01 hours , estimated total time 0.06 hours
bam.cov.tile.st  EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam chunk 10 num fragments processed 1e+07
mean cov: 0.7 per bin, estimated tot fragments: 11.39 million fragments, processing 51019.69 fragments/second
time elapsed: 0.05 hours, estimated time remaining: 0.01 hours , estimated total time 0.06 hours
Finished computing coverage, and making GRanges
Finished acquiring coverage
Loaded GC and mappability
length cov is 16088445, length gc is 233550, length map is 233550
Synced coverage, GC, and mappability
Modified gc / mappability correction
Converting to data.table
Grouping intervals
Defining 500 fold collapsed ranges
Presegmenting at  NA  bp scale
Error in `[.data.table`(cov.dt, , list(chr = seqnames[1], start = min(start),  :
  'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(get(paste("lev", numlevs, sep = ""))) should work. This is for efficiency so data.table can detect which columns are needed.
Calls: fragCounter -> multicoco -> seg2gr -> is -> [ -> [.data.table
In addition: Warning messages:
1: replacing previous import 'GenomicRanges::shift' by 'data.table::shift' when loading 'bamUtils'
2: replacing previous import 'GenomicAlignments::second' by 'data.table::second' when loading 'bamUtils'
3: replacing previous import 'GenomicAlignments::last' by 'data.table::last' when loading 'bamUtils'
4: replacing previous import 'GenomicAlignments::first' by 'data.table::first' when loading 'bamUtils'
5: closing unused connection 3 (EA5040545.sorted.dedup.realigned.recal.hg38.chr21.bam)
6: In bamUtils::bam.cov.tile(bam, window = window, reference = reference,  :
  strange data.table "by" fail .. using rbindlist lapply instead as alternative
7: In max(width(cov)) : no non-missing arguments to max; returning -Inf
8: In cat("Presegmenting at ", as.integer(WID * base^(numlevs)), " bp scale \n") :
  NAs introduced by coercion to integer range
Execution halted

The information about the R session running in the container :

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS/LAPACK: /root/miniconda3/envs/jabba/lib/libopenblasp-r0.3.21.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.3