vplagnol / ExomeDepth

ExomeDepth R package for the detection of copy number variants in exomes and gene panels using high throughput DNA sequencing data.
63 stars 26 forks source link

Error in getBamCounts when adding a reference Fasta #30

Closed ftucos closed 4 years ago

ftucos commented 4 years ago

Whenever I try to add a reference fasta file for computing the GC content, i get an error like this Reference fasta file provided so ExomeDepth will compute the GC content in each window Error in value[[3L]](cond) : record 38397 (chr11:134201957-134202041) failed file: data/hg19.fa I have tried both with the UCSC and Ensembl build but I get the same error on a different chromosome.

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.38.0 GenomeInfoDb_1.22.1  IRanges_2.20.2       S4Vectors_0.24.4     BiocGenerics_0.32.0 
[6] ExomeDepth_1.1.15    readr_1.3.1         
ftucos commented 4 years ago

I figured out it was an overflow problem that R enconters both generating the index file (fa.fai) and parsing the genome. I had to generate the index in bash with samtools and get the GC content of each exonic sequence with bedtools $ bedtools nuc -fi hg19.fa -bed S07604514_hs_hg19/S07604514_Covered_edit.bed | awk '{$4=""; print $0}' > AgilentV6_exons_GC_count.txt Than I manually added the GC contents columnt to the dataframe with the exome reads.

vplagnol commented 4 years ago

Thank you for figuring it out!

manburst commented 1 year ago

Sir I got the same problem, how did you manually added the GC contents column after calculate GC content by bedtools nuc -fi hg19.fa -bed S07604514_hs_hg19/S07604514_Covered_edit.bed | awk '{$4=""; print $0}' > AgilentV6_exons_GC_count.txt ? my.counts = getBamCounts(bed.frame=segments, bam.files=BAMFiles, include.chr=FALSE, referenceFasta='hg38.fa')

Thanks in advance