pappewaio / AllelicImbalance

0 stars 0 forks source link

Suggestion #1

Closed mfazel closed 1 year ago

mfazel commented 4 years ago

Hi, I came across your package a couple days ago by searching and found your newly released vignettes. Since I'm working on a project involving calculation of allelic imbalance, thought why not to give it a shot. There was just a confusion for me.
Based on vignettes, one can start with bam files, create an ASEset object and move on. It also says one can start with bcf or vcf files (assuming I already have bcf files created by other tools) but in the same step it needs "reads" object which should be created with AllelicImbalance package from bam files again, that is weird or I didn't get it. The problem is, not all bam files are toy examples and can be huge and the software loads it all into the memory at once, causing crash in R session if computer doesn't have more memory than the file size. Anyways, starting with bam files or not, it would be nice if you could incorporate a mechanism like in Rsamtools that reads certain number of reads each time not the whole file. For example: BamFileList(bam_files, yieldSize=100000) Please advice me if I didn't get it right.

Thanks, Mehdi

pappewaio commented 4 years ago

Hi, I guess you are referring to this part of the vignette:

gr <- granges(VCF)
#only use bi-allelic positions
gr.filt <- gr[width(mcols(gr)[,"REF"]) == 1 |
unlist(lapply(mcols(gr)[,"ALT"],width))==1]
countList <- getAlleleCounts(reads, gr.filt, verbose=FALSE)
a.vcf <- ASEsetFromCountList(rowRanges = gr, countList)

It is a nice suggestion and I can try to make it happen next time I update the package. Until then I suggest that you select only one gene at the time and then concatenate the ASEset Class objects, see the searchArea object below:

searchArea <- GRanges(seqnames = c("17"), ranges = IRanges(79478301, 79478361))
pathToFiles <- system.file("extdata/ERP000101_subset",
package = "AllelicImbalance")
reads <- impBamGAL(pathToFiles, searchArea, verbose = FALSE)

The concatenation is simply done like for normal vectors (here a1, a2, a3 are supposed to be different regions/genes)

a.combined <- c(a1.vcf, a2.vcf, a3.vcf)

I hope it helps, Jesper