tjparnell / biotoolbox

Tools for querying and analysis of genomic data
http://tjparnell.github.io/biotoolbox/
Artistic License 2.0

[feature request] allow reading SAM data from STDIN in bam2wig.pl #7

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Could you please add an option to read a SAM or BAM file from standard input?
This would make it possible to run bam2wig.pl at the end of a shell pipeline.
Bedtools and Samtools are able to work in pipes. Why not bam2wig.pl?

Being able to run something like "somecommand | samtools view -Sb - | 
bam2wig.pl --coverage --in -" would be great.

Original issue reported on code.google.com by balwi...@gmail.com on 16 Dec 2014 at 6:53

GoogleCodeExporter commented 9 years ago
Thanks for submitting the request. While this seems like a reasonable request, 
several issues prevent it from working:

1. The alignments need to be sorted in genomic coordinate order, and there is 
no guarantee that a stream would be.
2. Some steps are done before generating the coverage, specifically 
pre-counting valid alignments for RPM transformation and calculating 3' shift 
values in ChIP-Seq data. Both need a pass over the alignments before any 
coverage can be written.
3. While Perl is a robust and flexible language, it is not particularly speedy, 
so to eke out performance bam2wig can fork itself into separate processes, one 
for each chromosome. True multi-threading on a stream would be a headache and 
potentially not as fast.
4. The Perl API I am using, Bio::DB::Sam, doesn't natively work with streams. 
Instead, it opens its own file handles using the bai index to work with 
separate chromosomes.
5. Even working with text SAM files, something I would like to support, is a 
pain and neither convenient nor efficient compared to working with BAM.
6. Some (most?) of these problems could be solved by slurping all the 
alignments into memory, which is what some other programs do, but I haven't 
favored that due to the complexity and extreme memory requirements.

Original comment by parnell...@gmail.com on 18 Dec 2014 at 4:33
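
One possible workaround for the limitations listed above is to let Samtools sort the stream and write it to disk, then index the result and give bam2wig.pl the file. The sketch below is not part of biotoolbox. It assumes samtools 1.x (where `samtools sort -o FILE -` reads SAM or BAM from standard input) and reuses the `--coverage --in` options quoted in the request. The function name `stream_to_bam2wig` and the `SAMTOOLS` and `BAM2WIG` override variables are hypothetical, introduced only for illustration. Whether bam2wig.pl needs further options, such as an output name, is not stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of biotoolbox.
# Reads SAM or BAM alignments on standard input, writes a coordinate-sorted
# BAM to the path given as $1, indexes it, then runs bam2wig.pl on that file.
# Assumption: samtools 1.x and bam2wig.pl are on PATH unless overridden
# through the (hypothetical) SAMTOOLS and BAM2WIG variables.
stream_to_bam2wig() {
    sorted_bam=$1
    if [ -z "$sorted_bam" ]; then
        echo "usage: somecommand | stream_to_bam2wig SORTED.bam" >&2
        return 2
    fi
    # Sorting addresses point 1; the on-disk file and its index address point 4.
    "${SAMTOOLS:-samtools}" sort -o "$sorted_bam" - &&
        "${SAMTOOLS:-samtools}" index "$sorted_bam" &&
        "${BAM2WIG:-bam2wig.pl}" --coverage --in "$sorted_bam"
}
```

With the function defined, the pipeline from the request would become something like `somecommand | stream_to_bam2wig sorted.bam`. The sorted BAM stays on disk, so bam2wig.pl can still make its pre-counting pass and fork per chromosome as described in points 2 and 3. The cost is the extra disk space and the time spent sorting.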