telatin / bamtocov

🏔 coverage extraction from BAM/CRAM files, supporting targets 📊  
https://telatin.github.io/bamtocov/
MIT License

A way to calculate coverage breadth #8

Open BharatRaviIyengar opened 2 years ago

BharatRaviIyengar commented 2 years ago

It is often useful to see if reads horizontally cover a certain percentage of a locus' length. Existing tools (bedtools coverage) are too slow/memory-consuming with unsorted BAMs.

telatin commented 2 years ago

Hi @BharatRaviIyengar, thanks for the feedback. Can you provide a minimal example of the desired output, together with your bedtools coverage command, so that I can test how this would work with BamToCov? Thanks

BharatRaviIyengar commented 2 years ago

@telatin Thank you for getting back. The desired output would be a tab-separated file with:

  1. locus name
  2. locus start (optional)
  3. locus stop (optional)
  4. orientation (+/-)
  5. coverage breadth (what % of the locus is covered by seq reads)
  6. average coverage depth (that BamToCov already reports).

Input files would be a GTF/BED and a SAM/BAM
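
To make the two requested metrics concrete, here is a minimal Python sketch of how coverage breadth and average depth could be computed for one locus. It is only an illustration, not how BamToCov works: the function names `breadth_and_depth` and `format_row` are hypothetical, reads are assumed to be already reduced to 0-based half-open `(start, stop)` intervals on the same chromosome as the locus, and the locus is assumed to have non-zero length.

```python
from typing import List, Tuple


def breadth_and_depth(start: int, stop: int,
                      reads: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (breadth in %, mean depth) for the half-open locus [start, stop).

    Hypothetical helper: reads are (start, stop) intervals, 0-based half-open.
    """
    length = stop - start
    depth = [0] * length  # per-base depth over the locus
    for r_start, r_stop in reads:
        # Clip each read to the locus; reads outside it yield an empty range.
        lo = max(r_start, start)
        hi = min(r_stop, stop)
        for pos in range(lo, hi):
            depth[pos - start] += 1
    covered = sum(1 for d in depth if d > 0)  # bases with at least one read
    return 100.0 * covered / length, sum(depth) / length


def format_row(name: str, start: int, stop: int, strand: str,
               reads: List[Tuple[int, int]]) -> str:
    """Build one tab-separated output row with the six requested columns."""
    breadth, mean_depth = breadth_and_depth(start, stop, reads)
    return "\t".join([name, str(start), str(stop), strand,
                      f"{breadth:.2f}", f"{mean_depth:.2f}"])
```

The per-base array keeps the sketch short; memory grows with locus length, so a real implementation would more likely stream over intervals.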

You can see what the bedtools coverage output looks like here. I am still pasting one of the example outputs:

$ cat A.bed
chr1  0   100 b1  1  +
chr1  100 200 b2  1  -
chr2  0   100 b3  1  +

$ cat B.bed
chr1  10  20  a1  1  -
chr1  20  30  a2  1  -
chr1  30  40  a3  1  -
chr1  100 200 a4  1  +

$ bedtools coverage -a A.bed -b B.bed
chr1  0   100 b1  1  +  3  30  100  0.3000000
chr1  100 200 b2  1  -  1  100 100  1.0000000
chr2  0   100 b3  1  +  0  0   100  0.0000000

$ bedtools coverage -a A.bed -b B.bed -s
chr1  0   100 b1  1  +  0  0   100  0.0000000
chr1  100 200 b2  1  -  0  0   100  0.0000000
chr2  0   100 b3  1  +  0  0   100  0.0000000

My File-A (-a) is a GTF and File-B (-b) is a BAM.
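
For reference, the four columns that bedtools coverage appends in the example above are: the number of B features overlapping the A feature, the number of A bases covered, the length of A, and the covered fraction. A rough Python sketch that reproduces those numbers for the example is below. The `coverage` function and its `same_strand` flag are hypothetical names standing in for the default and `-s` behaviour, and rows are assumed to be BED6 tuples already parsed into Python.

```python
def coverage(a_rows, b_rows, same_strand=False):
    """For each A feature return (name, n_hits, covered_bases, length, fraction).

    Hypothetical sketch: rows are (chrom, start, stop, name, score, strand).
    """
    out = []
    for chrom, start, stop, name, _score, strand in a_rows:
        # Keep B features that overlap this A feature, clipped to its bounds.
        hits = sorted(
            (max(b_start, start), min(b_stop, stop))
            for b_chrom, b_start, b_stop, _n, _s, b_strand in b_rows
            if b_chrom == chrom and b_start < stop and b_stop > start
            and (not same_strand or b_strand == strand)
        )
        # Sweep the sorted hits, counting each covered base only once.
        covered = 0
        cur_end = start
        for h_start, h_stop in hits:
            if h_stop > cur_end:
                covered += h_stop - max(h_start, cur_end)
                cur_end = h_stop
        length = stop - start
        out.append((name, len(hits), covered, length, covered / length))
    return out


A = [("chr1", 0, 100, "b1", 1, "+"),
     ("chr1", 100, 200, "b2", 1, "-"),
     ("chr2", 0, 100, "b3", 1, "+")]
B = [("chr1", 10, 20, "a1", 1, "-"),
     ("chr1", 20, 30, "a2", 1, "-"),
     ("chr1", 30, 40, "a3", 1, "-"),
     ("chr1", 100, 200, "a4", 1, "+")]

unstranded = coverage(A, B)
stranded = coverage(A, B, same_strand=True)
```

Merging intervals this way avoids a per-base array, so memory depends on the number of overlapping reads rather than on locus length.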