vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Allow sorting and indexing GAMs by name #1815

Open adamnovak opened 6 years ago

adamnovak commented 6 years ago

The vg gamsort tool should be extended to sort GAM files by name. The GAI index format and API should be extended to support seeking into name-sorted files, for efficient by-name search.
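To make "seeking into name-sorted files" concrete, here is a minimal sketch of a by-name lookup. It assumes that an index is simply a list of (read name, byte offset) pairs kept in name order, and that all records sharing a name can be found with two binary searches. `NameIndex` and `lookup` are hypothetical names for illustration only. This is not the GAI format and not vg's API.

```python
import bisect
from typing import List, Tuple


class NameIndex:
    """Hypothetical by-name index; NOT the real GAI format or vg API."""

    def __init__(self, entries: List[Tuple[str, int]]):
        # entries: (read_name, byte_offset) pairs. In a name-sorted file these
        # would already be in name order; sort anyway so the sketch is robust.
        self.entries = sorted(entries)
        self.names = [name for name, _ in self.entries]

    def lookup(self, name: str) -> List[int]:
        """Return the offsets of all records called `name`, via binary search."""
        lo = bisect.bisect_left(self.names, name)
        hi = bisect.bisect_right(self.names, name)
        return [offset for _, offset in self.entries[lo:hi]]


# Toy offsets, made up for illustration.
index = NameIndex([("read2", 300), ("read1", 0), ("read1", 150), ("read3", 450)])
print(index.lookup("read1"))    # [0, 150]
print(index.lookup("missing"))  # []
```

Because equal names are contiguous after sorting, a lookup costs O(log n) plus the number of matches. This is why name-sorting the file is a prerequisite for this kind of index.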

This will be very useful when examining mismapped reads in large read datasets.

ruolin commented 3 years ago

Any update on this? I have a use case where I want to chunk a large GAM into a small GAM covering a small region. However, when I vg surject the small GAM to BAM, the reads come out as single (unpaired) reads in the BAM, since the small GAM is not sorted by name.

glennhickey commented 3 years ago

One thing you can try is converting to GAF with vg convert -G. GAF is a tab-separated text format, so you should be able to sort it by any field(s), such as the read name in the first column, with just the Unix sort command.
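As a rough illustration of the GAF workaround, the sketch below sorts GAF text records by the first tab-separated column (the read name) and then groups consecutive records that share a name, for example both mates of a pair. The function names are hypothetical. The toy records are made up and truncated to a few columns, so they are not valid GAF.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def read_name(record: str) -> str:
    """Column 1 of a GAF line is the query (read) name."""
    return record.split("\t", 1)[0]


def sort_gaf_by_name(lines: Iterable[str]) -> List[str]:
    """Sort GAF records by read name; blank lines are dropped."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    # Python's sort is stable, so same-named records keep their input order.
    return sorted(records, key=read_name)


def group_by_name(sorted_records: List[str]) -> List[Tuple[str, List[str]]]:
    """Group adjacent records sharing a read name (only valid after sorting)."""
    return [(name, list(grp)) for name, grp in groupby(sorted_records, key=read_name)]


# Toy, truncated records (not valid GAF), for illustration only.
gaf = [
    "readB\t150\t0\t150\t+\t>1>2\n",
    "readA\t150\t0\t150\t+\t>3\n",
    "readB\t150\t0\t150\t-\t<5\n",
]
for name, recs in group_by_name(sort_gaf_by_name(gaf)):
    print(name, len(recs))  # readA 1, then readB 2
```

Python compares strings by code point, which matches a byte-wise `LC_ALL=C sort` for ASCII read names. The Unix sort command in other locales may order names differently.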