harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

fix issue with trackhub #210

Closed: erikenbody closed this 1 week ago

erikenbody commented 1 week ago

This was a doozy to track down, but I have been having issues with BED sorting for plant species with chloroplasts when building track hubs, similar to the issues here.

e.g. here's the end of a sorted.chrom.sizes file that causes issues, because it sorts differently depending on whether or not you do case-sorting:

SCAF_100        30392
SCAF_101        20804
SCAF_102        18621
SCAF_103        12751
chloroplast     155214
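A minimal Python illustration of why the order above is unstable. It assumes that a plain byte-order sort (comparable to `LC_ALL=C sort`) and a case-insensitive sort (roughly what `sort -f` or many non-C locales produce) are the two behaviours being mixed; Python's `sorted` stands in for the shell tools here and is not what the workflow actually runs.

```python
# Chromosome names from the tail of the sorted.chrom.sizes shown above.
names = ["SCAF_100", "SCAF_101", "SCAF_102", "SCAF_103", "chloroplast"]

# Byte-order sort: uppercase A-Z (0x41-0x5A) sorts before lowercase a-z
# (0x61-0x7A), so "chloroplast" lands after every "SCAF_" name.
byte_order = sorted(names)

# Case-insensitive sort: once case is ignored, 'c' < 's', so
# "chloroplast" lands before every "SCAF_" name.
case_folded = sorted(names, key=str.casefold)

print(byte_order)
print(case_folded)
```

Scaffolds named with uppercase prefixes stay in the same relative order either way; only names starting with a lowercase letter, like `chloroplast`, jump from one end of the file to the other.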

The solution is to case-sort before running bedtools complement, then sort the non-callable sites file using bedSort (for whatever reason relating to locale, sort doesn't work for this). This seems to be the most robust approach across different Linux locale configurations.
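A rough sketch of the consistency the fix is after, assuming (as the PR implies) that the failures come from the BED intervals and the genome file disagreeing on chromosome order. The helpers `genome_order`, `bed_sort`, and `follows_genome_order` are hypothetical names for illustration only. `bed_sort` assumes bedSort compares chromosome names byte-wise, which would make it independent of the locale; treat that as an assumption rather than a documented guarantee.

```python
from typing import Iterable, List, Tuple

BedRecord = Tuple[str, int, int]  # (chrom, start, end)


def genome_order(chrom_sizes: Iterable[str]) -> List[str]:
    """Hypothetical helper: chromosome names in chrom.sizes file order."""
    return [line.split()[0] for line in chrom_sizes if line.strip()]


def bed_sort(records: List[BedRecord]) -> List[BedRecord]:
    """Hypothetical stand-in for bedSort: chrom in byte order, then start.

    Encoding to bytes makes the comparison byte-wise, so the result does
    not depend on locale settings (assumed to mirror bedSort's behaviour).
    """
    return sorted(records, key=lambda r: (r[0].encode(), r[1]))


def follows_genome_order(records: List[BedRecord], order: List[str]) -> bool:
    """True if chromosomes in records never step backwards in `order`."""
    rank = {name: i for i, name in enumerate(order)}
    last = -1
    for chrom, _start, _end in records:
        if rank[chrom] < last:
            return False
        last = rank[chrom]
    return True


# Genome file in byte order, as in the sorted.chrom.sizes above.
sizes = ["SCAF_100\t30392", "SCAF_101\t20804", "chloroplast\t155214"]
order = genome_order(sizes)

# Illustrative intervals as a case-insensitive sort might leave them.
case_folded_bed = [("chloroplast", 0, 500), ("SCAF_100", 100, 200), ("SCAF_101", 0, 50)]

print(follows_genome_order(case_folded_bed, order))            # False
print(follows_genome_order(bed_sort(case_folded_bed), order))  # True
```

The point of the design is that both files go through the same locale-independent comparison, so the chromosome order cannot drift between machines with different locale settings.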