harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.

sort/gather step fails with high interval count #87

Open tsackton opened 1 year ago

tsackton commented 1 year ago

Testing on a dataset with ~12k intervals: the sort/gather step at the end (which collects each genotyped interval into the final VCF) fails. It is not clear whether this is a temp file issue, a memory issue, or a problem with the way we pass a very large number of inputs to bcftools concat.

Probably the solution is multiple rounds of merging, which we can investigate. For now, posting this issue in case other people have problems with datasets with >10k intervals to merge.
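
A rough sketch of what "multiple rounds of merging" could look like, in case it helps anyone hitting this. It is not what the workflow currently does. It only plans the rounds: the per-interval files are split into contiguous batches (so interval order is preserved), each batch is concatenated into an intermediate file, and the intermediates are batched again until one file remains. The function names, the `batch_size` default, and the intermediate file naming are all hypothetical, and it assumes the gather step is a plain `bcftools concat` over files that are already in genomic order.

```python
from typing import List, Tuple

# One merge job: (output path, ordered list of input paths).
Job = Tuple[str, List[str]]


def plan_merge_rounds(
    files: List[str], batch_size: int = 500, prefix: str = "merge"
) -> Tuple[List[List[Job]], str]:
    """Plan hierarchical concatenation of many per-interval VCFs.

    Returns (rounds, final_path). Each round is a list of jobs that can
    run in parallel; inputs within a job keep their original order.
    batch_size and the naming scheme are illustrative assumptions.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    if not files:
        raise ValueError("no input files")

    rounds: List[List[Job]] = []
    current = list(files)
    round_idx = 0
    while len(current) > 1:
        jobs: List[Job] = []
        next_level: List[str] = []
        for i in range(0, len(current), batch_size):
            batch = current[i : i + batch_size]
            if len(batch) == 1:
                # A lone trailing file needs no merge; carry it forward.
                next_level.append(batch[0])
                continue
            out = f"{prefix}.round{round_idx}.batch{i // batch_size}.vcf.gz"
            jobs.append((out, batch))
            next_level.append(out)
        rounds.append(jobs)
        current = next_level
        round_idx += 1
    return rounds, current[0]


def concat_command(out: str, inputs: List[str]) -> List[str]:
    """Build a bcftools concat call writing bgzipped VCF to `out`."""
    return ["bcftools", "concat", "-O", "z", "-o", out, *inputs]


if __name__ == "__main__":
    # Hypothetical input names standing in for the genotyped intervals.
    intervals = [f"interval_{n:05d}.vcf.gz" for n in range(12000)]
    rounds, final = plan_merge_rounds(intervals, batch_size=500)
    for r, jobs in enumerate(rounds):
        print(f"round {r}: {len(jobs)} concat jobs")
    print("final output:", final)
```

With 12,000 inputs and batches of 500, this plans 24 concat jobs in the first round and a single 24-input job in the second, so no single call sees more than 500 files. Whether that avoids the failure depends on the actual cause (temp files, memory, or input count), which is still unknown.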

tsackton commented 3 months ago

Possible solution: