harrispopgen / mutyper

Ancestral k-mer mutation types for SNP data
https://harrispopgen.github.io/mutyper/
MIT License

Integration with Hail #41

Open vladsavelyev opened 1 year ago

vladsavelyev commented 1 year ago

I wonder if you happened to see Hail Query, a framework for processing variant data at scale. It was used, for example, to build the gnomAD resource. It would be cool to support Hail matrix tables in mutyper, or to contribute the mutyper methods to the Hail codebase itself (e.g. as part of https://hail.is/docs/0.2/methods/genetics.html#genetics).

Thanks for this amazing and well-implemented tool!

https://github.com/openjournals/joss-reviews/issues/5227

WSDeWitt commented 1 year ago

Thanks for the positive assessment, and for pointing us to Hail—it looks like a very nice framework for computational genomics workflows.

Mutyper is currently built on top of cyvcf2 (for VCF processing) and pyfaidx (for FASTA processing), which work well for CLI streaming and random access with minimal memory requirements. These are both widely used packages, with hundreds of dependent repos or packages (see here and here).
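
To make the streaming-plus-random-access pattern concrete, here is a minimal pure-Python sketch. It is not mutyper's actual implementation. A plain string stands in for the sequence lookup that pyfaidx would provide, and a list of tuples stands in for the variant records that cyvcf2 would stream. The function names `kmer_mutation_type` and `stream_mutation_types` are hypothetical, the reference allele is assumed to be the ancestral state, and collapsing strands so that the central ancestral base is A or C is a common convention assumed here.

```python
from typing import Iterable, Iterator, Tuple

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_mutation_type(sequence: str, pos: int, ref: str, alt: str, k: int = 3) -> str:
    """Return the k-mer mutation type for one SNP, e.g. 'ACG>ATG'.

    `pos` is 1-based, as in VCF. `ref` is assumed to be the ancestral
    allele (an assumption of this sketch).
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the SNP sits at the center")
    flank = k // 2
    start, end = pos - 1 - flank, pos + flank
    if start < 0 or end > len(sequence):
        raise ValueError("context window runs off the sequence")
    # Random access into the sequence (the role a FASTA index would play).
    context = sequence[start:end].upper()
    if context[flank] != ref:
        raise ValueError("ref allele does not match the sequence")
    mutant = context[:flank] + alt + context[flank + 1:]
    # Assumed convention: collapse by strand so the central base is A or C.
    if ref in "GT":
        context, mutant = reverse_complement(context), reverse_complement(mutant)
    return f"{context}>{mutant}"


def stream_mutation_types(
    records: Iterable[Tuple[int, str, str]], sequence: str, k: int = 3
) -> Iterator[str]:
    """Lazily yield a mutation type per (pos, ref, alt) record."""
    for pos, ref, alt in records:
        yield kmer_mutation_type(sequence, pos, ref, alt, k)


if __name__ == "__main__":
    toy_sequence = "ACGTACGT"
    toy_records = [(2, "C", "T"), (3, "G", "A")]
    for mutation_type in stream_mutation_types(toy_records, toy_sequence):
        print(mutation_type)
```

Because `stream_mutation_types` is a generator, only one record is held in memory at a time, which mirrors the low-memory streaming behavior described above. Any Hail-based alternative would need to reproduce this per-site context lookup in a distributed setting.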

We are open to including Hail interoperability on the roadmap for mutyper, but would likely need contributions from Hail experts for this, as well as an assessment of how this would improve, replace, or augment the current CLI and API based on cyvcf2 and pyfaidx.

Would you find it satisfactory if we leave this issue open with an "enhancement" tag for now?