iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

Build PRG from MSA and more sparsely from VCF #130

Open ffranr opened 6 years ago

bricoletc commented 5 years ago

Targets here:

  1. We want to be able to build a PRG from an MSA right now

  2. Long alleles might need to be collapsed down. For example, a record of TCAGA (ref) and TTACA (alt) will 'overlap' any other variation within the same region. This leads either to combinatorial explosion (vcf_clusterer module) or to the overlapping records being ignored outright (perl script). One solution is to build a graph from the VCF and parse that into a PRG (this will collapse the long record into its SNPs); another (non-exclusive) is to allow nesting in gramtools, so that overlapping records are no longer flattened into one site.
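To make the second point concrete, here is a minimal Python sketch. It is an illustration only, not how vcf_clusterer, the perl script, or gramtools work. It splits the TCAGA/TTACA record into its component SNPs, then counts the alleles produced when those SNPs and one other SNP in the same region are flattened into a single site. The function names, the start coordinate 100, and the extra SNP at position 102 are all hypothetical.

```python
from itertools import product


def split_into_snps(pos, ref, alt):
    """Decompose an equal-length REF/ALT pair into (position, ref, alt) SNPs."""
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles equal-length alleles")
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def enumerate_haplotypes(ref_seq, start, snps):
    """Flatten SNPs into one site: every ref/alt combination becomes an allele."""
    haplotypes = []
    for choices in product([False, True], repeat=len(snps)):
        seq = list(ref_seq)
        for use_alt, (pos, _ref, alt) in zip(choices, snps):
            if use_alt:
                seq[pos - start] = alt
        haplotypes.append("".join(seq))
    return haplotypes


# The long record from the comment above, placed at a hypothetical position 100.
long_record_snps = split_into_snps(100, "TCAGA", "TTACA")

# A hypothetical extra SNP record that falls inside the same region.
other_snp = (102, "A", "G")

all_snps = sorted(long_record_snps + [other_snp])
haplotypes = enumerate_haplotypes("TCAGA", 100, all_snps)

print(long_record_snps)
print(len(haplotypes), "alleles when flattened into a single site")
```

Flattening n overlapping SNPs into one site gives 2^n alleles, whereas keeping them as separate (or nested) sites needs only n small sites.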

bricoletc commented 3 years ago

Now that gramtools supports prgs made with make_prg, we need a streamlined way to build a whole-genome graph from:

From this, gramtools (or make_prg?) runs make_prg on each MSA and combines the resulting PRGs with the rest of the genome.
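The "combine with the rest of the genome" step could look roughly like the Python sketch below. It is not gramtools or make_prg code. It assumes a reference sequence plus a list of non-overlapping regions, each with 0-based half-open coordinates and an already-built PRG held as an opaque string. The function name `stitch_genome_prg` and the bracketed placeholder PRG strings are hypothetical and do not represent the real PRG encoding.

```python
def stitch_genome_prg(reference, regions):
    """Interleave invariant reference sequence with per-region PRG strings.

    `regions` is a list of (start, end, prg) tuples with 0-based, half-open
    coordinates on `reference` (an assumption of this sketch).
    """
    parts = []
    cursor = 0
    for start, end, prg in sorted(regions):
        if start < cursor:
            raise ValueError(f"region starting at {start} overlaps the previous one")
        if end > len(reference) or end < start:
            raise ValueError(f"region ({start}, {end}) is out of bounds")
        parts.append(reference[cursor:start])  # invariant sequence before the region
        parts.append(prg)                      # PRG built from that region's MSA
        cursor = end
    parts.append(reference[cursor:])           # invariant tail of the genome
    return "".join(parts)


# Toy reference and placeholder PRG strings (not the real PRG format).
reference = "AAACCCGGGTTT"
regions = [(9, 12, "[TTT|TAT]"), (3, 6, "[CCC|CTC]")]
combined = stitch_genome_prg(reference, regions)
print(combined)
```

A real implementation would presumably also have to keep variant site markers unique across regions. This sketch ignores that by treating each PRG as an opaque string.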

kdm9 commented 2 years ago

Hello folks,

I'm in the situation of needing exactly what @bricoletc describes above. Is there an approach that exists today which implements this functionality? Is this a serious plan with someone working on this functionality? If not, can I be of any assistance in making this a reality?

Best, Kevin

bricoletc commented 2 years ago

Hi @kdm9, this is a timely question! The feature is not currently implemented in a simple way at all; I've done it via a snakemake workflow. I will aim to implement this in gramtools. It should not be too complicated and is essential for tool usability. I estimate 2 weeks.

However, could you give me a sense of what you're trying to do? This would help make sure we're on the same page and give me a sense of your timeline. Feel free to drop me an email at bletcher@ebi.ac.uk (also let me know if #163 works for you).