iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License
92 stars 15 forks source link

Infer command #94

Closed ffranr closed 6 years ago

ffranr commented 6 years ago

Given coverage information (results from the quasimap command), infer the best path(s) through the PRG. Generate a single path for a haploid and two paths for a diploid.

For the first implementation, only support haploid. Use Martin's Poisson model work to select the highest likelihood alleles for each site.

Example commands: gramtools infer --haploid --error-rate 0.01 --mean-depth 20 --output fasta_file_path

ffranr commented 6 years ago

Zam said this about handling diploids:

[Alex Dilthey] suggested a very nice and simple solution to our question of what to do post quasimapping with diploid species Basically the output of quasimap can be turned into a VCF (eg by Minos) and then fed into whatsHap, which will phase them (infer 2 paths). We could then pull out the two paths, as fasta, and map to them using standard tools We'd get new variants to add to the graph that way, AND for free get the 2 paths

But also, it leaves a clean interface where people can apply other phasing techniques for species (eg MHC) where you have a lot of background info, you can use statistical methods based on population frequencies to phase stuff But this would give uis a clean default that would work for every use case

ffranr commented 6 years ago

From my understanding, the above translates into the following additional possible uses of the infer command: gramtools infer --diploid --phase=whatsHap --output fasta_file_path

We could easily provide a Python API to make it easier for different phasing solutions to be used. I'm not sure what parameters whatsHap will require.

The above post also makes some additional points which relate to the discover command (could also be called add-variants).

ffranr commented 6 years ago

Leaving open as a reminder that a VCF format output option needs to be added to the infer command.

ffranr commented 6 years ago

VCF output support merged into master here: 58608bf85172bd57ff6c3a0c5ccc384421703b93 This output format is only supported if the build command used a VCF and reference to build the PRG. The perl script generated VCF is used as a VCF template.

I will close this issue now. It's getting a bit long. I would prefer to open new issues if necessary.