Closed ffranr closed 6 years ago
Zam said this about handling diploids:
[Alex Dilthey] suggested a very nice and simple solution to our question of what to do post quasimapping with diploid species Basically the output of quasimap can be turned into a VCF (eg by Minos) and then fed into whatsHap, which will phase them (infer 2 paths). We could then pull out the two paths, as fasta, and map to them using standard tools We'd get new variants to add to the graph that way, AND for free get the 2 paths
But also, it leaves a clean interface where people can apply other phasing techniques for species (eg MHC) where you have a lot of background info, you can use statistical methods based on population frequencies to phase stuff But this would give uis a clean default that would work for every use case
From my understanding, the above translates into the following additional possible uses of the infer
command:
gramtools infer --diploid --phase=whatsHap --output fasta_file_path
We could easily provide a Python API to make it easier for different phasing solutions to be used. I'm not sure what parameters whatsHap will require.
The above post also makes some additional points which relate to the discover
command (could also be called add-variants
).
Leaving open as a reminder that a VCF format output option needs to be added to the infer command.
VCF output support merged into master here: 58608bf85172bd57ff6c3a0c5ccc384421703b93
This output format is only supported if the build
command used a VCF and reference to build the PRG. The perl script generated VCF is used as a VCF template.
I will close this issue now. It's getting a bit long. I would prefer to open new issues if necessary.
Given coverage information (results from the
quasimap
command), infer the best path(s) through the PRG. Generate a single path for a haploid and two paths for a diploid.For the first implementation, only support haploid. Use Martin's Poisson model work to select the highest likelihood alleles for each site.
Example commands:
gramtools infer --haploid --error-rate 0.01 --mean-depth 20 --output fasta_file_path