bricoletc closed this 4 years ago.
Yes, drop non-PASS records.
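Dropping non-PASS records before PRG construction amounts to a one-column filter. The sketch below assumes a plain, uncompressed, tab-separated VCF and treats a FILTER value of "." (filters not applied) as non-PASS. That policy is our assumption, and `keep_pass_records` is a hypothetical helper, not part of gramtools or make_prg.

```python
def keep_pass_records(vcf_lines):
    """Yield header lines unchanged and only data records whose FILTER is PASS.

    Assumption: FILTER == "." (no filters applied) is treated as non-PASS.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the #CHROM header line
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th column (index 6) of a VCF data line
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t3\t.\tG\tT\t50\tPASS\t.\n",
        "chr1\t7\t.\tA\tC\t5\tLowQual\t.\n",
    ]
    for kept in keep_pass_records(example):
        print(kept, end="")  # the LowQual record is dropped
```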
I am wondering whether the responsibility for building PRGs from whatever source (MSAs, VCFs, or any other input format) should move from the tools (gramtools, pandora, etc.) to the make_prg repo. I will restart work on make_prg next week, because in some cases it is the memory bottleneck in the pandora de novo pipeline. Since we are focusing on the pandora paper, though, I will only target MSAs as input for now.
A hacky way to make the new make_prg implementation accept VCFs would be to transform the VCFs into MSAs (I guess this is doable) before running it.
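To get a feel for what the VCF-to-MSA route involves, here is a minimal sketch. It assumes sorted, non-overlapping records with 0-based positions. It emits the reference row plus one row per (site, ALT) pair, with every other site kept as reference, and pads each site with '-' to the length of its longest allele. This naive left-justified padding is our assumption and is not a real alignment; an aligner might place gaps differently. `Variant` and `vcf_to_msa` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    pos: int               # 0-based start on the reference (VCF POS is 1-based)
    ref: str
    alts: Tuple[str, ...]


def vcf_to_msa(reference: str, variants: List[Variant]) -> List[str]:
    """Naive VCF -> MSA: reference row, then one row per (site, ALT) pair."""
    segments: List[List[str]] = []  # invariant chunks hold a single "allele"
    cursor = 0
    for v in sorted(variants, key=lambda v: v.pos):
        if v.pos < cursor:
            raise ValueError(f"overlapping variant at {v.pos}; cluster first")
        if reference[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"REF mismatch at {v.pos}")
        segments.append([reference[cursor:v.pos]])
        segments.append([v.ref, *v.alts])
        cursor = v.pos + len(v.ref)
    segments.append([reference[cursor:]])

    # Every row uses the same width per segment, so all rows line up.
    widths = [max(len(a) for a in seg) for seg in segments]

    def render(choice: dict) -> str:
        # choice maps segment index -> allele index (default 0 = REF)
        return "".join(
            seg[choice.get(i, 0)].ljust(widths[i], "-")
            for i, seg in enumerate(segments)
        )

    rows = [render({})]
    for i, seg in enumerate(segments):
        for alt_idx in range(1, len(seg)):  # only variant sites have ALTs
            rows.append(render({i: alt_idx}))
    return rows
```

For example, `vcf_to_msa("ACGTACGT", [Variant(2, "G", ("T",)), Variant(4, "AC", ("A", "ACC"))])` gives four rows of equal length, including `ACGTA--GT` for the deletion. make_prg would then cluster over rows like these, which is where the nesting-level question below comes in.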
Where do you think we should attack this problem?
BTW, the new make_prg implementation should accept VCFs directly as soon as possible, but this first implementation will only take MSAs.
I agree make_prg should be its own library, because pandora and gramtools both need it.
I have made a Python utility for VCF-to-PRG-string conversion, which is gramtools-specific for now (0-level nesting).
My only concern with going from VCF to MSA and then from MSA to PRG string is this: if you ask for nesting level 0, what do you get? Does no clustering happen, with the module just enumerating all the alternatives at each variant site? I'm hoping you get the same output as my Python utility.
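As a baseline for that comparison, here is a sketch of a 0-level-nesting PRG string built straight from VCF records, where each site simply lists REF and its ALTs. The marker convention is assumed, not taken from the Python utility or the Perl script: an odd marker n opens and closes a site, the even marker n + 1 separates alleles, numbering starts at 5, and tokens are space-separated. `vcf_to_prg_string` is a hypothetical name, and records are assumed to be non-overlapping with 0-based positions.

```python
from typing import List, Sequence, Tuple

Site = Tuple[int, str, Sequence[str]]  # (0-based pos, REF, ALTs)


def vcf_to_prg_string(reference: str, sites: List[Site], first_marker: int = 5) -> str:
    """Linear (0-level nesting) PRG: each site just enumerates REF + ALTs."""
    tokens: List[str] = []
    cursor = 0
    marker = first_marker
    for pos, ref, alts in sorted(sites, key=lambda s: s[0]):
        if pos < cursor:
            raise ValueError(f"overlapping record at {pos}; cluster first")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF mismatch at {pos}")
        if pos > cursor:
            tokens.append(reference[cursor:pos])  # invariant sequence
        tokens.append(str(marker))                # site opens
        for i, allele in enumerate([ref, *alts]):
            if i > 0:
                tokens.append(str(marker + 1))    # allele separator
            tokens.append(allele)
        tokens.append(str(marker))                # site closes
        marker += 2
        cursor = pos + len(ref)
    if cursor < len(reference):
        tokens.append(reference[cursor:])
    return " ".join(tokens)
```

With `reference="ACGTACGT"` and sites `(2, "G", ["T"])` and `(4, "AC", ["A", "ACC"])`, this yields `AC 5 G 6 T 5 T 7 AC 8 A 8 ACC 7 GT`. If a nesting-level-0 make_prg run on the equivalent MSA yields the same alleles per site, then the two routes agree.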
Run cluster_vcf_records by default in build, so we stop ignoring overlapping records.
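Grouping overlapping records can be sketched as a single sweep over position-sorted records. This is not the cluster_vcf_records API. `cluster_overlapping` is a hypothetical helper, and it assumes that records are already split by chromosome, that positions are 0-based, and that only strictly overlapping REF spans (not merely adjacent ones) should be grouped.

```python
from typing import List, Tuple

Record = Tuple[int, str, Tuple[str, ...]]  # (0-based pos, REF, ALTs)


def cluster_overlapping(records: List[Record]) -> List[List[Record]]:
    """Group records whose REF spans [pos, pos + len(REF)) overlap, transitively."""
    clusters: List[List[Record]] = []
    cluster_end = -1  # exclusive end of the current cluster's reference span
    for rec in sorted(records, key=lambda r: (r[0], len(r[1]))):
        pos, ref, _ = rec
        end = pos + len(ref)
        if clusters and pos < cluster_end:
            clusters[-1].append(rec)          # overlaps the current cluster
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([rec])            # start a new cluster
            cluster_end = end
    return clusters
```

Here a deletion `(1, "CGT", ("C",))` and an SNP `(2, "G", ("T",))` fall into one cluster, while a later SNP at position 6 starts its own. This is exactly the kind of pair that build currently ignores.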
Right now we have one way of building a PRG from a VCF: the Perl script https://github.com/iqbal-lab-org/gramtools/blob/master/gramtools/utils/vcf_to_linear_prg.pl . It has the following caveats:
The first of these issues can be dealt with by running the vcf_clusterer module on the input VCF and then running build on its output.
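After clustering, one way to hand build a non-overlapping input is to collapse each cluster into a single record whose ALTs are every compatible combination of the cluster's alleles. This is a sketch under our own assumptions (0-based positions, one chromosome). `merge_cluster` is hypothetical and is not claimed to match what vcf_clusterer actually does. The number of combinations grows multiplicatively with cluster size, so a real tool would likely need a cap.

```python
from itertools import product
from typing import List, Tuple

Record = Tuple[int, str, Tuple[str, ...]]  # (0-based pos, REF, ALTs)


def merge_cluster(reference: str, cluster: List[Record]) -> Record:
    """Collapse overlapping records into one record spanning the whole cluster."""
    start = min(pos for pos, _, _ in cluster)
    end = max(pos + len(ref) for pos, ref, _ in cluster)
    merged_ref = reference[start:end]
    alts = set()
    # Each record contributes either None (keep reference) or one of its ALTs.
    choices = [[None, *rec_alts] for _, _, rec_alts in cluster]
    for combo in product(*choices):
        applied = [(rec, alt) for rec, alt in zip(cluster, combo) if alt is not None]
        applied.sort(key=lambda item: item[0][0])  # sort by record position
        pieces, cursor, compatible = [], start, True
        for (pos, ref, _), alt in applied:
            if pos < cursor:  # two chosen ALTs overlap: skip this combination
                compatible = False
                break
            pieces.append(reference[cursor:pos])
            pieces.append(alt)
            cursor = pos + len(ref)
        if not compatible:
            continue
        pieces.append(reference[cursor:end])
        haplotype = "".join(pieces)
        if haplotype != merged_ref:
            alts.add(haplotype)
    return (start, merged_ref, tuple(sorted(alts)))
```

For example, on `ACGTACGT`, merging `(1, "CGT", ("C",))` with `(2, "G", ("T",))` gives `(1, "CGT", ("C", "CTT"))`. The deletion and the SNP cannot co-occur, so that combination is dropped.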