Open mparker2 opened 2 months ago
Hiya,
nice to see the VCF filtering (mostly) works!
The reason HDR regions break coresyn is that they do not have corresponding SYNAL annotations, which is what (the current iteration of) msyd works on so that we can have exact alignments with basepair precision.
I think the HDRs might be retained because they start right before/after a coresyn region, and we fetch all variants intersecting a multisyn (end-inclusive) for reporting in the merged VCF.
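To illustrate the difference between fetching records that intersect a multisyn (end-inclusive) and restricting to records contained in it, here is a minimal sketch. It is not msyd's code: the `Region` type, both function names, and the coordinates are hypothetical, and I'm assuming 1-based, end-inclusive positions.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical interval type; 1-based, both ends inclusive (assumption)."""
    start: int
    end: int


def intersects(rec: Region, syn: Region) -> bool:
    # End-inclusive overlap: sharing a single boundary base counts as a hit.
    return rec.start <= syn.end and rec.end >= syn.start


def contained_in(rec: Region, syn: Region) -> bool:
    # The record must lie entirely inside the syntenic region.
    return syn.start <= rec.start and rec.end <= syn.end


# Made-up coordinates: an HDR that starts on the last base of a coresyn region.
coresyn = Region(1000, 2000)
hdr = Region(2000, 2500)
snp = Region(1500, 1500)

for name, rec in (("hdr", hdr), ("snp", snp)):
    print(name, intersects(rec, coresyn), contained_in(rec, coresyn))
```

With these made-up coordinates the HDR passes the intersection test but fails the containment test, while the SNP passes both. That is the case a "strictly contained" option would change.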
If you just want the SNPs, not passing -x/--complex should filter out any VCF records with symbolic alleles, incl. HDRs.
Other than that, grep -v HDR would be a quick workaround. Might be worth adding a CLI option to restrict merging to records strictly within a multisyn, though.
That makes sense. Using --complex is not a foolproof solution, because syri is able to create VCFs with either symbolic or full-sequence alleles (using --hdrseq).
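As a stopgap that handles both allele styles, something like the sketch below could be used in place of a plain grep. It keeps header lines and drops a record if its ID starts with "HDR" or any ALT allele is the symbolic `<HDR>`. Both criteria are assumptions about how the HDR records are labelled in a given VCF, so check them against your own file first. The function name `drop_hdr_records` and the example records are made up.

```python
from typing import Iterable, Iterator


def drop_hdr_records(lines: Iterable[str], id_prefix: str = "HDR") -> Iterator[str]:
    """Yield VCF lines, skipping records that look like HDRs.

    Assumption: HDR records are recognisable by an ID starting with
    `id_prefix` or by a symbolic <HDR> ALT allele.
    """
    for line in lines:
        if line.startswith("#"):
            # Keep all header lines, including ones that mention HDR.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        rec_id = fields[2]            # VCF column 3: ID
        alts = fields[4].split(",")   # VCF column 5: ALT (comma-separated)
        if rec_id.startswith(id_prefix) or "<HDR>" in alts:
            continue
        yield line


# Made-up example records, not real syri or msyd output.
vcf = [
    "##fileformat=VCFv4.3\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr01\t1500\tSNP1\tA\tG\t.\tPASS\t.\n",
    "chr01\t2000\tHDR1\tN\t<HDR>\t.\tPASS\tEND=2500\n",
    "chr01\t3000\tHDR2\tACGT\tAGGT\t.\tPASS\t.\n",
]
kept = list(drop_hdr_records(vcf))
print("".join(kept), end="")
```

Unlike grep -v HDR, this leaves the header untouched and only looks at the ID and ALT columns, so a record that merely mentions HDR in its INFO field is kept.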
Ah, true. Then I'll look into adding a CLI option for strictly contained records. I think filtering for specific types of records is probably best left to the user, though.
Hi @lrauschning,
when I run msyd call in --core mode on some potato haplotypes, I get a nice PFF file of the coresyn regions and a merged VCF for the same coresyn. The SNPs and indels which do not overlap coresyn regions are nicely filtered out, which is exactly what I want.
Msyd breaks the coresyn regions on HDRs. I don't know if @mnshgl0110 would agree with this, since HDRs are considered part of a syntenic region by syri, but it works quite nicely for me, as I want to get rid of them for my current analysis. However, they are not filtered out of the VCF. This doesn't seem quite correct... imo either the PFF should include the HDR regions as coresyn (and the VCF include them), or they should be filtered out of the VCF.
What is your opinion?