varfish-org / varfish-server

VarFish: comprehensive DNA variant analysis for diagnostics and research
MIT License

Handling of trio from single-called VCF #526

Closed: xiamaz closed this issue 7 months ago

xiamaz commented 2 years ago

Is your feature request related to a problem? Please describe.
Currently, VarFish expects a jointly called VCF for trio analysis. If single-called VCFs are used to create a trio case, some filters do not work as expected. This can lead to unexpected results for the interpreter, who might not know how the trio was called.

For example, filtering by de novo status currently does not work: if a variant is REF in the parents, their genotypes will often be '.' (no-call) instead of '0' (reference). As a result, variants that should appear in de novo filtration are missing.

Describe the solution you'd like.
For the user, this issue should be transparent (e.g., they shouldn't need to know how calling happened). When the de novo filter is used, a user expects every variant that is not present in either parent to be shown.
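To make the failure concrete, here is a minimal Python sketch of a de novo check on raw GT strings. It is not VarFish's actual filter; `is_de_novo` and its `missing_as_ref` switch are hypothetical names used only for illustration, and genotypes are assumed to be diploid.

```python
def _alleles(gt: str) -> list:
    # Normalise phased ("|") and unphased ("/") separators
    return gt.replace("|", "/").split("/")


def is_missing(gt: str) -> bool:
    """True for a complete no-call such as './.' or '.'."""
    return all(a == "." for a in _alleles(gt))


def is_hom_ref(gt: str) -> bool:
    return all(a == "0" for a in _alleles(gt))


def carries_alt(gt: str) -> bool:
    """True if any called allele is non-reference."""
    return any(a not in (".", "0") for a in _alleles(gt))


def is_de_novo(child: str, father: str, mother: str, missing_as_ref: bool = False) -> bool:
    """Child carries an ALT allele that neither parent carries.

    With missing_as_ref=False (strict), each parent must be explicitly 0/0.
    With missing_as_ref=True, a parental no-call is treated as REF.
    """
    if not carries_alt(child):
        return False
    for parent in (father, mother):
        if is_hom_ref(parent):
            continue
        if missing_as_ref and is_missing(parent):
            continue
        return False
    return True


# Child het; parents have no record at this site (merged single-sample VCFs)
print(is_de_novo("0/1", "./.", "./."))                       # strict check drops it
print(is_de_novo("0/1", "./.", "./.", missing_as_ref=True))  # expected by the user
```

The strict check silently hides the variant, while treating no-calls as REF matches what the user expects, at the cost of also accepting sites where a parent was simply not covered.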

Describe alternatives you've considered.
Joint calling of the samples. Depending on future users, this might not always be an easy option.

Additional context.
Franklin somehow handles separately called VCFs correctly for trio analysis, but its filtration strategy is opaque.

xiamaz commented 2 years ago

Some issues:

At the VCF level, it is not apparent whether a variant is missing because of lacking coverage or because the sample was REF. This might lead to some false positives. Whether this is a relevant issue in normal analysis remains to be seen.

@stolpeo Could you please expand on this issue? Thanks

holtgrewe commented 2 years ago

@xiamaz There are at least three ways to generate multi-sample VCFs:

  1. Joint calling. Here, wild type is 0/0.
  2. The GATK gVCF workflow, which also leads to 0/0 for wild type.
  3. Merging single-sample VCFs, which may lead to a no-call (./.) for wild type, because the parents have no call at the site of a de novo variant.
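Why workflow (3) produces no-calls can be shown with a deliberately naive Python sketch. `merge_single_sample_calls` is a hypothetical helper, not a bcftools or VarFish function; each sample's calls are assumed to be a dict from (chrom, pos, ref, alt) to a GT string.

```python
from typing import Dict, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def merge_single_sample_calls(calls: Dict[str, Dict[Site, str]]) -> Dict[Site, Dict[str, str]]:
    """Naively merge per-sample call sets into one multi-sample table.

    A sample without a record at a site gets "./.", because a single-sample
    VCF does not say whether the site was hom-ref or simply not covered.
    """
    samples = list(calls)
    all_sites = sorted({site for per_sample in calls.values() for site in per_sample})
    return {site: {s: calls[s].get(site, "./.") for s in samples} for site in all_sites}


calls = {
    "child": {("1", 1000, "A", "G"): "0/1"},   # candidate de novo variant
    "father": {},                               # nothing called here
    "mother": {("1", 2000, "C", "T"): "0/1"},
}
for site, gts in merge_single_sample_calls(calls).items():
    print(site, gts)
```

In joint calling or the gVCF workflow, the parents would instead carry an explicit 0/0 at position 1000 wherever they had coverage.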

There are tools such as bcftools that allow you to replace no-calls with wild type. This is what you want for target regions in WES, but off target you may not want this.

So the semantics of such VCF transmogrification are not 100% obvious. IMO, the right place to handle this is at the VCF level, that is, before import into VarFish, provided that VarFish is meant for filtration and variant data delivery, as it is currently designed.

Maybe we want to have an "admin" or "owner only" data transmogrification toolbox? This could include a "no-call to wild type" tool. Once we include a library of enrichment regions, this could be performed for on-target regions only.
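A "no-call to wild type" step restricted to on-target regions could look roughly like the Python sketch below. All names (`build_index`, `on_target`, `fill_no_calls_on_target`) are hypothetical; target intervals are assumed to be BED-style (0-based, half-open), sorted per chromosome and non-overlapping, and genotypes are assumed to be diploid.

```python
import bisect
from typing import Dict, List, Tuple

NO_CALLS = {"./.", ".|.", "."}


def build_index(targets: Dict[str, List[Tuple[int, int]]]):
    """Sort 0-based, half-open [start, end) intervals per chromosome.

    Assumes intervals do not overlap (merge them beforehand otherwise).
    """
    index = {}
    for chrom, intervals in targets.items():
        ivs = sorted(intervals)
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def on_target(index, chrom: str, pos: int) -> bool:
    """Check a 1-based VCF POS against the 0-based BED-style intervals."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1
    # Rightmost interval whose start is <= the position
    i = bisect.bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def fill_no_calls_on_target(merged, index):
    """Replace no-calls with 0/0 on target; keep off-target no-calls as-is."""
    out = {}
    for site, gts in merged.items():
        inside = on_target(index, site[0], site[1])
        out[site] = {
            sample: "0/0" if inside and gt in NO_CALLS else gt
            for sample, gt in gts.items()
        }
    return out


# One target interval covering VCF positions 1000..1500 on chromosome 1
index = build_index({"1": [(999, 1500)]})
merged = {
    ("1", 1000, "A", "G"): {"child": "0/1", "father": "./.", "mother": "./."},
    ("1", 5000, "C", "T"): {"child": "0/1", "father": "./.", "mother": "./."},
}
filled = fill_no_calls_on_target(merged, index)
```

The on-target site gets 0/0 for both parents, while the off-target site keeps its no-calls. Within targets, this still cannot tell a REF parent from an uncovered one, so the coverage caveat raised earlier in the thread remains.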

Alternatively, we could allow annotating/tagging cases as coming from workflow (3) above, and the de novo semantics could then differ for them. This may come at the cost of more artifacts or unexpected behavior if the child is sequenced with a different enrichment kit than the parents (as may happen if the child is sequenced earlier than the parents or resequenced later on).

xiamaz commented 2 years ago

@holtgrewe Thanks for the input and the explanation of the issue!

VarFish itself might not be the correct place to handle this, but possibly the documentation, specifically https://varfish.bihealth.org/manual/admin_ingest.html, could benefit from a section on possible pitfalls in trio import?