Closed tonydisera closed 8 months ago
This is also a bug that needs to be fixed before 4.9 can be released.
Please make sure you create a branch off of the latest 4.9 branch as much of the code has changed.
@tonydisera I checked the regular expression. It turns out that the Number attribute may contain a dot character (.), and SAC is such a case. This is now fixed.
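For illustration, here is a minimal sketch of a header pattern that accepts a dot in Number. The actual regular expression in vcf.iobio.js is not shown in this thread, so the pattern, the function name parseHeaderField, and the sample header line below are all assumptions.

```javascript
// Hypothetical pattern, not the one used in vcf.iobio.js.
// Number may be an integer, one of A/R/G, or a dot (".").
const HEADER_FIELD_RE =
  /^##(INFO|FORMAT)=<ID=([^,]+),Number=(\d+|[ARG]|\.),Type=([^,]+),Description="([^"]*)"/;

// Returns the parsed header field, or null when the line does not match.
function parseHeaderField(line) {
  const matches = HEADER_FIELD_RE.exec(line);
  if (matches === null) {
    return null;
  }
  const [, kind, id, number, type, description] = matches;
  return { kind, id, number, type, description };
}

// Illustrative header line (made up for this example).
const sac = parseHeaderField(
  '##FORMAT=<ID=SAC,Number=.,Type=Integer,Description="Illustrative strand allele counts">'
);
console.log(sac.id, sac.number);
```

A pattern that only allows digits for Number would return null for this line, which would explain a field silently missing from the dialog.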
As for the genotype values not matching the Format Header that you mentioned: for some variants, including the case you mentioned, just 6 values are matched. But in other variants, there are 8 of them. For example:
Do you think it makes sense to keep displaying every 'Format Header' in selectVariantAnnotationDialog, but display the value 'None' when a format header has no match in the Genotype column for a given variant?
This makes sense to me. We can’t guarantee that the header is accurate. Someone could have just added the header from a different VCF file, so it might have nothing to do with the actual data. We can’t validate this though, so we just assume the header is valid and show all options. So long as we’re clear that data is missing for a specific variant, this is fine.
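A rough sketch of the agreed behavior: every header FORMAT field stays selectable, and a variant that lacks a field reports 'None' for it. The function name genotypeValuesForVariant and the sample values are hypothetical, and the real data structures in the app may differ.

```javascript
// headerFields: FORMAT IDs declared in the header (all are offered in the dialog).
// formatColumn: the variant's FORMAT column, e.g. "GT:AD:DP".
// sampleColumn: the matching genotype column, e.g. "0/1:12,9:21".
function genotypeValuesForVariant(headerFields, formatColumn, sampleColumn) {
  const keys = formatColumn.split(":");
  const values = sampleColumn.split(":");

  // Pair each key in this variant's FORMAT column with its genotype value.
  const present = new Map();
  keys.forEach((key, i) => {
    if (values[i] !== undefined) {
      present.set(key, values[i]);
    }
  });

  // Report every header field; fall back to 'None' when the variant lacks it.
  const result = {};
  headerFields.forEach((id) => {
    result[id] = present.has(id) ? present.get(id) : "None";
  });
  return result;
}

// Made-up values: PGT and PID are in the header but not in this variant.
const row = genotypeValuesForVariant(
  ["GT", "AD", "DP", "PGT", "PID"],
  "GT:AD:DP",
  "0/1:12,9:21"
);
console.log(row);
```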
If INFO or FORMAT header rec cannot be parsed, print message to console about field being bypassed. Fix bug that was returning previous record when match failed.
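The shape of that change might look something like the sketch below. This is not the committed code: parseHeaderFields, the pattern, and the record layout are assumptions. The point is that the record is created only after a successful match, so a failed match cannot hand back the previous record.

```javascript
// Hypothetical sketch of parsing INFO/FORMAT header records.
function parseHeaderFields(headerLines) {
  const fields = { INFO: {}, FORMAT: {} };
  const pattern =
    /^##(INFO|FORMAT)=<ID=([^,]+),Number=([^,]+),Type=([^,]+),Description="([^"]*)"/;

  for (const line of headerLines) {
    // Ignore header lines that are not INFO or FORMAT records.
    if (!/^##(INFO|FORMAT)=/.test(line)) {
      continue;
    }
    const matches = pattern.exec(line);
    if (matches === null) {
      // Tell the developer which field was skipped instead of failing silently.
      console.log("Bypassing header field that could not be parsed: " + line);
      continue;
    }
    // Built fresh for each successful match, so no stale record can leak through.
    const record = {
      id: matches[2],
      number: matches[3],
      type: matches[4],
      description: matches[5],
    };
    fields[matches[1]][record.id] = record;
  }
  return fields;
}

// Made-up header lines; the second one is deliberately malformed.
const parsed = parseHeaderFields([
  '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
  "##FORMAT=<ID=BAD,Number=1>",
  '##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">',
]);
console.log(Object.keys(parsed.FORMAT), Object.keys(parsed.INFO));
```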
For the demo exome vcf dataset, there are 8 format fields listed in the header, but 6 actual genotype values that can be parsed.
Here is the header:
But not all of these fields in the header are actually included in the genotype. Here is the format column, showing 6 fields:
And in the Select Annotations dialog, there are 7 fields listed for selection. Notice that SAC is not included and 2 fields (PGT, PID) are not available in the format (and genotype) fields.
To fix, you should filter vcf.infoFields.FORMAT to only include those format fields (from the header) that are available in the format column.
As for the SAC field not being included, you will need to troubleshoot the code in vcf.iobio.js that creates vcf.infoFields.FORMAT. I suspect that an error is occurring when trying to parse out the vcf header FORMAT record for the SAC field. I noticed some suspicious code in _parseHeaderForInfoORFormat. The field infoOrFormat is never initialized, so what happens if the regular expression does not match? I'd try printing some console messages when matches == null to see if SAC is not getting parsed correctly. If that is the problem, then something must be wrong with the regular expression.
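The filtering suggested above could be sketched roughly as follows. It assumes that vcf.infoFields.FORMAT is an object keyed by field ID, which is a guess since its real shape is not shown here, and filterFormatFields is a hypothetical helper. Because the FORMAT column can differ between variants, the sketch takes the union over several FORMAT columns rather than a single one.

```javascript
// headerFormatFields: object keyed by FORMAT ID (assumed shape).
// formatColumns: FORMAT column strings from one or more variants, e.g. "GT:AD:DP".
function filterFormatFields(headerFormatFields, formatColumns) {
  // Collect every field ID that appears in at least one variant.
  const available = new Set();
  for (const column of formatColumns) {
    for (const id of column.split(":")) {
      available.add(id);
    }
  }

  // Keep only the header fields that actually occur in the data.
  const filtered = {};
  for (const id of Object.keys(headerFormatFields)) {
    if (available.has(id)) {
      filtered[id] = headerFormatFields[id];
    }
  }
  return filtered;
}

// Made-up example: SAC is declared in the header but never used.
const kept = filterFormatFields(
  { GT: {}, AD: {}, PGT: {}, SAC: {} },
  ["GT:AD", "GT:AD:PGT"]
);
console.log(Object.keys(kept));
```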