Open jakewendt opened 1 year ago
More undefined FILTERs in reference files.
bcftools view 1000GP.GRCh38_3202.ME_absences.ALL.vcf.gz | grep -vs "^#" | cut -f7 | sort | uniq -c
[W::bcf_hrec_check] Invalid tag name: "0START"
[W::bcf_hrec_check] Invalid tag name: "1END"
[W::vcf_parse_filter] FILTER 'D' is not defined in the header
[W::vcf_parse_filter] FILTER '3' is not defined in the header
1 3
1 3;D;M
1 3;M
34 D
584 D;M
3 D;S
17 D;S;M
701 M
2966 PASS
1 S
5 S;M
bcftools view 1000GP.GRCh38_3202.ME_insertions.ALL.vcf.gz | grep -vs "^#" | cut -f7 | sort | uniq -c
[W::bcf_hrec_check] Invalid tag name: "0START"
[W::bcf_hrec_check] Invalid tag name: "1END"
[W::vcf_parse_filter] FILTER 'D' is not defined in the header
[W::vcf_parse_filter] FILTER 'SD' is not defined in the header
135 D
266 D;M
1152 LC
15 LC;D
223 LC;D;M
2560 LC;M
13 LC;NU
5 LC;NU;D
23 LC;NU;D;M
110 LC;NU;M
1 LC;NU;SD
4 LC;NU;S;M
1 LC;NU;S;SD;M
1 LC;NU;S;S;M
86 LC;S
1 LC;SD
1 LC;S;D
14 LC;S;D;M
6 LC;SD;M
224 LC;S;M
1 LC;S;S;M
4891 M
285 NU
42 NU;D
47 NU;D;M
114 NU;M
5 NU;S
7 NU;SD
4 NU;SD;M
13 NU;S;M
46333 PASS
6 S
14 SD
21 SD;M
25 S;M
Are these simply missing definitions in the header, or typos in the records that use them?
I'm still guessing that `SD` should be `S;D`, and that the `D` and `3` filter definitions are simply missing, since they are present in other VCFs.
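To tell a missing definition apart from a likely typo, it can help to list every FILTER ID that records use but the header never declares. The following is only a minimal sketch: `undefined_filters` is a hypothetical helper name, it relies on plain `awk` rather than bcftools, and it assumes uncompressed VCF text on stdin with FILTER in column 7.

```shell
# Hypothetical helper: reads uncompressed VCF text on stdin and prints each
# FILTER ID used in column 7 that has no ##FILTER header line, followed by
# the number of times it occurs in the records.
undefined_filters() {
  awk -F'\t' '
    /^##FILTER=<ID=/ {
      # Extract the ID between "ID=" and the next "," or ">".
      id = $0
      sub(/^##FILTER=<ID=/, "", id)
      sub(/[,>].*$/, "", id)
      defined[id] = 1
      next
    }
    /^#/ { next }                      # skip all other header lines
    {
      n = split($7, parts, ";")
      for (i = 1; i <= n; i++) {
        f = parts[i]
        # PASS and "." need no explicit definition.
        if (f != "PASS" && f != "." && !(f in defined)) missing[f]++
      }
    }
    END { for (f in missing) print f "\t" missing[f] }
  ' | LC_ALL=C sort
}
```

For example: `gzip -dc 1000GP.GRCh38_3202.ME_insertions.ALL.vcf.gz | undefined_filters`.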
`3` and `D` are defined only in the absences files and are missing from the insertions files. This could clearly be repaired by adding the missing definitions if needed.
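One possible repair could be sketched as follows. This is not a confirmed fix: `repair_filters` and the extra-header file are hypothetical, and rewriting `SD` as `S;D` rests entirely on the unconfirmed guess in this thread. The sketch uses plain `awk` on uncompressed VCF text, adds any `##FILTER` line from the extra file whose ID is not yet defined, and leaves every other FILTER value untouched.

```shell
# Hypothetical helper: repair_filters EXTRA_HEADER < in.vcf > out.vcf
# EXTRA_HEADER is a text file of ##FILTER lines (for example the D and 3
# definitions copied from a VCF that has them).
# ASSUMPTION (unconfirmed): the FILTER value "SD" is a merged "S;D".
repair_filters() {
  awk -F'\t' -v OFS='\t' -v extra="$1" '
    function filter_id(line) {
      sub(/^##FILTER=<ID=/, "", line)
      sub(/[,>].*$/, "", line)
      return line
    }
    /^##FILTER=<ID=/ { defined[filter_id($0)] = 1 }
    /^#CHROM/ {
      # Emit missing definitions just before the column header line.
      while ((getline line < extra) > 0)
        if (line ~ /^##FILTER=<ID=/ && !(filter_id(line) in defined)) print line
      close(extra)
    }
    /^#/ { print; next }               # pass header lines through unchanged
    {
      n = split($7, parts, ";")
      out = ""
      for (i = 1; i <= n; i++) {
        tok = (parts[i] == "SD") ? "S;D" : parts[i]
        out = (i == 1) ? tok : out ";" tok
      }
      $7 = out
      print
    }
  '
}
```

A possible invocation, with `extra_filters.txt` as a placeholder name: `gzip -dc in.vcf.gz | repair_filters extra_filters.txt | bgzip -c > fixed.vcf.gz`. For the header part alone, `bcftools annotate --header-lines` is an alternative.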
zgrep "^##FILTER=<ID=3" *.vcf.gz
1000GP.GRCh37.ME_absences.ALL.vcf.gz:##FILTER=<ID=3,Description="Potential 3' transduction">
1000GP.GRCh37.ME_absences.PASS.vcf.gz:##FILTER=<ID=3,Description="Potential 3' transduction">
1000GP.GRCh38_2504.ME_absences.ALL.vcf.gz:##FILTER=<ID=3,Description="Potential 3' transduction">
1000GP.GRCh38_2504.ME_absences.PASS.vcf.gz:##FILTER=<ID=3,Description="Potential 3' transduction">
zgrep "^##FILTER=<ID=D" *.vcf.gz
1000GP.GRCh37.ME_absences.ALL.vcf.gz:##FILTER=<ID=D,Description="Relative depth of breakpoint is outlier">
1000GP.GRCh37.ME_absences.PASS.vcf.gz:##FILTER=<ID=D,Description="Relative depth of breakpoint is outlier">
1000GP.GRCh38_2504.ME_absences.ALL.vcf.gz:##FILTER=<ID=D,Description="Relative depth of breakpoint is outlier">
1000GP.GRCh38_2504.ME_absences.PASS.vcf.gz:##FILTER=<ID=D,Description="Relative depth of breakpoint is outlier">
While `S` is defined in all files, some define it twice, occasionally with two different descriptions. This is more problematic.
zgrep "^##FILTER=<ID=S" *.vcf.gz
1000GP.GRCh37.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Spanning read num is outlier">
1000GP.GRCh37.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Spanning read num is outlier">
1000GP.GRCh37.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Spanning read num is outlier">
1000GP.GRCh38_2504.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Spanning read num is outlier">
1000GP.GRCh38_2504.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_3202.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_3202.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_3202.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_3202.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
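The duplicate `S` definitions in the listing above can also be detected per file without reading the output by eye. A minimal sketch, assuming uncompressed VCF text on stdin; `duplicate_filter_defs` is a hypothetical helper name, and it compares whole header lines to decide whether repeated definitions differ.

```shell
# Hypothetical helper: reads VCF text on stdin and reports every FILTER ID
# declared more than once, with the number of declarations and whether the
# declarations are identical or conflicting.
duplicate_filter_defs() {
  awk '
    /^##FILTER=<ID=/ {
      id = $0
      sub(/^##FILTER=<ID=/, "", id)
      sub(/[,>].*$/, "", id)
      count[id]++
      # Track how many distinct header lines exist for this ID.
      if (!((id, $0) in seen_line)) { seen_line[id, $0] = 1; distinct[id]++ }
    }
    END {
      for (id in count)
        if (count[id] > 1)
          print id "\t" count[id] "\t" (distinct[id] > 1 ? "conflicting" : "identical")
    }
  ' | LC_ALL=C sort
}
```

For example: `gzip -dc 1000GP.GRCh37.ME_absences.ALL.vcf.gz | duplicate_filter_defs`.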
Hi, when I used bcftools to select some samples from the VCF, I got a similar warning: [W::vcf_parse] FILTER 'SD' is not defined in the header
So, can I just ignore it?
I don't recall how I dealt with this, or even if I did.
I'm guessing that the `S` and `D` filters were somehow merged when the VCFs were written.
So these `SD` should be `S;D`?