shohei-kojima / MEGAnE

MIT License

FILTER 'SD' is not defined in the header #10

Open jakewendt opened 1 year ago

jakewendt commented 1 year ago
bcftools view --no-header ${vcf} | cut -f7 | sort | uniq -c
[W::bcf_hrec_check] Invalid tag name: "0START"
[W::bcf_hrec_check] Invalid tag name: "0END"
[W::vcf_parse_filter] FILTER 'SD' is not defined in the header
    248 D
     54 D;M
    325 LC
     50 LC;D
     40 LC;D;M
      9 LC;F
      3 LC;F;D;M
     38 LC;F;M
    419 LC;M
      3 LC;SD
     33 LC;SD;M
     12 LC;S;M
   1403 M
   6761 PASS
     35 SD
     65 SD;M
     15 S;M

I'm guessing that the S and D filters were somehow merged when the VCFs were written.

So should these SD be S;D?

jakewendt commented 1 year ago

More undefined FILTERs in reference files.

bcftools view 1000GP.GRCh38_3202.ME_absences.ALL.vcf.gz | grep -vs "^#" | cut -f7 | sort | uniq -c
[W::bcf_hrec_check] Invalid tag name: "0START"
[W::bcf_hrec_check] Invalid tag name: "1END"
[W::vcf_parse_filter] FILTER 'D' is not defined in the header
[W::vcf_parse_filter] FILTER '3' is not defined in the header
      1 3
      1 3;D;M
      1 3;M
     34 D
    584 D;M
      3 D;S
     17 D;S;M
    701 M
   2966 PASS
      1 S
      5 S;M

bcftools view 1000GP.GRCh38_3202.ME_insertions.ALL.vcf.gz | grep -vs "^#" | cut -f7 | sort | uniq -c
[W::bcf_hrec_check] Invalid tag name: "0START"
[W::bcf_hrec_check] Invalid tag name: "1END"
[W::vcf_parse_filter] FILTER 'D' is not defined in the header
[W::vcf_parse_filter] FILTER 'SD' is not defined in the header
    135 D
    266 D;M
   1152 LC
     15 LC;D
    223 LC;D;M
   2560 LC;M
     13 LC;NU
      5 LC;NU;D
     23 LC;NU;D;M
    110 LC;NU;M
      1 LC;NU;SD
      4 LC;NU;S;M
      1 LC;NU;S;SD;M
      1 LC;NU;S;S;M
     86 LC;S
      1 LC;SD
      1 LC;S;D
     14 LC;S;D;M
      6 LC;SD;M
    224 LC;S;M
      1 LC;S;S;M
   4891 M
    285 NU
     42 NU;D
     47 NU;D;M
    114 NU;M
      5 NU;S
      7 NU;SD
      4 NU;SD;M
     13 NU;S;M
  46333 PASS
      6 S
     14 SD
     21 SD;M
     25 S;M

Are these simply missing definitions in the header, or typos in the samples that use them?
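One way to answer that per file without reading bcftools warnings is to compare the FILTER IDs declared in the header with those actually used in column 7. The following is only a sketch: `undefined_filters` is a hypothetical helper name, it expects uncompressed VCF text on stdin, and it treats `PASS` and `.` as always valid.

```shell
# Hypothetical helper: reads uncompressed VCF text on stdin and prints every
# FILTER ID used in a record (column 7) that has no ##FILTER header line.
undefined_filters() {
  awk -F'\t' '
    /^##FILTER=<ID=/ {
      id = $0
      sub(/^##FILTER=<ID=/, "", id)   # drop the prefix
      sub(/[,>].*$/, "", id)          # keep only the ID itself
      defined[id] = 1
      next
    }
    /^#/ { next }                     # skip all other header lines
    {
      n = split($7, parts, ";")       # FILTER column may hold several IDs
      for (i = 1; i <= n; i++)
        if (parts[i] != "PASS" && parts[i] != "." && !(parts[i] in defined))
          missing[parts[i]] = 1
    }
    END { for (id in missing) print id }
  ' | sort
}
```

Usage would be along the lines of `gzip -dc 1000GP.GRCh38_3202.ME_insertions.ALL.vcf.gz | undefined_filters`. It only reports the IDs; it cannot tell whether an ID is a typo or a missing definition.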

jakewendt commented 1 year ago

Still guessing that SD should be S;D, and that the D and 3 filter definitions are simply missing here, since they are present in other VCFs.
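If that guess is right, the records could be repaired by rewriting the FILTER column. This is a sketch under that unconfirmed assumption (nothing in this thread confirms that SD means S;D); `split_sd_filter` is a hypothetical name and it works on uncompressed VCF text from stdin.

```shell
# Hypothetical helper: rewrites the FILTER value "SD" as "S;D" in column 7.
# ASSUMPTION: SD is a merged S and D, which is only a guess in this thread.
split_sd_filter() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/   { print; next }            # header lines pass through unchanged
    NF < 7 { print; next }            # leave malformed or short lines alone
    {
      n = split($7, parts, ";")
      out = ""
      for (i = 1; i <= n; i++) {
        if (parts[i] == "SD") parts[i] = "S;D"
        out = (i == 1 ? parts[i] : out ";" parts[i])
      }
      $7 = out
      print
    }'
}
```

A record that already carries S next to SD (such as `LC;NU;S;SD;M` above) would end up listing S twice, so the output may still need deduplication.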

jakewendt commented 1 year ago

3 and D are defined only in the absences files and are missing from the insertions files. This could clearly be repaired by adding the missing definitions if needed.

zgrep "^##FILTER=<ID=3" *.vcf.gz 
1000GP.GRCh37.ME_absences.ALL.vcf.gz:##FILTER=<ID=3,Description="Potential 3' transduction">
1000GP.GRCh37.ME_absences.PASS.vcf.gz:##FILTER=<ID=3,Description="Potential 3' transduction">
1000GP.GRCh38_2504.ME_absences.ALL.vcf.gz:##FILTER=<ID=3,Description="Potential 3' transduction">
1000GP.GRCh38_2504.ME_absences.PASS.vcf.gz:##FILTER=<ID=3,Description="Potential 3' transduction">

zgrep "^##FILTER=<ID=D" *.vcf.gz 
1000GP.GRCh37.ME_absences.ALL.vcf.gz:##FILTER=<ID=D,Description="Relative depth of breakpoint is outlier">
1000GP.GRCh37.ME_absences.PASS.vcf.gz:##FILTER=<ID=D,Description="Relative depth of breakpoint is outlier">
1000GP.GRCh38_2504.ME_absences.ALL.vcf.gz:##FILTER=<ID=D,Description="Relative depth of breakpoint is outlier">
1000GP.GRCh38_2504.ME_absences.PASS.vcf.gz:##FILTER=<ID=D,Description="Relative depth of breakpoint is outlier">
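Adding the missing definitions could look roughly like this. It is a sketch, not a tested fix for these files: `add_missing_filter_defs` is a hypothetical name, the two descriptions are copied from the zgrep output above, and the function reads uncompressed VCF text on stdin.

```shell
# Hypothetical helper: inserts ##FILTER lines for D and 3 just before the
# #CHROM line, but only when the header does not already define them.
add_missing_filter_defs() {
  awk '
    /^##FILTER=<ID=D,/ { have_d = 1 }
    /^##FILTER=<ID=3,/ { have_3 = 1 }
    /^#CHROM/ {
      if (!have_d) print "##FILTER=<ID=D,Description=\"Relative depth of breakpoint is outlier\">"
      # \047 is the octal escape for a single quote inside the awk string
      if (!have_3) print "##FILTER=<ID=3,Description=\"Potential 3\047 transduction\">"
    }
    { print }
  '
}
```

For example: `gzip -dc in.vcf.gz | add_missing_filter_defs | bgzip > out.vcf.gz` (file names are placeholders). `bcftools annotate` also has a `--header-lines` option that appends header lines from a file, which may be the more robust route.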

While S is defined in all files, some define it twice, occasionally with different descriptions. This is more problematic.

zgrep "^##FILTER=<ID=S" *.vcf.gz 
1000GP.GRCh37.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Spanning read num is outlier">
1000GP.GRCh37.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Spanning read num is outlier">
1000GP.GRCh37.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh37.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Spanning read num is outlier">
1000GP.GRCh38_2504.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Spanning read num is outlier">
1000GP.GRCh38_2504.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_2504.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_3202.ME_absences.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_3202.ME_absences.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_3202.ME_insertions.ALL.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
1000GP.GRCh38_3202.ME_insertions.PASS.vcf.gz:##FILTER=<ID=S,Description="Shorter than 50-bp">
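The duplicate check above can be made per file and can separate harmless repeats from real conflicts. A minimal sketch, assuming uncompressed VCF text on stdin; `duplicate_filter_defs` is a hypothetical name.

```shell
# Hypothetical helper: prints each FILTER ID declared more than once, its
# number of declarations, and whether the declarations are identical.
duplicate_filter_defs() {
  awk '
    /^##FILTER=<ID=/ {
      id = $0
      sub(/^##FILTER=<ID=/, "", id)
      sub(/[,>].*$/, "", id)
      count[id]++
      # count each distinct full header line only once per ID
      if (!((id, $0) in seen)) { seen[id, $0] = 1; distinct[id]++ }
    }
    /^#CHROM/ { exit }                # header ends here; skip the records
    END {
      for (id in count)
        if (count[id] > 1)
          print id "\t" count[id] "\t" (distinct[id] > 1 ? "conflicting" : "identical")
    }
  ' | sort
}
```

Given the zgrep output above, the absences files with both "Shorter than 50-bp" and "Spanning read num is outlier" would be reported as conflicting, and the insertions files that repeat the same line as identical.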

jingydz commented 4 months ago

Hi, when I used bcftools to select some samples' results from the VCF, I also got the error: [W::vcf_parse] FILTER 'SD' is not defined in the header

So, can I just ignore it?

jakewendt commented 4 months ago

I don't recall how I dealt with this, or even if I did.