quinlan-lab / vcf2db

create a gemini-compatible database from a VCF
MIT License

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' #20

Closed davemcg closed 7 years ago

davemcg commented 7 years ago

After annotating my VCF with vcfanno, I'm getting this error from vcf2db (latest version).

mcgaugheyd@cyclops:~/ccgo$ python ~/git/vcf2db/vcf2db.py CCGO.b37.bwa-mem.hardFilterSNP-INDEL.VEP.GRCh37.2016-12-12.anno.vcf.gz  ~/git/NGS_db/master.ped CCGO.2016-12-12.db
skipping 'AC' because it has Number=A
skipping 'AF' because it has Number=A
skipping 'MLEAC' because it has Number=A
skipping 'MLEAF' because it has Number=A
setting common_pathogenic to Type String because it has Number=.
pedigree warning: 'CCGO_800521' is dad but has female sex
not in VCF: CCGO_800067and70-Dad,CCGO_800067and70-Mom,W23-1,W23-2,W23-3,1202,1203,1204,1046,1265,1045,1264,1277A,1278,1279,1313,1414,1415,1314,1315,1417,1418,1420,1419,1316,1335,1232,1233,LV111315,1238,1484,1522,1501,LB_COL005.1,LB_COL005.4,LB_COL005.10,LB_COL034.1,LB_COL034.3,LB_COL034.2,NA12878
/home/mcgaugheyd/anaconda2/lib/python2.7/site-packages/sqlalchemy/sql/sqltypes.py:185: SAWarning: Unicode type received non-unicode bind param value 'CCGO_FAM_800016and18'. (this warning may be suppressed after 10 occurrences)
  (util.ellipses_string(value),))
/home/mcgaugheyd/anaconda2/lib/python2.7/site-packages/sqlalchemy/sql/sqltypes.py:185: SAWarning: Unicode type received non-unicode bind param value '2'. (this warning may be suppressed after 10 occurrences)
  (util.ellipses_string(value),))
/home/mcgaugheyd/anaconda2/lib/python2.7/site-packages/sqlalchemy/sql/sqltypes.py:185: SAWarning: Unicode type received non-unicode bind param value '1'. (this warning may be suppressed after 10 occurrences)
  (util.ellipses_string(value),))
/home/mcgaugheyd/anaconda2/lib/python2.7/site-packages/sqlalchemy/sql/sqltypes.py:185: SAWarning: Unicode type received non-unicode bind param value 'CCGO_FAM_800062'. (this warning may be suppressed after 10 occurrences)
  (util.ellipses_string(value),))
/home/mcgaugheyd/anaconda2/lib/python2.7/site-packages/sqlalchemy/sql/sqltypes.py:185: SAWarning: Unicode type received non-unicode bind param value 'CCGO_FAM_800160'. (this warning may be suppressed after 10 occurrences)
  (util.ellipses_string(value),))
/home/mcgaugheyd/anaconda2/lib/python2.7/site-packages/sqlalchemy/sql/sqltypes.py:185: SAWarning: Unicode type received non-unicode bind param value 'CCGO_FAM_800188'. (this warning may be suppressed after 10 occurrences)
  (util.ellipses_string(value),))
Traceback (most recent call last):
  File "/home/mcgaugheyd/git/vcf2db/vcf2db.py", line 810, in <module>
    VCFDB(a.VCF, a.db, a.ped, black_list=a.info_exclude, expand=a.expand, blobber=main_blobber)
  File "/home/mcgaugheyd/git/vcf2db/vcf2db.py", line 216, in __init__
    self.load()
  File "/home/mcgaugheyd/git/vcf2db/vcf2db.py", line 280, in load
    i = self._load(self.cache, create=True, start=1)
  File "/home/mcgaugheyd/git/vcf2db/vcf2db.py", line 273, in _load
    self.insert(variants, expanded, keys, i, create=create)
  File "/home/mcgaugheyd/git/vcf2db/vcf2db.py", line 307, in insert
    if af_val is None or np.isnan(af_val):
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
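The sketch below reproduces the failure outside vcf2db. It assumes the INFO value reaches `np.isnan` as a Python string rather than a float. `isnan_raises` is a hypothetical helper for illustration only. NumPy turns a string into a unicode array, and the `isnan` ufunc has no loop for that type, so it raises the same TypeError shown in the traceback. `None` would also raise, but the `af_val is None` test short-circuits before `np.isnan` runs.

```python
import numpy as np


def isnan_raises(value):
    """Hypothetical helper: True if np.isnan(value) raises TypeError."""
    try:
        np.isnan(value)
    except TypeError:
        return True
    return False


print(isnan_raises(0.003663004))    # False: a float is accepted
print(isnan_raises("0.003663004"))  # True: a numeric-looking str is not
```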

Can't figure out what the issue is. vcf2db runs fine on the VCF before vcfanno annotation. I've tried removing my custom annotations from the vcfanno run, but this error keeps happening.

brentp commented 7 years ago

Can you wrap the if statement at line 307 in a try block and print af_val when it fails? That will help me get a quick fix in.

davemcg commented 7 years ago

Code modification:

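            for col in self.af_cols:
                af_val = variant.get(col)
                try:
                    if af_val is None or np.isnan(af_val):
                        variant[col] = -1.0
                except TypeError:
                    # debugging: report the column and value np.isnan rejected, then stop
                    print(col)
                    print(af_val)
                    sys.exit(1)  # requires `import sys` at the top of vcf2db.py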

Output (everything above is largely the same, except of course without the TypeError):

aaf_esp_aa_float
0.003663004
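The printed value looks like an ordinary float, and that is why `print` alone hides the problem. Here is a small sketch showing that a string and a float print identically, while `repr()` or `type()` tells them apart. The variable names are illustrative only.

```python
# str() of a numeric-looking string and of the matching float are identical,
# so a plain print() cannot reveal which one np.isnan received.
as_string = "0.003663004"
as_float = 0.003663004

print(as_string)        # 0.003663004
print(as_float)         # 0.003663004
print(repr(as_string))  # '0.003663004'  (quotes reveal a str)
print(repr(as_float))   # 0.003663004
print(type(as_string).__name__, type(as_float).__name__)  # str float
```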
davemcg commented 7 years ago

Using vcfanno 0.0.10

brentp commented 7 years ago

What does the VCF header show for aaf_esp_aa_float?

davemcg commented 7 years ago
##INFO=<ID=aaf_esp_aa_float,Number=1,Type=String,Description="calculated by lua:ratio(vals) of overlapping values in field AA_AC from /data/mcgaugheyd/genomes/1000G_phase2_GRCh37/gemini_annotation/ESP6500SI.all.snps_indels.tidy.v2.vcf.gz">
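The `Type=String` declaration is the key detail. VCF parsers generally convert INFO values according to the declared Type, so a field declared as String would arrive as `str` even when it holds a number. The sketch below shows one way to flag such declarations. `parse_info_header` is a hypothetical helper, not part of vcf2db or vcfanno, and it assumes the ID, Number, Type key order that the VCF specification prescribes for INFO lines.

```python
import re

# Hypothetical helper: extract ID, Number and Type from one ##INFO header line,
# assuming the keys appear in the spec's ID, Number, Type order.
INFO_RE = re.compile(r"^##INFO=<ID=([^,]+),Number=([^,]+),Type=([^,>]+)")


def parse_info_header(line):
    match = INFO_RE.match(line)
    if match is None:
        return None
    info_id, number, info_type = match.groups()
    return {"ID": info_id, "Number": number, "Type": info_type}


header = (
    '##INFO=<ID=aaf_esp_aa_float,Number=1,Type=String,'
    'Description="calculated by lua:ratio(vals) of overlapping values in field '
    'AA_AC from /data/mcgaugheyd/genomes/1000G_phase2_GRCh37/gemini_annotation/'
    'ESP6500SI.all.snps_indels.tidy.v2.vcf.gz">'
)
print(parse_info_header(header)["Type"])  # String
```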
brentp commented 7 years ago

Hmm, that should have Type=Float. I'll have a look at vcfanno. Can you show the exact vcfanno.conf for that?

davemcg commented 7 years ago

Yep, that's it. Running the latest vcfanno gives this:

##INFO=<ID=aaf_esp_aa,Number=1,Type=Float,Description="calculated by lua:ratio(vals) of overlapping values in field AA_AC from /data/mcgaugheyd/genomes/1000G_phase2_GRCh37/gemini_annotation/ESP6500SI.all.snps_indels.tidy.v2.vcf.gz">
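Upgrading vcfanno fixes the header, but the failing check could also be made tolerant of string-typed input. The sketch below assumes a dict-like `variant`, as in the debug snippet above. `fill_missing_af` is a hypothetical helper and not necessarily the fix that went into vcf2db. It converts numeric strings to float and replaces None, NaN, or unparseable values with -1.0, matching the sentinel the original loop uses.

```python
import math


def fill_missing_af(variant, af_cols, missing=-1.0):
    """Hypothetical defensive version of the af_cols loop (not vcf2db's code).

    Numeric strings such as '0.003663004' become floats; None, NaN, and
    non-numeric strings such as '.' become `missing`.
    """
    for col in af_cols:
        value = variant.get(col)
        try:
            value = float(value)
        except (TypeError, ValueError):
            # float(None) raises TypeError; float('.') raises ValueError
            value = missing
        if math.isnan(value):
            value = missing
        variant[col] = value
    return variant


# Illustrative input: 'af_missing' and 'af_nan' are made-up column names.
variant = {"aaf_esp_aa_float": "0.003663004", "af_missing": None, "af_nan": float("nan")}
fill_missing_af(variant, ["aaf_esp_aa_float", "af_missing", "af_nan"])
print(variant)
```

Converting with `float()` before testing for NaN avoids calling `np.isnan` on a value whose type depends on the VCF header.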
davemcg commented 7 years ago

Sorry, I was switching code over from the test system to the production system and didn't realize that its vcfanno was so old (0.0.10 instead of 0.1 or 0.1.1).

davemcg commented 7 years ago

Seriously, I don't wake up in the morning and ask myself, "How do I waste Brent's time today?" It just happens. I try to spend at least an hour fixing things myself before heading over to GitHub issues.

brentp commented 7 years ago

No problem. At least we know we fixed a true bug.