vatlab / varianttools

Software tool for the manipulation, annotation, selection, and analysis of variants in the context of next-gen sequencing analysis
https://vatlab.github.io/vat-docs/
GNU General Public License v3.0

ERROR: 'ascii' codec can't encode characters in position 115-116: ordinal not in range(128) #84

Closed freedomq8 closed 4 years ago

freedomq8 commented 6 years ago

Hi there, I am trying to add a custom annotation database based on ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20180729.vcf.gz and created the .ann file clinvar-20180729.txt, which I used to import the source file locally. However, whenever I try to annotate my variants, I get this message:

ERROR: 'ascii' codec can't encode characters in position 115-116: ordinal not in range(128)

for example with

vtools output variant s clinvar.chr …. etc > clinvar_Variants.vcf
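The message is the text of a Python `UnicodeEncodeError`, which is raised when a string containing non-ASCII characters is encoded (text to bytes) with the `ascii` codec, often while output is being written. A minimal sketch of that mechanism follows; it assumes, without confirmation from this thread, that vtools hits this while writing annotation values, and the sample string is a hypothetical disease name, not a value taken from ClinVar.

```python
def try_encode(text: str, codec: str) -> str:
    """Encode text with the given codec and report success or the error message."""
    try:
        text.encode(codec)
        return "ok"
    except UnicodeEncodeError as err:
        return f"ERROR: {err}"


# Hypothetical value with one non-ASCII character (o with umlaut).
sample = "Sj\u00f6gren_syndrome"

print(try_encode(sample, "ascii"))  # fails: ascii only covers code points 0-127
print(try_encode(sample, "utf-8"))  # succeeds: UTF-8 can encode any code point
```

If this is what happens here, running the command under a UTF-8 locale, or with Python's standard `PYTHONIOENCODING=utf-8` environment variable, may be worth a try; whether vtools honors either depends on how it opens its output, which this thread does not show.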

I changed the encoding of the .ann file by adding encoding=ISO-8859-1 to the beginning of the source file, repeated the import of the ClinVar VCF, and tried to annotate my file again, but I got the same error.

I also tried changing the encoding of the original VCF file, along with the .ann file, using Notepad++, but the message persists.
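In Python terms, declaring a source encoding affects decoding (bytes to text) on input, while the error above is raised on encoding (text to bytes). The sketch below assumes the VCF is actually UTF-8 (not verified in this thread) and shows what declaring ISO-8859-1 would do to such data: each two-byte UTF-8 character becomes two non-ASCII characters, so an `ascii` encode still fails, now reporting a range such as `position 2-3`. That resembles the `position 115-116` in the report, but does not by itself prove the cause. The sample value is hypothetical.

```python
# Hypothetical disease name stored as UTF-8 bytes (assumed encoding of the VCF).
raw = "Sj\u00f6gren_syndrome".encode("utf-8")  # the o-umlaut becomes two bytes

# ISO-8859-1 maps every byte to a character, so this decode never fails,
# but the two-byte sequence turns into two separate non-ASCII characters.
misread = raw.decode("iso-8859-1")
print(misread)

try:
    misread.encode("ascii")
    message = "ok"
except UnicodeEncodeError as err:
    # Two consecutive bad characters give the "characters in position N-M" form.
    message = f"ERROR: {err}"
print(message)

# Decoding with the matching codec restores the original single character.
print(raw.decode("utf-8"))
```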

Any idea how to solve this?

BoPeng commented 6 years ago

Ohmm, adding that line is supposed to solve the problem. Let me try.

BoPeng commented 6 years ago

I cannot reproduce your problem. What I did was:

  1. vtools init test
  2. wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20180729.vcf.gz
  3. vtools use clinvar-20180729.ann --files clinvar_20180729.vcf.gz
  4. gzcat clinvar_20180729.vcf.gz | head -2000 > data.vcf
  5. vtools import data.vcf
  6. vtools output variant chr pos ref alt clinvar.chr CLNDN

What version of vtools are you using? On which OS?

freedomq8 commented 6 years ago

My OS is Ubuntu 17.0.

Can you try outputting the following?

vtools output variant \
    variant.chr variant.pos variant.ref variant.alt \
    variant.region_type variant.region_name variant.mut_type variant.function \
    clinvar.chr clinvar.pos clinvar.name clinvar.ref clinvar.alt clinvar.qual clinvar.filter \
    clinvar.RS clinvar.AF_ESP clinvar.AF_EXAC clinvar.AF_TGP clinvar.ALLELEID \
    clinvar.CLNDN clinvar.CLNDNINCL clinvar.CLNDISDB clinvar.CLNDISDBINCL clinvar.CLNHGVS \
    clinvar.CLNREVSTAT clinvar.CLNSIG clinvar.CLNSIGCONF clinvar.CLNSIGINCL \
    clinvar.CLNVC clinvar.CLNVCSO clinvar.CLNVI clinvar.DBVARID \
    clinvar.GENEINFO clinvar.MC clinvar.ORIGIN clinvar.SSR \
    > annotation.clinvar.26682.vcf

ERROR: 'ascii' codec can't encode characters in position 115-116: ordinal not in range(128)

I thought I had figured it out and redid my .ann file, but I still get the same error when I query the fields above. The problem is with clinvar.CLNDN and clinvar.CLNVI, which are fields from other ClinVar databases.
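To see which records carry non-ASCII text in the two fields named above, here is a minimal Python sketch that scans the VCF's INFO column (the eighth tab-separated column) for CLNDN or CLNVI values containing non-ASCII characters. The function name `find_non_ascii` is hypothetical, and it assumes the file is UTF-8; any undecodable bytes are replaced with U+FFFD, which is itself non-ASCII and therefore still reported.

```python
import gzip
from typing import Iterator, Tuple


def find_non_ascii(
    path: str, fields: Tuple[str, ...] = ("CLNDN", "CLNVI")
) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, field, value) for INFO values that contain non-ASCII text."""
    opener = gzip.open if path.endswith(".gz") else open
    # Assumes UTF-8; bad bytes become U+FFFD, which is non-ASCII and still flagged.
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue  # meta-information and header lines
            columns = line.rstrip("\n").split("\t")
            if len(columns) < 8:
                continue  # not a complete VCF data line
            for entry in columns[7].split(";"):  # INFO is the 8th column
                key, _, value = entry.partition("=")
                if key in fields and not value.isascii():  # str.isascii needs Python 3.7+
                    yield lineno, key, value
```

For example, `for hit in find_non_ascii("clinvar_20180729.vcf.gz"): print(*hit)` would list the affected line numbers, field names, and values.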

BoPeng commented 6 years ago

I cannot test now, so do you mean that the command runs without clinvar.CLNDN and clinvar.CLNVI?
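One way to answer that question is to run the same `vtools output variant` command once per field and note which runs report the error. Below is a hedged Python sketch of that loop. It uses only the command form shown in this thread, and it assumes (neither is confirmed here) that a failing run either exits with a non-zero status or prints `ERROR`. The helper name `failing_fields` is hypothetical.

```python
import subprocess
from typing import List, Sequence

# Command prefix modeled on the commands in this thread; each field is appended in turn.
BASE_CMD = ("vtools", "output", "variant", "chr", "pos")


def failing_fields(fields: Sequence[str], base_cmd: Sequence[str] = BASE_CMD) -> List[str]:
    """Return the fields whose single-field output run appears to fail."""
    bad = []
    for field in fields:
        result = subprocess.run(
            [*base_cmd, field],
            capture_output=True,
            encoding="utf-8",
            errors="replace",  # never fail while decoding the tool's own output
        )
        # Assumption: failure shows up as a non-zero exit code or an "ERROR" message.
        if result.returncode != 0 or "ERROR" in result.stdout + result.stderr:
            bad.append(field)
    return bad
```

Run from inside the vtools project, `failing_fields(["clinvar.CLNDN", "clinvar.CLNVI", "clinvar.CLNSIG"])` would return the subset of fields that trigger the error. Note that each run captures the full output in memory, which is fine for a quick check but not for very large projects.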

BoPeng commented 4 years ago

Yes, again, I cannot reproduce this on Mac, so I am closing the ticket. Note that I tried again with the .ann file adapted to the new 2020 version from ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20200506.vcf.gz.

clinvar_20200506.txt