hakyimlab / summary-gwas-imputation

harmonization, liftover, and imputation of summary statistics from GWAS
MIT License

ValueError: invalid literal for int() with base 10: '1_KI270766v1_alt' #6

Closed: carbocation closed this issue 4 years ago

carbocation commented 4 years ago

I am trying to harmonize a UK Biobank GWAS (on hg19) for use with S-PrediXcan. After the liftover step, however, gwas_parsing.py crashes because some of my variants have been lifted over to alternative chromosomal assemblies. Unlike with X chromosome variants, it's not clear to me a priori how to tell which variants will be problematic. Any pointers on how to overcome this?

INFO - Parsing input GWAS
INFO - loaded 19400443 variants
INFO - Performing liftover
INFO - 19400443 variants after liftover
Traceback (most recent call last):
  File "/mnt/storage/bioinformatics/summary-gwas-imputation/src/gwas_parsing.py", line 311, in <module>
    run(args)
  File "/mnt/storage/bioinformatics/summary-gwas-imputation/src/gwas_parsing.py", line 283, in run
    d = clean_up(d)
  File "/mnt/storage/bioinformatics/summary-gwas-imputation/src/gwas_parsing.py", line 245, in clean_up
    d = Genomics.sort(d)
  File "/mnt/storage/bioinformatics/summary-gwas-imputation/src/genomic_tools_lib/miscellaneous/Genomics.py", line 94, in sort
    chr = [int(x.split("chr")[1]) if "chr" in x else None for x in d.chromosome]
  File "/mnt/storage/bioinformatics/summary-gwas-imputation/src/genomic_tools_lib/miscellaneous/Genomics.py", line 94, in <listcomp>
    chr = [int(x.split("chr")[1]) if "chr" in x else None for x in d.chromosome]
ValueError: invalid literal for int() with base 10: '1_KI270766v1_alt'

Heroico commented 4 years ago

Good day,

I fixed the offending code so that it no longer crashes on alternative assemblies, so that error should no longer occur. However, other parts of the code might still fail with these alternative chromosomes; please let me know if you run into any issues.
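For illustration only, here is a minimal sketch of one way to make chromosome parsing tolerant of alternative-assembly labels such as chr1_KI270766v1_alt. It shows the general idea and is not the actual change made in the repository. The names chromosome_number and sort_variants are hypothetical, and the sketch works on plain (chromosome, position) tuples rather than the data frame handled by Genomics.sort.

```python
import re

# Accept only labels made of "chr" followed by digits, e.g. "chr1" or "chr22".
# Labels such as "chrX" or "chr1_KI270766v1_alt" do not match.
_AUTOSOME = re.compile(r"chr(\d+)")


def chromosome_number(label):
    """Return the autosome number for labels like 'chr7', otherwise None."""
    match = _AUTOSOME.fullmatch(label)
    return int(match.group(1)) if match else None


def sort_variants(variants):
    """Sort (chromosome, position) pairs; unparseable labels are placed last."""
    def key(variant):
        chromosome, position = variant
        number = chromosome_number(chromosome)
        # Labels without a plain number sort after all autosomes,
        # then by label text, then by position.
        return (number is None, number or 0, chromosome, position)

    return sorted(variants, key=key)
```

With a guard like this, a label that cannot be read as a plain autosome yields None instead of raising ValueError, so such variants can be sorted last or dropped before later steps.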

Best,

Alvaro