nick-youngblut / gtdb_to_taxdump

Convert GTDB taxonomy to NCBI taxdump format
MIT License

KeyError with `acc2gtdb_tax.py` #17

Closed: Sidduppal closed this issue 2 years ago

Sidduppal commented 2 years ago

I get the following KeyError when running acc2gtdb_tax.py. It runs for a few seconds, processes some genomes, and then crashes:

/release207_v2/fastani/database/GCF/016/827/605/GCF_016827605.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/008/875/GCF_016008875.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/862/815/GCF_016862815.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/862/955/GCF_016862955.1_genomic.fna.gz
/fastani/database/GCF/016/862/095/GCF_016862095.1_genomic.fna.gz
/fastani/database/GCF/016/862/635/GCF_016862635.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/725/325/GCF_016725325.1_genomic.fna.gz
/release207_v2/fastani/database/GCF/016/464/385/GCF_016464385.1_genomic.fna.gz
Traceback (most recent call last):
  File "/home/sidd/miniconda3/envs/autometa_aims/bin/acc2gtdb_tax.py", line 140, in <module>
    main(args)
  File "/home/sidd/miniconda3/envs/autometa_aims/bin/acc2gtdb_tax.py", line 131, in main
    thread_map(acc2tax_partial, gtdb_genomes, chunksize=1, max_workers=args.threads)
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 94, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/sidd/miniconda3/envs/autometa_aims/bin/acc2gtdb_tax.py", line 94, in seq_acc2tax
    acc_prefix = acc_code[splitpath[-4]]
KeyError: '012'

I'm using the latest GTDB release (release207_v2) with the following command: `acc2gtdb_tax.py release207_v2/fastani/database gtdb_to_taxdump/names.dmp --threads 20 --outfile gtdb_to_taxdump/gtdb.acc2tax`
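The failing line takes a fixed position of the split path (`splitpath[-4]`) and looks it up in `acc_code`. The value `'012'` looks like one of the numeric directory levels (as in `GCF/016/827/605/`), which would mean that file sits at a different depth than the script expects. This is an inference from the traceback, not confirmed in the thread. A minimal sketch of a depth-independent alternative, assuming `acc_code` maps `GCF`/`GCA` to GTDB's `RS_`/`GB_` accession prefixes: parse the accession from the file name itself. The names `GTDB_PREFIX`, `ACC_RE`, and `gtdb_accession` are hypothetical, and this is not necessarily what the fix in the last comment does.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed mapping from NCBI assembly prefix to GTDB accession prefix
# (GTDB labels RefSeq genomes "RS_" and GenBank genomes "GB_").
GTDB_PREFIX = {"GCF": "RS_", "GCA": "GB_"}

# Matches an assembly accession such as GCF_016827605.1 at the start of a file name.
ACC_RE = re.compile(r"^(GC[AF])_\d{9}\.\d+")


def gtdb_accession(path: str) -> Optional[str]:
    """Return e.g. 'RS_GCF_016827605.1' for a genome file path, or None.

    Only the file name is parsed, so the result does not depend on how many
    directories sit above the file.
    """
    match = ACC_RE.match(Path(path).name)
    if match is None:
        return None  # not a recognisable genome file: skip it instead of crashing
    return GTDB_PREFIX[match.group(1)] + match.group(0)


if __name__ == "__main__":
    print(gtdb_accession(
        "/release207_v2/fastani/database/GCF/016/827/605/GCF_016827605.1_genomic.fna.gz"
    ))  # RS_GCF_016827605.1
    print(gtdb_accession(
        "/fastani/database/GCF/016/862/095/GCF_016862095.1_genomic.fna.gz"
    ))  # RS_GCF_016862095.1
    print(gtdb_accession("README.txt"))  # None
```

Because the prefix comes from the file name rather than a fixed directory index, an extra or missing directory level changes nothing, and unexpected files return `None` instead of raising a KeyError.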

nick-youngblut commented 2 years ago

I didn't write that code, but I believe that @maxibor can help you out (https://github.com/nick-youngblut/gtdb_to_taxdump/pull/12)

maxibor commented 2 years ago

#19 should fix it, @Sidduppal and @nick-youngblut.