metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in rule DRAM_destill #631

Closed: mladen5000 closed this issue 1 year ago

mladen5000 commented 1 year ago
2023-04-18 09:33:32,357 - The log file is created at genomes/annotations/dram/distil/distill.log
2023-04-18 09:33:32,464 - Note: the fallowing id fields were not in the annotations file and are not being used: ['kegg_genes_id', 'kegg_id', 'camper_id', 'fegenie_id', 'sulfur_id', 'methyl_id'], but these are ['ko_id', 'kegg_hit', 'peptidase_family', 'cazy_best_hit', 'pfam_hits']
2023-04-18 09:33:32,487 - Retrieved database locations and descriptions
Traceback (most recent call last):
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3652, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'scaffold'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/bin/DRAM.py", line 207, in <module>
    args.func(**args_dict)
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 670, in summarize_genomes
    genome_stats = make_genome_stats(annotations, rrna_frame, trna_frame, groupby_column=groupby_column)
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 252, in make_genome_stats
    row.append('%s (%s, %s)' % (sixteens['scaffold'].iloc[0], sixteens.begin.iloc[0],
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/pandas/core/frame.py", line 3760, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/bladen/databases/conda_envs/f1e29225a050e0f1c884b25918587337_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3654, in get_loc
    raise KeyError(key) from err
KeyError: 'scaffold'

Atlas version: 2.15.0

mladen5000 commented 1 year ago

I was able to fix this issue by running DRAM.py without the rrnas.tsv file.

In atlas 2.13 this worked automatically. In 2.14 or 2.15 (I am not sure which), however, I could not reach this step without manually adding intermediate rrnas.tsv files that were empty except for the column header.

Once those files were in place, I hit a pandas KeyError in DRAM distill. I noticed that the environment defaulted to pandas 2.0, so I rolled back to 1.5.1. I also had formatting issues with the aggregated (concatenated) rrnas.tsv file, so I corrected those as well.

Unfortunately, neither of these fixed the issue, so I omitted the rRNA file entirely and still obtained the relevant metabolic information.

I think the issue lies in either the concat_annotation rule or the DRAM distill rule (DRAM_destill).
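
For anyone trying to narrow this down, here is a rough sketch of a header check for the rrnas.tsv files. It only looks for the two columns that the traceback shows make_genome_stats reading (`scaffold` and `begin`); that list is an assumption drawn from the log and DRAM may need other columns too. The function name `missing_rrna_columns` is made up for this example and is not part of DRAM or atlas.

```python
import csv
import sys
from pathlib import Path

# Columns that the traceback shows being read from the rRNA table.
# Assumption: DRAM may require more columns; only these two appear in the log.
REQUIRED_COLUMNS = ("scaffold", "begin")


def missing_rrna_columns(path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the TSV header."""
    with Path(path).open(newline="") as handle:
        # An empty file has no header row, so fall back to an empty list.
        header = next(csv.reader(handle, delimiter="\t"), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Usage: python check_rrnas.py path/to/rrnas.tsv [more.tsv ...]
    for tsv in sys.argv[1:]:
        missing = missing_rrna_columns(tsv)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{tsv}: {status}")
```

Running this on both the per-genome files and the concatenated file might show whether the `scaffold` column is lost before or during concatenation.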

mladen5000 commented 1 year ago

Additionally, I see that DRAM can integrate GTDB-Tk and CheckM results. Is this a feature that could be implemented?

SilasK commented 1 year ago

Atlas already generates CheckM2 and GTDB results; it just does not pass them to DRAM when creating the report.

luozhy88 commented 1 year ago

I have the same bug!