metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in rule get_all_modules #655

Closed mladen5000 closed 1 year ago

mladen5000 commented 1 year ago

Here is the relevant log output:

2023-05-13 04:39:11 Uncaught exception: Traceback (most recent call last):
  File "/projects/com_perkinsd/common/qc-antibiotics-atlas/.snakemake/scripts/tmp5iy1gxw3.DRAM_get_all_modules.py", line 58, in <module>
    module_steps_form = pd.read_csv(
  File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 605, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
    self.handles = get_handle(
  File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/common.py", line 713, in get_handle
    ioargs = _get_filepath_or_buffer(
  File "/projects/com_perkinsd/common/databases/conda_envs/a779e7ab5b5ee88b6a071a9705d2d44a_/lib/python3.10/site-packages/pandas/io/common.py", line 451, in _get_filepath_or_buffer
    raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'NoneType'>
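
The `ValueError` above is what pandas raises when `read_csv` receives `None` instead of a path or buffer. That means the script at line 58 was handed an unset path, not a broken file. The following is a minimal sketch of a guard that fails with a more descriptive message. `checked_path` is a hypothetical helper, not part of DRAM or atlas, and it assumes the path comes from a lookup that returns `None` when a resource is not configured.

```python
from typing import Optional


def checked_path(path: Optional[str], label: str) -> str:
    """Return path unchanged, or raise a descriptive error if it is None."""
    if path is None:
        # Passing None straight to pandas.read_csv produces the opaque
        # "Invalid file path or buffer object type: <class 'NoneType'>" error,
        # which does not say which resource was missing.
        raise FileNotFoundError(f"No path configured for {label}")
    return path
```

In a script, this would wrap the lookup before reading, e.g. `pd.read_csv(checked_path(some_path, "module steps form"), sep="\t")`, where `some_path` stands in for whatever the script resolves from its configuration.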


mladen5000 commented 1 year ago

For context, I ran the pipeline on a completely unrelated dataset and got the same errors, as well as the ones reported in issues #653 and #654.

SilasK commented 1 year ago

It is likely that there is an error in my code. Could you please run atlas on the test data? It worked on my side: https://zenodo.org/record/3992790/files/test_reads.tar.gz

SilasK commented 1 year ago

Could you also check your `genomes/annotations/dram/annotations.tsv`?

mladen5000 commented 1 year ago

Here is the head of `genomes/annotations/dram/annotations.tsv`:

                      gene_position  rank  strandedness  end_position  start_position  fasta  scaffold  heme_regulatory_motif_count
    MAG18_MAG18_1_1   1              E     1             204           1               MAG18  MAG18_1   0
    MAG18_MAG18_1_2   2              E     1             902           207             MAG18  MAG18_1   0
    MAG18_MAG18_1_3   3              E     -1            3135          1042            MAG18  MAG18_1   0
    MAG18_MAG18_1_4   4              E     -1            3659          3393            MAG18  MAG18_1   0
    MAG18_MAG18_1_5   5              E     -1            3999          3811            MAG18  MAG18_1   0
    MAG18_MAG18_10_1  1              E     1             659           30              MAG18  MAG18_10  0
    MAG18_MAG18_10_2  2              E     1             1319          885             MAG18  MAG18_10  0
    MAG18_MAG18_10_3  3              E     1             1996          1316            MAG18  MAG18_10  0
    MAG18_MAG18_10_4  4              E     1             3796          2396            MAG18  MAG18_10  0

mladen5000 commented 1 year ago

I received the same DRAM errors as before. However, there were no errors on the `genecatalog` side of things. I'm going to try re-downloading the DRAM database.

johnne commented 1 year ago

I'm also seeing this on atlas v2.15.0. I think it may be related to DRAM not receiving its configuration file, which specifies all the required resources. I tried setting the `DRAM_CONFIG_LOCATION` environment variable to `DRAM/DRAM.config` under the `database_dir` set in my atlas config file, and that bypassed the first error reported here (`ValueError: Invalid file path or buffer object type: <class 'NoneType'>`).
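
The workaround described above can be sketched in Python as a helper that copies the current environment and points `DRAM_CONFIG_LOCATION` at `<database_dir>/DRAM/DRAM.config`. The layout comes from this comment. The helper name `dram_env` and its `check` flag are hypothetical illustrations, not atlas or DRAM code.

```python
import os
from typing import Dict, Mapping, Optional


def dram_env(
    database_dir: str,
    base_env: Optional[Mapping[str, str]] = None,
    check: bool = True,
) -> Dict[str, str]:
    """Copy an environment and point DRAM_CONFIG_LOCATION at the DRAM config.

    Assumes the config sits at <database_dir>/DRAM/DRAM.config, as described
    in this thread. The input mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    config_path = os.path.join(database_dir, "DRAM", "DRAM.config")
    if check and not os.path.isfile(config_path):
        # Fail early here rather than later inside DRAM with a NoneType path error.
        raise FileNotFoundError(f"DRAM config not found: {config_path}")
    env["DRAM_CONFIG_LOCATION"] = config_path
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` for whatever command launches the DRAM step. The command itself is deliberately left out, since its arguments depend on the atlas rule.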

Now I run into this instead:

2023-05-21 08:23:55 Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-05-21 08:24:19 Uncaught exception: Traceback (most recent call last):
  File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3652, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ko_id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/atlas/.snakemake/scripts/tmpfnhn5496.DRAM_get_all_modules.py", line 67, in <module>
    module_coverage_frame = make_module_coverage_frame(
  File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 340, in make_module_coverage_frame
    module_coverage_dict[group] = make_module_coverage_df(frame, module_nets)
  File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/mag_annotator/summarize_genomes.py", line 319, in make_module_coverage_df
    for gene_id, ko_list in annotation_df[ko_id_name].items():
  File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/pandas/core/frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/9f41e817c598c12d8afe52ac2a7750e1_/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3654, in get_loc
    raise KeyError(key) from err
KeyError: 'ko_id'

johnne commented 1 year ago

I think DRAM expects a `kegg_id` or `ko_id` column in the annotation file. The DRAM log contains `No KEGG source provided so distillation will be of limited use.`, so I suspect the missing DRAM config environment variable was causing problems upstream of `get_all_modules`. I'm rerunning the DRAM annotation steps to see whether KEGG IDs are included in the output.
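
One way to confirm this before rerunning anything is to check whether the header of `annotations.tsv` contains either column. The columns in the header posted earlier in this thread include neither. The following is a minimal, standard-library-only sketch. `find_ko_column` is a hypothetical helper, and checking `ko_id` before `kegg_id` is an assumption rather than documented DRAM behavior.

```python
import csv
from typing import Iterable, Optional

# Column names mentioned in this thread; the check order is an assumption.
KO_COLUMNS = ("ko_id", "kegg_id")


def find_ko_column(lines: Iterable[str]) -> Optional[str]:
    """Return the first KO column present in a tab-separated header, else None."""
    # Only the first row (the header) is read; an empty input yields [].
    header = next(csv.reader(lines, delimiter="\t"), [])
    for name in KO_COLUMNS:
        if name in header:
            return name
    return None
```

Usage: `with open("genomes/annotations/dram/annotations.tsv", newline="") as fh: print(find_ko_column(fh))`. A result of `None` suggests the KEGG annotation step was skipped, which would match the `KeyError: 'ko_id'` above.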

johnne commented 1 year ago

Passing the configuration file to the DRAM_annotate, DRAM_destill and get_all_modules rules fixes the issue for me.

johnne commented 1 year ago

See PR #658

yotsa commented 1 year ago

I ran into this today. How do I pass the config file directly to those rules?

SilasK commented 1 year ago

I merged @johnne's pull request. @yotsa, if you simply install atlas from GitHub, the problem should be fixed.

I will test it before making a conda release.

github-actions[bot] commented 1 year ago

There has been no activity for some time. I hope your issue has been solved in the meantime. This issue will automatically close soon if no further activity occurs.

Thank you for your contributions.