phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

MOB-recon v3.1.7 fails at the biomarker stage. #152

Closed: ejurga closed this issue 11 months ago

ejurga commented 1 year ago

MOB-recon v3.1.7 fails when running create_biomarker_dataframe. Previous versions (tested on v3.1.4) work on the same set of contigs.

Full stack trace:

Traceback (most recent call last):
  File "/home/emil/Projects/mobsuite-rgi/work/conda/mob_suite_3.1.7/bin/mob_recon", line 10, in <module>
    sys.exit(main())
  File "/home/emil/Projects/mobsuite-rgi/work/conda/mob_suite_3.1.7/lib/python3.8/site-packages/mob_suite/mob_recon.py", line 1472, in main
    biomarker_df = create_biomarker_dataframe(biomarker_params,id_mapping,logging)
  File "/home/emil/Projects/mobsuite-rgi/work/conda/mob_suite_3.1.7/lib/python3.8/site-packages/mob_suite/utils.py", line 1397, in create_biomarker_dataframe
    return pd.concat(data_frames)
  File "/home/emil/Projects/mobsuite-rgi/work/conda/mob_suite_3.1.7/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 271, in concat
    op = _Concatenator(
  File "/home/emil/Projects/mobsuite-rgi/work/conda/mob_suite_3.1.7/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 329, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

The environment was created with mamba.

Command run:

DB=/path/to/mob/database/directory
mob_recon \
    --infile contigs.fasta \
    --outdir mobSuite \
    --num_threads 1 \
    --database_directory $DB \
    --force 

I have attached the contigs I used. contigs.fasta.zip

kbessonov1984 commented 1 year ago

Hi, I will need to look into this issue. It is caused by an empty list of dataframes being passed for concatenation into the master table of plasmid features with their respective coordinates. Effectively, you are getting a pandas error equivalent to calling pd.concat([]). This most likely means that your sample has no detected plasmid features, or no plasmids at all (I have not had time to test your inputs yet). This biomarker_dataframe is a new feature introduced in version 3.1.7. I will address this edge case in the next release and, in the meantime, push a patch that will be available via source code installation. Since you already have all the dependencies installed, installing from source should be painless.
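For anyone hitting the same traceback, here is a minimal sketch of the failure mode and one possible guard. It is not the actual MOB-suite patch: the helper name concat_or_empty and the column names are hypothetical and chosen only for illustration.

```python
import pandas as pd


def concat_or_empty(data_frames, columns):
    """Concatenate a list of data frames, tolerating an empty list.

    Hypothetical helper, not part of MOB-suite. `columns` names the
    columns of the empty frame returned when there is nothing to join.
    """
    if not data_frames:
        # pd.concat([]) would raise ValueError here, so return an
        # empty frame with the expected columns instead.
        return pd.DataFrame(columns=columns)
    return pd.concat(data_frames, ignore_index=True)


if __name__ == "__main__":
    # Reproduce the error from the traceback: an empty list cannot
    # be concatenated.
    try:
        pd.concat([])
    except ValueError as err:
        print(f"pandas raised: {err}")

    # With the guard, a sample with no hits yields an empty table.
    # The column names below are assumptions for this example.
    table = concat_or_empty([], ["contig_id", "biomarker"])
    print(table.empty)
```

Returning an empty frame with known columns lets downstream code keep working on samples with no hits, although the real fix may handle this case differently.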

ejurga commented 1 year ago

Sounds good! I'll use the earlier versions in the meantime.

I'll leave the issue open for now in case others run into the same problem.

kbessonov1984 commented 11 months ago

Fixed in the v3.1.8 release.