phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

mob_cluster: "pandas.errors.ParserError: Too many columns specified: expected 26 and found 24" #103

Closed cwillian1 closed 2 years ago

cwillian1 commented 2 years ago

Hi. First, thanks for this great tool. I'm still learning how to use mob-suite. I've successfully run the examples included here, but I'm having trouble understanding how to create and update the database. I started by running the basic mode: "mob_recon --infile mob_suite/example/SRR3703080_illumina_unicycler.fasta --outdir SRR3703080_basicmode"

I then tried to use that output as input to test creating and updating the database by running:

"cat SRR3703080_basicmode/plasmid*.fasta > SRR3703080_basicmode/new_plasmids.fasta"
"mob_cluster --mode build -f SRR3703080_basicmode/new_plasmids.fasta -p SRR3703080_basicmode/contig_report.txt -t SRR3703080_basicmode/mobtyper_results.txt --outdir SRR3703080_db"

And I'm getting the following error:

2022-03-14 20:12:20,549 root INFO: SUCCESS: Found program tblastn at /home/cwillian/miniconda3/envs/mob-suite/bin/tblastn [in /home/cwillian/miniconda3/envs/mob-suite/lib/python3.8/site-packages/mob_suite/utils.py:590]
Traceback (most recent call last):
  File "/home/cwillian/miniconda3/envs/mob-suite/bin/mob_cluster", line 33, in <module>
    sys.exit(load_entry_point('mob-suite==3.0.3', 'console_scripts', 'mob_cluster')())
  File "/home/cwillian/miniconda3/envs/mob-suite/lib/python3.8/site-packages/mob_suite/mob_cluster.py", line 497, in main
    records = read_file_to_dict(mob_typer_report_file, MOB_TYPER_REPORT_HEADER, separater="\t")
  File "/home/cwillian/miniconda3/envs/mob-suite/lib/python3.8/site-packages/mob_suite/utils.py", line 484, in read_file_to_dict
    data = pd.read_csv(file, sep=separater, header=0, names=header, encoding="UTF-8")
  File "/home/cwillian/miniconda3/envs/mob-suite/lib/python3.8/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/cwillian/miniconda3/envs/mob-suite/lib/python3.8/site-packages/pandas/io/parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "/home/cwillian/miniconda3/envs/mob-suite/lib/python3.8/site-packages/pandas/io/parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "/home/cwillian/miniconda3/envs/mob-suite/lib/python3.8/site-packages/pandas/io/parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 952, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1013, in pandas._libs.parsers.TextReader._convert_column_data
pandas.errors.ParserError: Too many columns specified: expected 26 and found 24

As you can see, I'm passing "mobtyper_results.txt" as the taxonomy file (-t), on the assumption that it is the right file, and I get this error. What could the problem be? If it isn't the right file, could you tell me where I can obtain the taxonomy file?

Thanks in advance for all the help! =)
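
For anyone hitting the same error: the traceback shows the failure happens while mob_cluster reads its MOB-typer report input with a fixed list of column names, so a quick check is to count the columns in the file you are about to pass. Below is a minimal sketch using only the standard library. The helper names `header_fields` and `check_column_count` are hypothetical (they are not part of mob-suite), and the expected count of 26 is taken from the error message above, not from the mob-suite source.

```python
import csv


def header_fields(path, delimiter="\t"):
    """Return the column names found on the first line of a delimited file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # next(..., []) returns an empty list for an empty file
        return next(csv.reader(handle, delimiter=delimiter), [])


def check_column_count(path, expected):
    """Return (matches, found): whether the header has `expected` columns,
    and how many columns were actually found."""
    found = len(header_fields(path))
    return found == expected, found
```

Calling `check_column_count("SRR3703080_basicmode/contig_report.txt", 26)` before running mob_cluster would show whether that file has the number of columns the parser is asking for.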

jrober84 commented 2 years ago

The mob-recon report is not suitable for use with MOB-cluster. We definitely need to improve the documentation for this process. Going by the example command for MOB-cluster, "mob_cluster --mode build -f new_plasmids.fasta -p new_plasmids_mobtyper_report.txt -t new_plasmids_host_taxonomy.txt --outdir output_directory", you need to pass your MOB-typer report to -p and a taxonomy file with the fields ['sample_id', 'organism'] to -t. You need to generate the taxonomy file yourself.
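
Since the taxonomy file has to be generated by hand, here is a minimal sketch of one way to write it. The only thing taken from the comment above is the pair of fields, 'sample_id' and 'organism'. The tab delimiter is an assumption (the MOB-typer report is read as tab-separated in the traceback, but the taxonomy format is not spelled out here), and `write_taxonomy_file` is a hypothetical helper, not a mob-suite function.

```python
import csv


def write_taxonomy_file(path, sample_to_organism, delimiter="\t"):
    """Write a two-column table with the header sample_id / organism.

    `sample_to_organism` maps each plasmid sample_id to its host organism.
    The tab delimiter is an assumption, not confirmed by the thread.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerow(["sample_id", "organism"])
        for sample_id, organism in sample_to_organism.items():
            writer.writerow([sample_id, organism])
```

For example, `write_taxonomy_file("new_plasmids_host_taxonomy.txt", {"plasmid_A": "Escherichia coli"})` writes a header line followed by one row. The sample ID and organism in that call are made-up placeholders.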

cwillian1 commented 2 years ago

Thanks for your reply! I was able to create the database after your response. I just had to change the headers of the plasmid FASTA sequences to match the first column, 'sample_id', of the "mobtyper_results" file and the 'sample_id' column of the taxonomy file I created. I'm assuming I'm on the right track by doing that.

Here's an example of the taxonomy file created.

taxonomy_file.txt
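
The ID matching described above can be checked before running mob_cluster. This is a rough sketch under stated assumptions: both tables are tab-separated with a 'sample_id' column, and the FASTA ID is the first whitespace-delimited token after '>'. Whether mob_cluster compares that token or the whole header line is not confirmed in this thread. All function names are hypothetical.

```python
import csv


def fasta_ids(path):
    """Collect the first token of every '>' header line in a FASTA file."""
    ids = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def table_ids(path, column="sample_id", delimiter="\t"):
    """Collect the values of `column` from a delimited file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[column] for row in csv.DictReader(handle, delimiter=delimiter)}


def unmatched_ids(fasta_path, *table_paths):
    """For each table, return the IDs present in only one of FASTA and table."""
    wanted = fasta_ids(fasta_path)
    return {path: wanted ^ table_ids(path) for path in table_paths}
```

An empty set for every table means the FASTA headers and the 'sample_id' columns agree. Anything left over is an ID that appears on one side only.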