Open innovate-invent opened 3 years ago
@apetkau you had asked for more information about the issue I am having with this pipeline.
I am getting the following error:
2021-06-17 19:19:12,395 INFO: Grouped 2 fastqs into 1 groups [in /usr/local/lib/python3.7/site-packages/refseq_masher/utils.py:174]
2021-06-17 19:19:12,395 INFO: Collected 0 FASTA inputs and 1 read sets [in /usr/local/lib/python3.7/site-packages/refseq_masher/utils.py:185]
2021-06-17 19:19:12,395 INFO: Running Mash Screen with NCBI RefSeq sketch database against sample "SRR3028776" with inputs: ['SRR3028776_1.fastq', 'SRR3028776_2.fastq'] [in /usr/local/lib/python3.7/site-packages/refseq_masher/mash/screen.py:44]
Loading /usr/local/lib/python3.7/site-packages/refseq_masher/data/RefSeqSketches.msh...
Traceback (most recent call last):
  File "/usr/local/bin/refseq_masher", line 10, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/refseq_masher/cli.py", line 136, in contains
    parallelism=parallelism)
  File "/usr/local/lib/python3.7/site-packages/refseq_masher/mash/screen.py", line 46, in vs_refseq
    df = mash_screen_output_to_dataframe(stdout)
  File "/usr/local/lib/python3.7/site-packages/refseq_masher/mash/parser.py", line 117, in mash_screen_output_to_dataframe
    df = pd.read_table(StringIO(mash_out))
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Thanks @innovate-invent. I am still not sure why this is causing an issue since it does work for me.
However, I do notice it referring to Python in /usr/local/lib/python3.7. Is it using a system Python? Or are you using a Singularity container? That could be one place to look into since I am using a conda environment to install the software.
I am running it in Docker using https://quay.io/repository/biocontainers/refseq_masher
I think it is this issue: https://github.com/phac-nml/refseq_masher/issues/2
Looks like the Galaxy wrapper points at 0.1.1 but 0.1.2 is available.
Okay, thanks.
Do you know which Docker container tag you are using (https://quay.io/repository/biocontainers/refseq_masher?tab=tags)? I suspect it's 0.1.1--py_2 since that's the tag where I see it using Python 3.7.
Also, what do the input fastq files for SRR3028776 look like? As in, are they very small datasets for testing, or are they full-sized fastq files? The error you are getting is EmptyDataError, so it may just be that the fastq files you are testing are too small and produce no results.
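To illustrate the point about EmptyDataError: pandas raises it when the text handed to the parser is empty, which is what would happen if the Mash Screen step returned no output at all. Below is a minimal, stdlib-only sketch of a guard that checks for empty output before parsing, so the failure is reported with a clearer message. The function name `parse_tab_output` and the exception `EmptyMashOutputError` are hypothetical and are not part of refseq_masher; the sample row is made up for illustration.

```python
import csv
from io import StringIO
from typing import List


class EmptyMashOutputError(ValueError):
    """Hypothetical error: the upstream command produced no output."""


def parse_tab_output(mash_out: str) -> List[List[str]]:
    """Parse tab-delimited text into rows, failing early on empty input.

    This is an illustrative guard, not the refseq_masher implementation.
    """
    # An empty or whitespace-only string is the case that makes
    # pandas raise EmptyDataError, so report it explicitly instead.
    if not mash_out.strip():
        raise EmptyMashOutputError(
            "Upstream command returned no output; check the command, "
            "its version, and its stderr before parsing."
        )
    return list(csv.reader(StringIO(mash_out), delimiter="\t"))


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    print(parse_tab_output("0.99\t990/1000\t5\t0\tref.fna\tcomment\n"))
    try:
        parse_tab_output("")
    except EmptyMashOutputError as err:
        print("Caught:", err)
```

A guard like this would not fix the underlying cause, but it would separate "the tool produced nothing" from a genuine parsing problem.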
quay.io/biocontainers/refseq_masher:0.1.1--py_2
The input fastq files are 200 MB each and were used to test earlier versions of IRIDA.
Looks like the issue was that the tool wrapper version was bumped but not the package version: https://github.com/phac-nml/galaxy_tools/pull/213
What needs to change?
RefSeqMasher Pipeline needs to be pulled out into a plugin. The workflow needs to be imported and then re-exported from Galaxy 21.01 or later as it currently does not function with recent versions.
https://github.com/phac-nml/irida/tree/development/src/main/resources/ca/corefacility/bioinformatics/irida/model/workflow/analysis/type/workflows/RefSeqMasherOnPairedReads/0.1