luciazifcakova opened this issue 3 weeks ago
It seems like the newer Eukulele container does not have this bug. Could you please switch the pipeline to the eukulele:2.0.9--pyhdfd78af_1 Singularity container?
Yes, we're in the process of upgrading Eukulele, but there are a couple of other issues we need to deal with first.
That said, we haven't had any problems with the Perl library you mention or with TransDecoder ORFs.
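Until the pipeline itself is upgraded, the container used by a single process can be overridden from a custom Nextflow config, as in this sketch. It assumes the EUKULELE_SEARCH process name used later in this thread and an image URL built from the depot.galaxyproject.org naming visible in the cached image file name in the original report; the file name eukulele_container.config is arbitrary. The reporter also saw the same error with 2.0.9, so pinning the version alone may not fix the missing Perl module.

```shell
#!/usr/bin/env bash
# Sketch: pin EUKULELE_SEARCH to the 2.0.9 container suggested in this thread.
# The config file name is arbitrary; the URL follows the depot.galaxyproject.org
# naming pattern seen in the Singularity cache file name in the report.
cat > eukulele_container.config <<'EOF'
process {
    withName: EUKULELE_SEARCH {
        container = 'https://depot.galaxyproject.org/singularity/eukulele:2.0.9--pyhdfd78af_1'
    }
}
EOF
```

The file is then passed to nextflow run with -c eukulele_container.config; settings in a withName selector take precedence over the container declared in the module itself.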
I see. I tried running the new version of EUKulele (2.0.9) and got the same error. I then built a writable Singularity container and installed URI::Escape in it, but DIAMOND was still not installed.
I don't know what's going on here, and it's really more of a Eukulele issue than a metatdenovo one. However, I recently created a new database with Eukulele 2.0.9 using a Conda environment, i.e. not Singularity, by running create_protein_table.py --infile_peptide eukulele.faa --infile_taxonomy eukulele.tsv. A colleague then ran metatdenovo pointing to this directory with --eukulele_dbpath /crex/proj/snic2020-16-76/nobackup/data/eukulele/ --eukulele_db gtdb-r220 (gtdb-r220 being the name of the new database, a subdirectory of the eukulele directory). He's running Nextflow with Singularity.
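A minimal pre-flight check along these lines can catch a wrong path before a long run. It assumes only what is stated above: --eukulele_dbpath names a parent directory and --eukulele_db names a subdirectory inside it. The function name check_eukulele_db is hypothetical, and it only tests that the subdirectory exists and is non-empty, not that its contents form a valid Eukulele database.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that <dbpath>/<dbname> exists and is not empty
# before passing --eukulele_dbpath and --eukulele_db to metatdenovo.
check_eukulele_db() {
    local dbpath="$1" dbname="$2"
    # Strip one trailing slash so "dir/" and "dir" give the same result.
    local dbdir="${dbpath%/}/${dbname}"
    if [ ! -d "$dbdir" ]; then
        echo "missing: $dbdir"
        return 1
    fi
    if [ -z "$(ls -A "$dbdir")" ]; then
        echo "empty: $dbdir"
        return 1
    fi
    echo "ok: $dbdir"
}
```

For the setup above this would be check_eukulele_db /crex/proj/snic2020-16-76/nobackup/data/eukulele/ gtdb-r220, which should print "ok:" followed by the database directory.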
One reason Conda environments and Singularity containers can behave differently on different computers is that each host operating system provides different tools and different versions of them. If a Conda recipe is incomplete, i.e. it misses a dependency, it might not fail for me because my host operating system provides the tool, but fail for you if yours doesn't. Such issues are best reported to the Eukulele developers so they can fix the recipe.
(BTW, we will try to provide the gtdb-r220 database as an option in metatdenovo within a few weeks. Sorry we can't be quicker here, but it depends on finding a solution for where to store the very large files.)
After I installed URI::Escape in Perl on my HPC and made it available to writable Singularity containers (EUKulele versions 2.0.5 and 2.0.9), they worked. With both versions, EUKulele ran and generated a diamond folder containing reference.pep.fa.gz.dmnd. However, neither version created the expected .out and .csv files.
I think the main issue is that the pipeline expects to run:
gzip user_assembly.prodigal/mets_full/diamond/*.out
gzip user_assembly.prodigal/taxonomy_counts/*.csv
gzip user_assembly.prodigal/taxonomy_estimation/*.out
but the data are in a different format: taxonomy-table.txt, tax-table.txt and reference.pep.fa.gz.dmnd.
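To see which of the file sets the pipeline tries to gzip actually exist, a sketch like the following can be run against the EUKulele output directory (user_assembly.prodigal here). The three glob patterns are copied from the gzip commands above; the function name report_eukulele_outputs is hypothetical, and other pipeline versions may expect different paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each pattern the pipeline gzips, count how many
# matching files exist under the given EUKulele output directory.
report_eukulele_outputs() {
    local outdir="${1%/}" pattern f count
    for pattern in 'mets_full/diamond/*.out' 'taxonomy_counts/*.csv' 'taxonomy_estimation/*.out'; do
        count=0
        # $pattern is left unquoted so the shell expands the glob.
        for f in "$outdir"/$pattern; do
            # An unmatched glob stays literal, so only count paths that exist.
            if [ -e "$f" ]; then
                count=$((count + 1))
            fi
        done
        printf '%s: %d file(s)\n' "$pattern" "$count"
    done
}
```

Running report_eukulele_outputs user_assembly.prodigal prints one line per pattern; a count of 0 marks a pattern whose gzip step would fail with "No such file or directory".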
The problem is that these processes won't start without a successful EUKulele run: NFCORE_METATDENOVO:METATDENOVO:HMMCLASSIFY:HMMER_HMMSEARCH and NFCORE_METATDENOVO:METATDENOVO:HMMCLASSIFY:HMMRANK.
The main issue is that I don't understand how the EUKulele output files are tied to other processes. Which EUKulele output files do other processes expect? How can I change the expected files in my copy of the Nextflow pipeline?
Please report this on the Eukulele GitHub: https://github.com/AlexanderLabWHOI/EUKulele, so they can fix the packaging.
I'm as puzzled as you are here, since the pipeline works for us. You're not alone, though: I've heard the same from other sources.
Neither of those processes actually needs the Eukulele output, but by default Nextflow stops the whole run when any process fails, so they never get to start. This can be overridden in a config file like this:
process {
    withName: EUKULELE_SEARCH {
        errorStrategy = 'ignore'
    }
}
Put that in a nextflow.config file in the directory you run Nextflow from, or in a file with any name that you pass to the nextflow run command with -c filename.
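As a sketch of the second option, the override can be written to its own file and passed with -c. The file name eukulele_ignore.config is arbitrary, and the nextflow run line is left as a comment because the remaining parameters (input, output directory, profile) depend on your setup.

```shell
#!/usr/bin/env bash
# Write a standalone Nextflow config that makes a failing EUKULELE_SEARCH task
# be ignored instead of terminating the whole run. The file name is arbitrary.
cat > eukulele_ignore.config <<'EOF'
process {
    withName: EUKULELE_SEARCH {
        errorStrategy = 'ignore'
    }
}
EOF

# Then pass it to the run (other parameters omitted; adjust to your setup):
# nextflow run nf-core/metatdenovo -profile singularity -c eukulele_ignore.config ...
```

With 'ignore', only the failed EUKULELE_SEARCH task and anything that consumes its output are skipped; independent processes such as the HMMCLASSIFY steps still run.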
Ignoring the error with the default EUKulele container worked well. I have also raised an issue on the EUKulele GitHub repository. Thank you for your help.
Description of the bug
I ran Eukulele within the Singularity container provided by the pipeline, and it seems that URI::Escape is missing from its Perl installation, which causes problems with TransDecoder.
Command used and terminal output
This is how I run Eukulele from inside the container:
srun -p short -t 0-2 -c 20 --mem=100G --pty bash
singularity shell /flash/MillerU/Vibrio_first_paper_data/work/singularity/depot.galaxyproject.org-singularity-eukulele-2.0.5--pyh723bec7_0.img
EUKulele \
    -m mets \
    --database gtdb \
    --protein_extension .faa \
    --reference_dir eukulele \
    -o user_assembly.prodigal \
    --CPUs 20 \
    -s /flash/MillerU/Vibrio_first_paper_data/work/dd/fdbeda7dbd16acff65d2300e339a5d/contigs
This is the error message I received:
2024-11-01 09:04:08 (20.9 MB/s) - ‘TransDecoder-v5.5.0.tar.gz’ saved [15748671/15748671]
Can't locate URI/Escape.pm in @INC (you may need to install the URI::Escape module) (@INC contains: /flash/MillerU/Vibrio_first_paper_data/references_bins/TransDecoder/PerlLib /usr/lib64/perl5/lib /usr/local/lib/site_perl/5.26.2/x86_64-linux-thread-multi /usr/local/lib/site_perl/5.26.2 /usr/local/lib/5.26.2/x86_64-linux-thread-multi /usr/local/lib/5.26.2 .) at /flash/MillerU/Vibrio_first_paper_data/references_bins/TransDecoder/PerlLib/Gene_obj.pm line 15.
BEGIN failed--compilation aborted at /flash/MillerU/Vibrio_first_paper_data/references_bins/TransDecoder/PerlLib/Gene_obj.pm line 15.
Compilation failed in require at references_bins/TransDecoder/TransDecoder.Predict line 17.
BEGIN failed--compilation aborted at references_bins/TransDecoder/TransDecoder.Predict line 17.
Relevant files
This is how I checked whether Perl has URI::Escape:
perl -MURI::Escape -e 'print "URI::Escape is installed\n"'
Can't locate URI/Escape.pm in @INC (you may need to install the URI::Escape module) (@INC contains: /usr/lib64/perl5/lib /usr/local/lib/site_perl/5.26.2/x86_64-linux-thread-multi /usr/local/lib/site_perl/5.26.2 /usr/local/lib/5.26.2/x86_64-linux-thread-multi /usr/local/lib/5.26.2 .).
BEGIN failed--compilation aborted.
System information
Not running Nextflow, just working within the provided container.
HPC: singularity/4.1.4
OS: CentOS Linux