nf-core / metatdenovo

Assembly and annotation of metatranscriptomic or metagenomic data for prokaryotic, eukaryotic and viruses.
https://nf-co.re/metatdenovo
MIT License

URI::Escape is missing from the Perl installation inside the EUKulele container #303

Open luciazifcakova opened 3 weeks ago

luciazifcakova commented 3 weeks ago

Description of the bug

I ran EUKulele inside the Singularity container provided by the pipeline, and URI::Escape seems to be missing from its Perl installation, which causes problems with TransDecoder.

Command used and terminal output

This is how I run EUKulele from inside the container:

srun -p short -t 0-2 -c 20 --mem=100G --pty bash
bash-4.4$ singularity shell /flash/MillerU/Vibrio_first_paper_data/work/singularity/depot.galaxyproject.org-singularity-eukulele-2.0.5--pyh723bec7_0.img

EUKulele \
  -m mets \
  --database gtdb \
  --protein_extension .faa \
  --reference_dir eukulele \
  -o user_assembly.prodigal \
  --CPUs 20 \
  -s /flash/MillerU/Vibrio_first_paper_data/work/dd/fdbeda7dbd16acff65d2300e339a5d/contigs

and this is the error message I received:

2024-11-01 09:04:08 (20.9 MB/s) - ‘TransDecoder-v5.5.0.tar.gz’ saved [15748671/15748671]

Can't locate URI/Escape.pm in @INC (you may need to install the URI::Escape module) (@INC contains: /flash/MillerU/Vibrio_first_paper_data/references_bins/TransDecoder/PerlLib /usr/lib64/perl5/lib /usr/local/lib/site_perl/5.26.2/x86_64-linux-thread-multi /usr/local/lib/site_perl/5.26.2 /usr/local/lib/5.26.2/x86_64-linux-thread-multi /usr/local/lib/5.26.2 .) at /flash/MillerU/Vibrio_first_paper_data/references_bins/TransDecoder/PerlLib/Gene_obj.pm line 15.
BEGIN failed--compilation aborted at /flash/MillerU/Vibrio_first_paper_data/references_bins/TransDecoder/PerlLib/Gene_obj.pm line 15.
Compilation failed in require at references_bins/TransDecoder/TransDecoder.Predict line 17.
BEGIN failed--compilation aborted at references_bins/TransDecoder/TransDecoder.Predict line 17.

Relevant files

This is how I checked for the Perl module URI::Escape:

perl -MURI::Escape -e 'print "URI::Escape is installed\n"'

Can't locate URI/Escape.pm in @INC (you may need to install the URI::Escape module) (@INC contains: /usr/lib64/perl5/lib /usr/local/lib/site_perl/5.26.2/x86_64-linux-thread-multi /usr/local/lib/site_perl/5.26.2 /usr/local/lib/5.26.2/x86_64-linux-thread-multi /usr/local/lib/5.26.2 .).
BEGIN failed--compilation aborted.
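
The same one-liner check can be wrapped in a small helper that tests several modules in one go. This is only a sketch: the function name check_perl_modules and the PERL_CMD variable are made up for illustration, and pointing PERL_CMD at a container (for example "singularity exec <image> perl") is an assumption about how one might reuse it.

```shell
#!/usr/bin/env bash
# Report which of the given Perl modules can be loaded.
# PERL_CMD is a hypothetical override, e.g.
#   PERL_CMD="singularity exec <image> perl"
# so the same check can be run against the perl inside a container.
check_perl_modules() {
  local perl_cmd="${PERL_CMD:-perl}"
  local missing=0
  local mod
  for mod in "$@"; do
    # -M<module> asks perl to load the module; -e 1 then exits immediately.
    # $perl_cmd is deliberately unquoted so a multi-word command is split.
    if $perl_cmd -M"$mod" -e 1 >/dev/null 2>&1; then
      echo "ok: $mod"
    else
      echo "missing: $mod"
      missing=1
    fi
  done
  return "$missing"
}

# Example: "strict" ships with Perl itself, URI::Escape is the module in question.
check_perl_modules strict URI::Escape || echo "at least one module could not be loaded"
```

The function returns a non-zero status as soon as one module fails to load, so it can also be used in a script that should stop before starting a long run.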

System information

Not running Nextflow, just working within the provided container
HPC
singularity/4.1.4
CentOS Linux

luciazifcakova commented 3 weeks ago

It seems that the newer version of the EUKulele container does not have this bug. Could you please start using the eukulele:2.0.9--pyhdfd78af_1 Singularity container in your pipeline?

erikrikarddaniel commented 3 weeks ago

Yes, we're on our way to upgrade Eukulele. There are however a couple of other issues we need to deal with first.

However, we've had no issues with the Perl library you mention and TransDecoder ORFs.

luciazifcakova commented 2 weeks ago

I see. I tried running the new version of EUKulele (2.0.9) and got the same error... Then I made a writable Singularity container and installed URI::Escape in it; however, diamond was still not installed...

erikrikarddaniel commented 2 weeks ago

I don't know what's going on here, and it's actually more of a Eukulele issue than metatdenovo. However, I recently created a new database with Eukulele using a conda environment, i.e. not Singularity, with Eukulele 2.0.9. I ran create_protein_table.py --infile_peptide eukulele.faa --infile_taxonomy eukulele.tsv. A colleague then ran metatdenovo pointing to this directory using --eukulele_dbpath /crex/proj/snic2020-16-76/nobackup/data/eukulele/ --eukulele_db gtdb-r220 (gtdb-r220 being the name of the new database; a subdirectory of the eukulele directory). He's running Nextflow with Singularity.

One reason both Conda environments and Singularity containers may work differently on different computers is that the host operating system provides different tools and versions of tools. If a Conda recipe is incomplete, i.e. misses a dependency, it might not fail for me because my host operating system provides the tool, but fail for you if yours doesn't. These issues are best addressed to the Eukulele developers so they can fix the recipe.
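
One way to see whether host settings differ between machines is to print what Perl will actually search, once on the host and once inside the container, and compare the two. The sketch below is only an illustration: perl_env_report and PERL_CMD are hypothetical names, and it relies on the fact that Singularity passes host environment variables such as PERL5LIB into the container by default.

```shell
#!/usr/bin/env bash
# Print the Perl-related search settings of the current environment.
# PERL_CMD is a hypothetical override, e.g.
#   PERL_CMD="singularity exec <image> perl"
# to get the same report from inside a container.
perl_env_report() {
  local perl_cmd="${PERL_CMD:-perl}"
  # PERL5LIB is prepended to @INC; if it is set on the host it is normally
  # inherited by the container as well.
  echo "PERL5LIB=${PERL5LIB:-<unset>}"
  # One @INC entry per line; fall back to a note if perl cannot be run.
  $perl_cmd -e 'print "INC: $_\n" for @INC' 2>/dev/null \
    || echo "INC: <perl not runnable>"
}

perl_env_report
```

Running the report on two machines and diffing the output shows quickly whether a module path comes from the container image or from the host.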

(BTW, we will try to provide the gtdb-r220 database as an option in metatdenovo within a few weeks. Sorry that we can't be quicker here, but it depends on finding a solution for where to store the very large files.)

luciazifcakova commented 2 weeks ago

After I installed URI::Escape in Perl on my HPC and made it available to the writable Singularity containers (EUKulele versions 2.0.5 and 2.0.9), they worked. With both versions I was able to run EUKulele, which generated a diamond folder containing a reference.pep.fa.gz.dmnd file. However, neither version created the expected files (.out and .csv).

I think the main issue is that the pipeline expects to run:

gzip user_assembly.prodigal/mets_full/diamond/*.out
gzip user_assembly.prodigal/taxonomy_counts/*.csv
gzip user_assembly.prodigal/taxonomy_estimation/*.out

but the output files are different: taxonomy-table.txt, tax-table.txt, reference.pep.fa.gz.dmnd

The problem is that these processes won't start without a successful run of EUKulele: NFCORE_METATDENOVO:METATDENOVO:HMMCLASSIFY:HMMER_HMMSEARCH and NFCORE_METATDENOVO:METATDENOVO:HMMCLASSIFY:HMMRANK

My main problem is that I don't understand how the EUKulele output files are tied to other processes. Which EUKulele output files are expected by other processes? How can I change the expected files in my copy of the Nextflow pipeline?

erikrikarddaniel commented 2 weeks ago

> After I installed URI::Escape in Perl on my HPC and made it available to the writable Singularity containers (EUKulele versions 2.0.5 and 2.0.9), they worked. With both versions I was able to run EUKulele, which generated a diamond folder containing a reference.pep.fa.gz.dmnd file. However, neither version created the expected files (.out and .csv).

Please report this on the Eukulele GitHub: https://github.com/AlexanderLabWHOI/EUKulele, so they can fix the packaging.

> I think the main issue is that the pipeline expects to run:
>
> gzip user_assembly.prodigal/mets_full/diamond/*.out
> gzip user_assembly.prodigal/taxonomy_counts/*.csv
> gzip user_assembly.prodigal/taxonomy_estimation/*.out
>
> but the output files are different: taxonomy-table.txt, tax-table.txt, reference.pep.fa.gz.dmnd

I'm as puzzled as you are here -- and I've heard this from other sources, so you're not alone -- since the pipeline works for us.

> The problem is that these processes won't start without a successful run of EUKulele: NFCORE_METATDENOVO:METATDENOVO:HMMCLASSIFY:HMMER_HMMSEARCH and NFCORE_METATDENOVO:METATDENOVO:HMMCLASSIFY:HMMRANK
>
> My main problem is that I don't understand how the EUKulele output files are tied to other processes. Which EUKulele output files are expected by other processes? How can I change the expected files in my copy of the Nextflow pipeline?

Neither of those processes actually needs the Eukulele output, but normally nothing starts when one process fails. This can be overridden in a config file like this:

process {
  // Let the rest of the workflow continue even if this process fails.
  withName: EUKULELE_SEARCH {
    errorStrategy = 'ignore'
  }
}

Put that in a nextflow.config file in the directory you're running Nextflow in, or in a file with any name that you add to the nextflow run command with -c <filename>.
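
For the second option, the steps could look roughly like the sketch below. The file name eukulele_ignore.config is arbitrary and chosen here only for illustration, and the remaining options of the nextflow run command are left as a placeholder since they depend on your own setup. Note that in Nextflow config files settings are assigned with =.

```shell
#!/usr/bin/env bash
# Write the override into its own config file (the file name is arbitrary).
cat > eukulele_ignore.config <<'EOF'
process {
  withName: EUKULELE_SEARCH {
    errorStrategy = 'ignore'
  }
}
EOF

# Print, rather than run, the command that would pick the file up.
# "<your usual options>" is a placeholder for the parameters you already use.
echo "nextflow run nf-core/metatdenovo -c eukulele_ignore.config <your usual options>"
```

Keeping the override in a separate file makes it easy to drop again once the underlying EUKulele problem is fixed.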

luciazifcakova commented 2 weeks ago

Ignoring the error with the default EUKulele container worked well. I have also raised an issue about this in the EUKulele GitHub repo. Thank you for your help.