terrimporter / MetaWorks

MetaWorks is a flexible multi-marker metabarcode pipeline for processing paired-end Illumina reads from raw fastq.gz files to taxonomic assignments.
https://terrimporter.github.io/MetaWorksSite/
GNU General Public License v3.0

Issues in MetaWorks 12.0 Tutorial #9

Closed pedroaslongo closed 8 months ago

pedroaslongo commented 10 months ago

Hello, everyone!

I am having trouble with the Tutorial, apparently because of some classifier-related issue. I have edited the path to the classifier in the config_testing_COI_data.yaml file, and I also tried increasing the memory allocated to the RDP classifier in the config file, but it is still not working. I am getting the following messages:

/home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/bin/rdp_classifier: line 57: 8489 Killed /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/bin/java -Xmx8g -jar /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/share/rdp_classifier-2.13-1/classifier.jar classify -t mydata/rRNAClassifier.properties -o COI/rdp.out.tmp COI/cat.denoised.nonchimeras

[Thu Aug 17 12:03:34 2023]
Error in rule taxonomic_assignment:
    jobid: 0
    input: COI/cat.denoised.nonchimeras
    output: COI/rdp.out.tmp

RuleException:
CalledProcessError in file /home/labecmar/MetaWorks1.12.0/snakefile_ESV, line 343:
Command 'set -euo pipefail; rdp_classifier -Xmx8g classify -t mydata/rRNAClassifier.properties -o COI/rdp.out.tmp COI/cat.denoised.nonchimeras' returned non-zero exit status 137.
  File "/home/labecmar/MetaWorks1.12.0/snakefile_ESV", line 343, in __rule_taxonomic_assignment
  File "/home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Removing output files of failed job taxonomic_assignment since they might be corrupted:
COI/rdp.out.tmp
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-17T120233.625360.snakemake.log

Could you please help me solve this issue? Thank you very much!

terrimporter commented 10 months ago

Can you reset the memory allotted to the RDP classifier in the config file to -Xmx16g? If it still fails, then can you please post the updated error messages?

pedroaslongo commented 10 months ago

> Can you reset the memory allotted to the RDP classifier in the config file to -Xmx16g? If it still fails, then can you please post the updated error messages?

I got the same error message:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Select jobs to execute...
/home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/bin/rdp_classifier: line 57: 1231 Killed /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/bin/java -Xmx16g -jar /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/share/rdp_classifier-2.13-1/classifier.jar classify -t mydata/rRNAClassifier.properties -o COI/rdp.out.tmp COI/cat.denoised.nonchimeras
[Fri Aug 18 10:39:14 2023]
Error in rule taxonomic_assignment:
    jobid: 0
    input: COI/cat.denoised.nonchimeras
    output: COI/rdp.out.tmp

RuleException:
CalledProcessError in file /home/labecmar/MetaWorks1.12.0/snakefile_ESV, line 343:
Command 'set -euo pipefail; rdp_classifier -Xmx16g classify -t mydata/rRNAClassifier.properties -o COI/rdp.out.tmp COI/cat.denoised.nonchimeras' returned non-zero exit status 137.
  File "/home/labecmar/MetaWorks1.12.0/snakefile_ESV", line 343, in __rule_taxonomic_assignment
  File "/home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Removing output files of failed job taxonomic_assignment since they might be corrupted:
COI/rdp.out.tmp
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-18T103721.878497.snakemake.log
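
A note for readers comparing their own logs with the one above: by shell convention, an exit status above 128 means the process was terminated by a signal, and the signal number is the status minus 128. Status 137 therefore corresponds to signal 9 (SIGKILL), which matches the "Killed" line printed by the rdp_classifier wrapper. On Linux this is often, though not always, the kernel's out-of-memory killer. The helper below is a minimal sketch of that arithmetic; `decode_exit_status` is a hypothetical name and is not part of MetaWorks or Snakemake.

```shell
#!/bin/sh
# Hypothetical helper: interpret an exit status using the common shell
# convention that values above 128 mean "terminated by signal (status - 128)".
decode_exit_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error code $status"
    fi
}

# The status reported in the RuleException above.
decode_exit_status 137
```

A program can also choose to exit with a value above 128 on its own, so this reading is a strong hint rather than proof.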

terrimporter commented 10 months ago

Can you double check the path to rRNAClassifier.properties in the config file to make sure it's a complete path (including /home/labecmar etc. as needed)? If this doesn't help, then please post the updated errors, thanks.

pedroaslongo commented 10 months ago

> Can you double check the path to rRNAClassifier.properties in the config file to make sure it's a complete path (including /home/labecmar etc. as needed)? If this doesn't help, then please post the updated errors, thanks.

I did it now using the whole path to rRNAClassifier.properties, but I still had the same message.

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Select jobs to execute...
/home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/bin/rdp_classifier: line 57: 1757 Killed /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/bin/java -Xmx16g -jar /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/share/rdp_classifier-2.13-1/classifier.jar classify -t /home/labecmar/MetaWorks1.12.0/mydata/rRNAClassifier.properties -o COI/rdp.out.tmp COI/cat.denoised.nonchimeras
[Fri Aug 18 12:43:53 2023]
Error in rule taxonomic_assignment:
    jobid: 0
    input: COI/cat.denoised.nonchimeras
    output: COI/rdp.out.tmp

RuleException:
CalledProcessError in file /home/labecmar/MetaWorks1.12.0/snakefile_ESV, line 343:
Command 'set -euo pipefail; rdp_classifier -Xmx16g classify -t /home/labecmar/MetaWorks1.12.0/mydata/rRNAClassifier.properties -o COI/rdp.out.tmp COI/cat.denoised.nonchimeras' returned non-zero exit status 137.
  File "/home/labecmar/MetaWorks1.12.0/snakefile_ESV", line 343, in __rule_taxonomic_assignment
  File "/home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Removing output files of failed job taxonomic_assignment since they might be corrupted:
COI/rdp.out.tmp
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-18T124244.919662.snakemake.log
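
Since the process is still killed at -Xmx16g, one thing worth ruling out is that the requested Java heap is larger than the memory the machine can actually provide. The thread does not establish that this is the cause here, so treat the following as a sketch under that assumption. It assumes a Linux system exposing /proc/meminfo with a MemAvailable field; `xmx_to_mib` and `available_mib` are hypothetical helper names, not MetaWorks commands.

```shell
#!/bin/sh
# Hypothetical helper: convert a JVM heap flag such as -Xmx16g to MiB.
# Handles g/m/k suffixes; a bare number is treated as bytes.
xmx_to_mib() {
    value=${1#-Xmx}
    number=${value%[gGmMkK]}
    case $value in
        *[gG]) echo $((number * 1024)) ;;
        *[mM]) echo "$number" ;;
        *[kK]) echo $((number / 1024)) ;;
        *)     echo $((number / 1024 / 1024)) ;;
    esac
}

# Hypothetical helper: print MemAvailable in MiB from a meminfo-style file
# (the kernel reports the value in kB). Defaults to /proc/meminfo.
available_mib() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

heap=$(xmx_to_mib -Xmx16g)
if [ -r /proc/meminfo ]; then
    avail=$(available_mib)
    if [ -n "$avail" ] && [ "$heap" -gt "$avail" ]; then
        echo "Requested heap (${heap} MiB) exceeds available memory (${avail} MiB)"
    else
        echo "Requested heap: ${heap} MiB; available memory: ${avail:-unknown} MiB"
    fi
fi
```

If the requested heap does exceed available memory, raising -Xmx further is unlikely to help; running on a machine with more RAM would be the more promising direction.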

terrimporter commented 10 months ago

  1. Can you double check that you have activated the MetaWorks_v1.12.0 environment file?
  2. Can you run the rdp_classifier from the command line to ensure that the program is properly starting? You can paste /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/share/rdp_classifier-2.13-1/classifier.jar or try just using rdp_classifier to see if it calls the program.
  3. Can you check that the COI/cat.denoised.nonchimeras file has content?
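
The three checks above can be sketched as a short shell session. This is only an illustration: it assumes conda sets the CONDA_DEFAULT_ENV variable on activation, and that COI/cat.denoised.nonchimeras is a FASTA file whose records start with ">". `fasta_has_records` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file exists, is non-empty,
# and contains at least one FASTA header line (a line starting with ">").
fasta_has_records() {
    [ -s "$1" ] && [ "$(grep -c '^>' "$1")" -gt 0 ]
}

# 1. Which conda environment is active (if any)?
echo "Active conda environment: ${CONDA_DEFAULT_ENV:-none}"

# 2. Is the rdp_classifier wrapper visible on PATH?
if command -v rdp_classifier >/dev/null 2>&1; then
    echo "rdp_classifier found at: $(command -v rdp_classifier)"
else
    echo "rdp_classifier not found on PATH"
fi

# 3. Does the classifier input have content?
if fasta_has_records COI/cat.denoised.nonchimeras; then
    echo "Input file contains FASTA records"
else
    echo "Input file is missing, empty, or has no FASTA headers"
fi
```

Note that a .jar file is not executable on its own; as the log lines earlier in the thread show, the wrapper starts it with java -jar.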

pedroaslongo commented 10 months ago

>   1. Can you double check that you have activated the MetaWorks_v1.12.0 environment file?
>   2. Can you run the rdp_classifier from the command line to ensure that the program is properly starting? You can paste /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/share/rdp_classifier-2.13-1/classifier.jar or try just using rdp_classifier to see if it calls the program.
>   3. Can you check that the COI/cat.denoised.nonchimeras file has content?

  1. The environment file is activated.

  2. If I paste /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/share/rdp_classifier-2.13-1/classifier.jar, I get a permission denied message:

     -bash: /home/labecmar/miniconda3/envs/MetaWorks_v1.12.0/share/rdp_classifier-2.13-1/classifier.jar: Permission denied

     If I paste only rdp_classifier, I get this:

     USAGE: ClassifierMain <subcommand args ...>
     default command is classify
     classify - classify one or multiple samples
     crossvalidate - cross validate accuracy testing
     comp-trainset - compare multiple training sets to find shared and unique taxa and sequences
     libcompare - compare two samples
     loot - leave one (sequence or taxon) out accuracy testing
     merge-detail - merge classification detail result files to create a taxon assignment counts file
     merge-count - merge multiple taxon assignment count files to into one count file
     random-sample - random select a subset or subregion of sequences
     rm-dupseq - remove identical or any sequence contained by another sequence
     rm-partialseq - remove partial sequences
     taxa-sim - calculate and plot the similarities within taxa
     train - retrain classifier
     version - taxonomy versions of the pre-compiled training sets

  3. Yes, the COI/cat.denoised.nonchimeras file has contents.

gjdury commented 10 months ago

As far as I can tell, I am having the same issue. Like pedroaslongo, I made sure the MetaWorks_v1.12.0 environment file is active. I made sure the COI/cat.denoised.nonchimeras file has content, and I tested config_testing_COI_data.yaml with both an absolute path and a relative path. In both cases I got what I think is the same error message as pedroaslongo.

(I am relatively new to all this, so I may have missed something that means it's a different error. I would be happy to also post the error message I'm getting if that would be useful.)

terrimporter commented 10 months ago

First, double check that you've increased the memory to the classifier to -Xmx16g and are using the full/absolute path to your rRNAClassifier.properties file.

Yes, please post the error so I can take a look.

terrimporter commented 10 months ago

@pedroaslongo did you get it working?

gjdury commented 10 months ago

My apologies, I hadn't read the beginning of the thread closely and had missed the suggestion to increase the memory allotted to the classifier to -Xmx16g. After doing that, it looks like the RDP classifier step completed successfully. I did get a different error, but I don't believe it's an issue with MetaWorks; rather, it seems to be an issue with my ORFfinder installation or with how I added it to my path.

    input: COI/chimera.denoised.nonchimeras.taxon
    output: COI/orfs.fasta.nt
    jobid: 34
    reason: Missing output files: COI/orfs.fasta.nt
    resources: tmpdir=/tmp

/usr/bin/bash: ORFfinder: command not found
[Wed Aug 23 16:15:13 2023]
Error in rule get_orfs_nt:
    jobid: 34
    input: COI/chimera.denoised.nonchimeras.taxon
    output: COI/orfs.fasta.nt
    shell:
        ORFfinder -in COI/chimera.denoised.nonchimeras.taxon -g 5 -s 2 -ml 30 -n true -strand plus -outfmt 1 > COI/orfs.fasta.nt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job get_orfs_nt since they might be corrupted:
COI/orfs.fasta.nt
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-23T161508.106824.snakemake.log

What's confusing to me is that despite adding ORFfinder to my path, it still wasn't found.

$ ORFfinder
-bash: ORFfinder: command not found

Despite:

$ echo $PATH
/home1/09274/gjdury/bin/ORFfinder:

(from which I've removed the other paths for clarity).
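
One possible explanation, offered as an assumption because the thread does not say whether /home1/09274/gjdury/bin/ORFfinder is the executable itself or a directory: the shell treats each PATH entry as a directory to search, so an entry that points at the binary file is never matched. In that case the entry would need to be the containing directory (/home1/09274/gjdury/bin), and the file would need execute permission. The sketch below mimics that lookup; `find_in_path` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory in a colon-separated list
# (default: $PATH) that contains an executable regular file named $1.
# Returns 1 if no entry matches, including when an entry is the file itself.
find_in_path() {
    name=$1
    path_list=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $path_list; do
        if [ -d "$dir" ] && [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            echo "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

if dir=$(find_in_path ORFfinder); then
    echo "ORFfinder would be run from: $dir"
else
    echo "No PATH directory contains an executable named ORFfinder"
fi
```

If the second message appears, checking each PATH entry with ls -ld shows whether it is a directory or a file.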