meglab-metagenomics / amrplusplus_v2

MEGARes and AmrPlusPlus - A comprehensive database of antimicrobial resistance genes and user-friendly pipeline for analysis of high-throughput sequencing data
http://megares.meglab.org/
MIT License

Skip processes in nextflow #1

Closed AroArz closed 4 years ago

AroArz commented 4 years ago

Hey, I'm wondering if there's a clean workaround for skipping processes in main_AmrPlusPlus_v2.nf.

I'm only interested in running ready samples through the alignment to MEGARes, skipping all the other processes such as Trimmomatic, HostRemovalStats, RemoveHostDNA, etc. The workaround I came up with was changing the channel Reads directly to Non_host_fastq_Megares and making a new channel

Reads = Channel.empty().

Following this, I added errorStrategy 'ignore' to all the processes I wanted to skip. With this I got

(AMRPlusPlus2) [aroarz@rackham1 amrplusplus_v2]$ nextflow run main_AmrPlusPlus_v2.nf -profile singularity patterns/ignore-failing-process.nf
N E X T F L O W  ~  version 20.01.0
Launching `main_AmrPlusPlus_v2.nf` [grave_ampere] - revision: 418454ec59
executor >  local (18)
[-        ] process > RunQC                    -
[08/d5e32d] process > QCStats                  [100%] 1 of 1, failed: 1 ✔
[b3/a68fc0] process > BuildHostIndex           [100%] 1 of 1 ✔
[-        ] process > AlignReadsToHost         -
[-        ] process > RemoveHostDNA            -
[4b/96f3b6] process > HostRemovalStats         [100%] 1 of 1, failed: 1 ✔
[-        ] process > NonHostReads             -
[27/5f99a3] process > BuildAMRIndex            [100%] 1 of 1 ✔
[17/e5a035] process > AlignToAMR               [100%] 3 of 3 ✔
[23/fbd5ea] process > RunResistome             [100%] 3 of 3 ✔
[fa/61759b] process > ResistomeResults         [100%] 1 of 1 ✔
[0a/f1843d] process > SamDedupRunResistome     [100%] 3 of 3 ✔
[b2/ce972f] process > SamDedupResistomeResults [100%] 1 of 1 ✔
[a1/648797] process > RunRarefaction           [100%] 3 of 3 ✔
Completed at: 25-Feb-2020 14:20:17
Duration    : 4m 54s
CPU hours   : 0.3 (3.2% failed)
Succeeded   : 16
Ignored     : 2
Failed      : 2

This worked fine on your test raw reads, which were included when using -profile singularity. However, as I'm running on a Slurm scheduler I used -profile singularity_slurm, and it gives me the following error.

(AMRPlusPlus2) [aroarz@rackham1 amrplusplus_v2]$ nextflow run main_AmrPlusPlus_v2.nf -profile singularity_slurm patterns/ignore-failing-process.nf
N E X T F L O W  ~  version 20.01.0
Launching `main_AmrPlusPlus_v2.nf` [hungry_euler] - revision: 418454ec59
WARN: There's no process matching config selector: DedupReads
WARN: There's no process matching config selector: AssembleReads
WARN: There's no process matching config selector: HMM_amr
WARN: There's no process matching config selector: AlignDedupedToContigs
WARN: There's no process matching config selector: AlignToContigs
WARN: There's no process matching config selector: HMMcontig_count
WARN: There's no process matching config selector: AlignDedupSNPToAMR
WARN: There's no process matching config selector: DedupRunResistome -- Did you mean: SamDedupRunResistome?
WARN: There's no process matching config selector: RunFreebayes
WARN: There's no process matching config selector: RunSNPFinder
WARN: There's no process matching config selector: SNPAlignToAMR -- Did you mean: AlignToAMR?
WARN: There's no process matching config selector: SNPRunResistome -- Did you mean: RunResistome?
WARN: There's no process matching config selector: SNPRunRarefaction -- Did you mean: RunRarefaction?
WARN: There's no process matching config selector: SNPconfirmation
WARN: There's no process matching config selector: SNPgene_alignment
WARN: There's no process matching config selector: SNPRunFreebayes
WARN: There's no process matching config selector: SNPRunSNPFinder
WARN: There's no process matching config selector: SNPResistomeResults -- Did you mean: ResistomeResults?
WARN: There's no process matching config selector: DedupNonSNPResistomeResults
WARN: There's no process matching config selector: HMMResistomeResults -- Did you mean: ResistomeResults?
WARN: There's no process matching config selector: Samtools_dedup_HMMcontig_count
WARN: There's no process matching config selector: Samtools_dedup_HMMResistomeResults
WARN: There's no process matching config selector: ExtractSNP
WARN: There's no process matching config selector: RunRGI -- Did you mean: RunQC?
WARN: There's no process matching config selector: Confirmed_AMR_hits
WARN: There's no process matching config selector: Confirmed_ResistomeResults
WARN: There's no process matching config selector: ExtractDedupSNP
WARN: There's no process matching config selector: RunDedupRGI
WARN: There's no process matching config selector: DedupSNPconfirmation
WARN: There's no process matching config selector: ConfirmDedupAMRHits
WARN: There's no process matching config selector: DedupSNPConfirmed_ResistomeResults
[-        ] process > RunQC                    -
[af/7765c3] process > QCStats                  [100%] 1 of 1, failed: 1
[78/dfc0d8] process > BuildHostIndex           [100%] 1 of 1, failed: 1
[-        ] process > AlignReadsToHost         -
[-        ] process > RemoveHostDNA            -
[-        ] process > HostRemovalStats         [  0%] 0 of 1
[-        ] process > NonHostReads             -
[d8/4d7b6a] process > BuildAMRIndex            [100%] 1 of 1, failed: 1
[-        ] process > AlignToAMR               -
[-        ] process > RunResistome             -
[-        ] process > ResistomeResults         -
[-        ] process > SamDedupRunResistome     -
[-        ] process > SamDedupResistomeResults -
[-        ] process > RunRarefaction           -
[af/7765c3] NOTE: Error submitting process 'QCStats (null)' for execution -- Error is ignored
[78/dfc0d8] NOTE: Error submitting process 'BuildHostIndex (chr21.fasta)' for execution -- Error is ignored
Error executing process > 'BuildAMRIndex (megares_modified_database_v2.00)'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  sbatch .command.run

Command exit status:
  1

Command output:
  sbatch: error: Errors in job submission:
  sbatch: error: ERROR 1: Invalid project.
  sbatch: error: Use the flag -A to specify an active project with allocation on this cluster.
  sbatch: error: Batch job submission failed: Unspecified error

Work dir:
  /crex/path/to/amrplusplus_v2/work/d8/4d7b6afc462bf833eb11aebd761344

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

I'm new to both Slurm and Nextflow, so I'm thankful for any input, really.

Cheers

Aron

AroArz commented 4 years ago

Hello, I worked with it a bit and had missed the obvious: I had to edit singularity_slurm.config with account info and the correct partitions, so that is all sorted.

Another problem I'm having, though, is that when running main_AmrPlusPlus_v2_withRGI.nf with -profile singularity I get the following error message

N E X T F L O W  ~  version 20.01.0
Launching `main_AmrPlusPlus_v2_withRGI.nf` [condescending_bassi] - revision: ffbdfda211
executor >  local (8)
[aa/242799] process > RunQC                              [100%] 3 of 3, cached: 3 ✔
[ca/1213ac] process > QCStats                            [100%] 1 of 1, cached: 1 ✔
[4c/a5225d] process > BuildHostIndex                     [100%] 1 of 1, cached: 1 ✔
[59/76392e] process > AlignReadsToHost                   [100%] 3 of 3, cached: 3 ✔
[de/a5fbaa] process > RemoveHostDNA                      [100%] 3 of 3, cached: 3 ✔
[00/6bbff0] process > HostRemovalStats                   [100%] 1 of 1, cached: 1 ✔
[42/2a1733] process > NonHostReads                       [100%] 3 of 3, cached: 3 ✔
[f9/a01a07] process > BuildAMRIndex                      [100%] 1 of 1, cached: 1 ✔
[12/229e64] process > AlignToAMR                         [100%] 3 of 3, cached: 3 ✔
[5c/9f499f] process > RunResistome                       [100%] 3 of 3, cached: 3 ✔
[26/24c4b1] process > ResistomeResults                   [100%] 1 of 1, cached: 1 ✔
[81/062cf9] process > SamDedupRunResistome               [100%] 3 of 3, cached: 3 ✔
[4c/882b52] process > SamDedupResistomeResults           [100%] 1 of 1, cached: 1 ✔
[f5/276a4e] process > RunRarefaction                     [100%] 3 of 3, cached: 3 ✔
[66/30687b] process > ExtractSNP                         [100%] 3 of 3, cached: 3 ✔
[36/78a5bb] process > RunRGI                             [100%] 3 of 3, failed: 3 ✔
[-        ] process > SNPconfirmation                    -
[-        ] process > Confirmed_AMR_hits                 -
[8b/af36dd] process > Confirmed_ResistomeResults         [100%] 1 of 1, failed: 1 ✘
[08/99695c] process > ExtractDedupSNP                    [100%] 3 of 3, cached: 3 ✔
[cc/6d7f5b] process > RunDedupRGI                        [100%] 3 of 3, failed: 3 ✔
[-        ] process > DedupSNPconfirmation               -
[-        ] process > ConfirmDedupAMRHits                -
[-        ] process > DedupSNPConfirmed_ResistomeResults -
[50/3ef6de] NOTE: Process `RunRGI (S1_test)` terminated with an error exit status (1) -- Error is ignored
[ad/c16117] NOTE: Process `RunDedupRGI (S2_test)` terminated with an error exit status (1) -- Error is ignored
[58/61ff20] NOTE: Process `RunDedupRGI (S1_test)` terminated with an error exit status (1) -- Error is ignored
[5a/45a051] NOTE: Process `RunRGI (S2_test)` terminated with an error exit status (1) -- Error is ignored
[36/78a5bb] NOTE: Process `RunRGI (S3_test)` terminated with an error exit status (1) -- Error is ignored
[cc/6d7f5b] NOTE: Process `RunDedupRGI (S3_test)` terminated with an error exit status (1) -- Error is ignored
WARN: Killing pending tasks (1)
Error executing process > 'Confirmed_ResistomeResults (null)'

Caused by:
  Process `Confirmed_ResistomeResults (null)` terminated with an error exit status (2)

Command executed:

  python3 /crex/proj/uppstore2017086/projects/017_Arg/data/benchmarking/amrplusplus_v2/bin/amr_long_to_wide.py -i  -o perfect_SNP_confirmed_AMR_analytic_matrix.csv

Command exit status:
  2

Command output:
  (empty)

Command error:
  usage: amr_long_to_wide.py [-h] -i INPUT_FILES [INPUT_FILES ...] -o
                             OUTPUT_FILE
  amr_long_to_wide.py: error: argument -i/--input_files: expected at least one argument

Work dir:
  /crex/proj/uppstore2017086/projects/017_Arg/data/benchmarking/amrplusplus_v2/work/8b/af36ddc6d5ea7f622bd067654bffc3

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`                                                                                                                                                              

It looks like RunRGI and RunDedupRGI are failing, causing everything downstream to halt. I can't really figure out why; I checked /test_results/ExtractMegaresSNPs/SNP_fasta and there seem to be SNPs from each of the test files.

If you've got any ideas I'd be grateful.
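
A note on the last error in the log: the empty `-i` in the `amr_long_to_wide.py` command suggests the failure of `Confirmed_ResistomeResults` is a knock-on effect of the RGI steps producing no files, rather than a separate bug. The sketch below reproduces that behaviour with a stand-in parser. It is an assumption reconstructed from the usage message above, not the actual script; `build_parser` and `exit_code_for` are hypothetical names.

```python
import argparse


def build_parser():
    # Hypothetical stand-in that mirrors the usage line printed in the log;
    # the real amr_long_to_wide.py may define its options differently.
    parser = argparse.ArgumentParser(prog="amr_long_to_wide.py")
    parser.add_argument("-i", "--input_files", nargs="+", required=True)
    parser.add_argument("-o", "--output_file", required=True)
    return parser


def exit_code_for(argv):
    """Return 0 if argv parses, otherwise the exit status argparse exits with."""
    try:
        build_parser().parse_args(argv)
        return 0
    except SystemExit as exc:
        # argparse reports usage errors by exiting with status 2.
        return exc.code


if __name__ == "__main__":
    # Same shape as the failing command: -i is followed directly by -o.
    print(exit_code_for(["-i", "-o", "perfect_SNP_confirmed_AMR_analytic_matrix.csv"]))
```

With no file names after `-i`, the parser exits with status 2, which matches the exit status reported for `Confirmed_ResistomeResults`.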

EnriqueDoster commented 4 years ago

Hello Aron,

Thanks for trying AMR++ and submitting your question. I'm glad you were able to update the configuration file to get it working with SLURM.

We are looking into the issues you are reporting, but it seems like something changed with RGI because we are no longer getting results on the same samples we used to test AMR++. We're reaching out to the RGI team and will put out another release of AMR++ that fixes this issue.

In the meantime, you can still explore your resistome results by running one of the scripts that doesn't include RGI. You can then edit the "AMR_analytic_matrix.csv" file to remove counts to those gene accessions that needed additional screening for specific residues with RGI. These genes are all labeled with "RequiresSNPConfirmation" in their headers.
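
If editing the matrix by hand is tedious, the filtering step could be scripted roughly as follows. This is a minimal sketch, assuming the gene header sits in the first column of "AMR_analytic_matrix.csv" and the first row holds the sample names; the function name `drop_snp_confirmation_rows` is made up for illustration.

```python
import csv


def drop_snp_confirmation_rows(in_path, out_path, tag="RequiresSNPConfirmation"):
    """Copy the count matrix, skipping rows whose gene header contains `tag`.

    Assumes (not verified against the pipeline) that row 0 is the sample
    header and that column 0 of every other row is the gene header.
    Returns (rows_kept, rows_dropped), not counting the header row.
    """
    kept = dropped = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for index, row in enumerate(reader):
            if index > 0 and row and tag in row[0]:
                dropped += 1
                continue
            writer.writerow(row)
            if index > 0:
                kept += 1
    return kept, dropped


if __name__ == "__main__":
    # File names are placeholders; point them at your own results directory.
    print(drop_snp_confirmation_rows("AMR_analytic_matrix.csv",
                                     "AMR_analytic_matrix_noSNP.csv"))
```

Writing to a new file rather than editing in place keeps the original matrix available for when RGI confirmation works again.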

I'll be sure to respond here when we are ready to push the next updates. Please let us know if you have any other questions.

Thanks! Enrique

AroArz commented 4 years ago

Hey!

Thanks for the reply, is it possible to use the extracted SNPs in ExtractMegaresSNPs/SNP_fasta and run those with RGI manually?

Kind regards Aron

EnriqueDoster commented 4 years ago

Hey Aron,

Try this script that I attached. main_AmrPlusPlus_v2_ignoreRGI.zip

It's the "main_AmrPlusPlus_v2_withRGI.nf" script, but I added the " errorStrategy 'ignore' " line to the rest of the processes that use RGI. This way, AMR++ should run without stopping at the RGI steps and you still get access to the ExtractMegaresSNPs/SNP_fasta.

Let me know if that doesn't work for you and if you are able to get results from RGI with your samples.

Best, Enrique

AroArz commented 4 years ago

Hey, just letting you know that the script worked and that I've got the extracted SNPs. However, I'm now having trouble with RGI as well; it seems they've got a lot of dependency issues within their own conda-installable package. I'll get back to you once I've managed to get it running.

Kind regards Aron

AroArz commented 4 years ago

Hello Enrique, I managed to run the extracted SNPs through RGI; I'll attach the output for one sample in case that's of any interest to you. It seems that the problem comes from creating the virtual environment with RGI. I tried changing your containers/Singularity from

conda create -n AmrPlusPlus_env Python=3.6 biopython rgi trimmomatic bwa samtools bedtools vcftools htslib ncurses kraken2 blast

to

conda create -n AmrPlusPlus_env rgi=5.1.0 biopython trimmomatic bwa samtools bedtools vcftools htslib ncurses kraken2 blast

in order to see if that would fix the issue with your pipeline. Unfortunately, my test run has been stuck in the queue for ~1 day now, so I decided to write this anyway in case you want to test it yourself.

Kind regards Aron

strict.txt

AroArz commented 4 years ago

Hi again! Some other changes I previously made but which I failed to mention.

Please change the following lines in config/singularity_slurm.config

    JAVA = '/usr/local/envs/AmrPlusPlus/bin//java'
    TRIMMOMATIC = '/usr/local/envs/AmrPlusPlus/share/trimmomatic/trimmomatic.jar'

to

    JAVA = '/usr/local/envs/AmrPlusPlus_env/bin/java'
    TRIMMOMATIC = '/usr/local/envs/AmrPlusPlus_env/share/trimmomatic/trimmomatic.jar'

so that it reflects the changes you made around January in config/singularity.config. It will not run on Slurm otherwise.
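
For anyone who would rather script the edit than do it by hand, here is a rough sketch of the same substitution. The helper name `update_env_paths` is hypothetical, and it only touches paths under /usr/local/envs/ as shown above; check the result against your own copy of the config before using it.

```python
import re


def update_env_paths(config_text, old_env="AmrPlusPlus", new_env="AmrPlusPlus_env"):
    """Rewrite /usr/local/envs/<old_env>/ to /usr/local/envs/<new_env>/.

    The lookahead requires a '/' right after the old name, so paths that
    already point at AmrPlusPlus_env are left alone (safe to run twice).
    """
    pattern = re.compile(r"(/usr/local/envs/)" + re.escape(old_env) + r"(?=/)")
    updated = pattern.sub(lambda match: match.group(1) + new_env, config_text)
    # The original JAVA line also has a doubled slash; tidy it up.
    return updated.replace("/bin//java", "/bin/java")


if __name__ == "__main__":
    path = "config/singularity_slurm.config"
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(update_env_paths(text))
```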

nailabc commented 4 years ago

Hi, Aron. I would like to do the same as you (skip the Trimmomatic and host DNA removal processes), but apart from adding the errorStrategy 'ignore' lines, I couldn't reproduce the modifications you mentioned: changing the channel reads directly to non_host_fastq_megares and making a new channel "Reads = Channel.empty()". I'm not really familiar with NF files and I'm far from being an expert in programming, so I couldn't figure out where I should make these changes. Would you mind sharing your modified file, or being more specific about which lines you modified, please? Thanks a lot

AroArz commented 4 years ago

main_AmrPlusPlus_v2_withRGI.nf.gz

Hi!

Make the following changes

Channel
    .fromFilePairs( params.reads, flat: true )
    .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" }
    .set { reads }

To

Channel
    .fromFilePairs( params.reads, flat: true )
    .ifEmpty { exit 1, "Read pair files could not be found: ${params.reads}" }
    .set { non_host_fastq_megares }

reads = Channel.empty()

This way reads is treated as an empty channel. Furthermore, you need to change the name of non_host_fastq_megares in one of the downstream processes so that the duplicate name doesn't confuse Nextflow. I've attached a file with the respective changes.

nailabc commented 4 years ago

Thanks, Aron! It worked now (at least for the test!)

EnriqueDoster commented 4 years ago

Aron, thanks for helping out! Naila, please let us know if you have any other questions. Thanks!

EnriqueDoster commented 4 years ago

I appreciate you calling this out. I'll be sure to fix it in an update soon. Thanks!