hoelzer-lab / rnaflow

A simple RNA-Seq differential gene expression pipeline using Nextflow
GNU General Public License v3.0

Piano and WebGestalt #229

Closed: m-jahani closed this issue 8 months ago

m-jahani commented 9 months ago

On the cluster where I run RNAflow, only the main (head) node has internet access. Consequently, when I ran the job through Slurm, both the piano and WebGestalt steps were skipped. After the Slurm run had finished, I tried to resume the pipeline locally, that is, on the head node with internet access instead of through Slurm. However, the pipeline started reanalyzing all the steps that had already completed, whereas I only wanted it to run the skipped steps (piano and WebGestalt). What is the best way to handle this and make sure that only the piano and WebGestalt steps are executed in the local run?

Thanks

MarieLataretu commented 9 months ago

Hi @m-jahani ,

This should be doable with an injected configuration. First, create a simple text file, say my.config. In it, set the executor to local for just those two processes:

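process {
    // Run these two processes on the machine where Nextflow is launched
    // (here, the head node) instead of submitting them to Slurm
    withName: webgestalt {
        executor = "local"
    }
    withName: piano {
        executor = "local"
    }
}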

This configuration can then be injected with the -c option (see docs):

nextflow run [...] -c my.config

Let me know if it works!

m-jahani commented 9 months ago

I tried it, intending to run everything on Slurm except the piano and WebGestalt steps. However, it did not work as expected: it failed at those two steps.

nextflow run hoelzer-lab/rnaflow -c /home/mjahani/scratch/NEW_TMP/clean_slurm_test3/my.config -profile slurm,conda,latency --skip_sortmerna \
  --reads /home/mjahani/scratch/files/test_tmp_clean.csv \
  --genome /home/mjahani/scratch/files/fastas.csv \
  --annotation /home/mjahani/scratch/files/gtfs.csv \
  --permanentCacheDir /home/mjahani/scratch/conda_dataset/nextflow-autodownload-databases \
  --condaCacheDir /home/mjahani/scratch/conda_dataset/conda \
  --pathway hsa
= = = = = = =   = = = = = = =  = = = = = = = =  =  = = = = = = = = = = = =  = = = =  = = = = = = = = = =  = = = = = = = =
Output path:                    results
Strandedness                    unstranded
Read mode:                      paired-end
TPM threshold:                  1
Comparisons:                    all
Nanopore mode:                  false

executor >  slurm (166)
[e4/d10372] process > concat_genome                                                              [100%] 1 of 1 ✔
[ca/8f7b4a] process > concat_annotation                                                          [100%] 1 of 1 ✔
[60/541159] process > preprocess_illumina:fastqcPre (43896327)          [100%] 26 of 26 ✔
[d3/766a42] process > preprocess_illumina:fastp (44476765)                [100%] 26 of 26 ✔
[f6/8fd925] process > preprocess_illumina:fastqcPost (44172717)           [100%] 26 of 26 ✔
[33/16a324] process > preprocess_illumina:hisat2index                                            [100%] 1 of 1 ✔
[e7/0fc4c8] process > preprocess_illumina:hisat2 (44009528)               [100%] 26 of 26 ✔
[97/7c4ddb] process > preprocess_illumina:index_bam (44009528)            [100%] 26 of 26 ✔
[df/b69117] process > expression_reference_based:featurecounts (44009528) [100%] 26 of 26 ✔
[33/f26765] process > expression_reference_based:format_annotation_gene_rows                     [100%] 1 of 1 ✔
[26/ef2bb2] process > expression_reference_based:format_annotation                               [100%] 1 of 1 ✔
[9b/36f412] process > expression_reference_based:tpm_filter                                      [100%] 1 of 1 ✔
[4d/450fc1] process > expression_reference_based:deseq2 (1)                                      [100%] 2 of 2, failed: 1, retries: 1 ✔
[7d/6c39b9] process > expression_reference_based:piano (0_vs_1)                                  [100%] 1 of 1, failed: 1
[-        ] process > expression_reference_based:webgestalt                                      [  0%] 0 of 1
[d0/c9ccdf] process > expression_reference_based:multiqc_sample_names (1)                        [100%] 1 of 1 ✔
[56/f9bb66] process > expression_reference_based:multiqc (1)                                     [100%] 1 of 1 ✔
ERROR ~ Error executing process > 'expression_reference_based:piano (0_vs_1)'

Caused by:
  Process requirement exceeds available CPUs -- req: 24; avail: 6

Command executed:

  R CMD BATCH --no-save --no-restore '--args c(".") c("deseq2_0_vs_1_filtered_padj_0.05.csv") c("hsa") c("ensembl_gene_id") c("24")' piano.R

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /lustre07/scratch/mjahani/NEW_TMP/clean_slurm_test3/work/7d/6c39b921ba00746d013403294f4069

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
MarieLataretu commented 9 months ago

Okay, Piano seems to request more CPUs than are available on the local machine:

Caused by: Process requirement exceeds available CPUs -- req: 24; avail: 6

You can lower that request in the config snippet accordingly, for example:

process {
    withName: webgestalt {
        executor = "local"
    }
    withName: piano {
        executor = "local"
        cpus = 4                           // must not exceed the CPUs available locally (6 here)
        memory = { 4.GB * task.attempt }   // 4 GB, increased with each retry attempt
    }
}
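If both processes can share the same settings, the two blocks could also be merged, because Nextflow's withName selector accepts a regular expression. This is a minimal sketch, assuming the process names piano and webgestalt as used in the snippets above; note that it would also give webgestalt 4 CPUs and the memory setting, which was not tested in this thread:

```groovy
process {
    // Regex selector: matches either the piano or the webgestalt process
    // (assumption: both can run with the same local resource limits)
    withName: 'piano|webgestalt' {
        executor = "local"
        cpus = 4                           // stay below the 6 CPUs available on the head node
        memory = { 4.GB * task.attempt }   // 4 GB, increased with each retry attempt
    }
}
```

It is injected the same way, with -c my.config.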
m-jahani commented 8 months ago

Thanks a bunch, that was super helpful!