nf-core / methylseq

Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
https://nf-co.re/methylseq
MIT License

Correct documentation to explain minimal inputs for bwa-meth subworkflow #363

Closed scoughlan2 closed 7 months ago

scoughlan2 commented 8 months ago

Description of the bug

The documentation here and here implies that bwa-meth can be run just by adding --aligner bwameth to a command line like the one specified here.

However, the iGenomes file doesn't have bwa-meth indices, so that genome can't be used without first running bwa-meth index on it. The documentation for the genome flag doesn't explain that it won't work with bwameth. Supplying just a local reference genome looks like it should work from this code, but according to this bug, it doesn't.

I had a successful run using the following parameters:

--input <samplesheet.csv>
--outdir ./results
--aligner bwameth 
--bwa_meth_index </path/to/bwa_meth_index_dir>
--fasta <genome.fa>
--fasta_index <genome.fa.fai>
-profile docker
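
For reference, the parameters above can be assembled into a single command. The sketch below only prints the command (a dry run) rather than launching the pipeline. The three argument values are hypothetical placeholders, and it assumes the .fai index sits next to the FASTA file. Note that -profile is a Nextflow option and takes a single dash, while pipeline parameters such as --aligner take two.

```shell
# Dry-run sketch: print the bwa-meth invocation instead of running it.
# $1 = samplesheet, $2 = bwa-meth index directory, $3 = reference FASTA
# (all three are placeholders; the .fai is assumed to be "$3.fai").
build_methylseq_cmd() {
    printf '%s %s %s %s %s %s %s\n' \
        "nextflow run nf-core/methylseq -profile docker" \
        "--input $1" \
        "--outdir ./results" \
        "--aligner bwameth" \
        "--bwa_meth_index $2" \
        "--fasta $3" \
        "--fasta_index $3.fai"
}

# Example with placeholder paths; copy the printed line once it looks right.
build_methylseq_cmd samplesheet.csv /path/to/bwa_meth_index_dir genome.fa
```

Printing the command first makes it easy to confirm that every reference input is passed explicitly before starting a long run.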

I would request that the documentation is updated to reflect that you can't use an iGenomes reference for bwameth (currently at least) and to explain what is needed to run the bwameth subworkflow.

Command used and terminal output

The implication is that something like the command below can be used:

nextflow run nf-core/methylseq --input ./samplesheet.csv --outdir ./results --genome GRCh38 --aligner bwameth -profile docker

However, this results in the following error:

ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX (null)'

Caused by:
  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa))

System information

nf-core/methylseq 2.5.0

cjfields commented 7 months ago

I'm seeing this as well. I should also mention that there appears to be a related issue with the samtools faidx reference indexing, but the workflow using bismark works fine.

ewels commented 7 months ago

I think that the docs are fine; doing just --aligner bwameth used to work. The pipeline simply built the reference index for you if one was not supplied, and --genome was still used to supply the FASTA files to build it.

I think what this really boils down to is that there's a bug with reference index building with bwameth.

ewels commented 7 months ago

Closing as a duplicate of https://github.com/nf-core/methylseq/issues/352

Feel free to disagree with me and we can reopen :)

LucaZanella15 commented 3 months ago

I have tried to run with the parameters listed above, where my inputs were generated as follows:

  1. --bwa_meth_index </path/to/bwa_meth_index_dir>: this is the output of the command bwameth.py index hg19.p13.plusMT.no_alt_analysis_set.fa

  2. --fasta <genome.fa>: is hg19.p13.plusMT.no_alt_analysis_set.fa

  3. --fasta_index <genome.fa.fai>: is the output of samtools faidx hg19.p13.plusMT.no_alt_analysis_set.fa
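
Before launching the pipeline, it can help to confirm that the three inputs listed above actually exist. The helper below is a minimal sketch, not part of the pipeline. The function name is hypothetical, and the file names it checks are an assumption: they are what bwameth.py index is expected to write when it uses classic bwa (a <fasta>.bwameth.c2t file plus the bwa index files), with the .fai from samtools faidx next to the FASTA. A bwa-mem2 based index would use different file names.

```shell
# Report missing reference inputs for a given FASTA (hypothetical helper).
# The inputs would normally be generated with:
#   bwameth.py index <fasta>
#   samtools faidx <fasta>
# Assumption: index files follow the classic bwa naming and sit next to the FASTA.
check_bwameth_inputs() {
    fasta=$1
    missing=0
    for f in "$fasta" "$fasta.fai" "$fasta.bwameth.c2t" \
             "$fasta.bwameth.c2t.amb" "$fasta.bwameth.c2t.ann" \
             "$fasta.bwameth.c2t.bwt" "$fasta.bwameth.c2t.pac" \
             "$fasta.bwameth.c2t.sa"; do
        # -s is true only for files that exist and are not empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_bwameth_inputs hg19.p13.plusMT.no_alt_analysis_set.fa prints one line per missing file and returns a non-zero status if anything is absent, so it can gate the pipeline launch in a wrapper script.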

The pipeline lists all the steps, but only three of them actually run:

executor >  local (3)
[-        ] PIP…E_INITIALISATION:CAT_FASTQ -
[c9/c5a902] NFC…_BJ23005328-PT-A1-1_KC352) | 1 of 1 ✔
[42/471c05] NFC…_BJ23005328-PT-A1-1_KC352) | 1 of 1 ✔
[-        ] NFC…LSEQ:BWAMETH:BWAMETH_ALIGN -
[-        ] NFC…LSEQ:BWAMETH:SAMTOOLS_SORT -
[-        ] NFC…:SAMTOOLS_INDEX_ALIGNMENTS -
[-        ] NFC…:BWAMETH:SAMTOOLS_FLAGSTAT -
[-        ] NFC…SEQ:BWAMETH:SAMTOOLS_STATS -
[-        ] NFC…METH:PICARD_MARKDUPLICATES -
[-        ] NFC…AMTOOLS_INDEX_DEDUPLICATED -
[-        ] NFC…AMETH:METHYLDACKEL_EXTRACT -
[-        ] NFC…BWAMETH:METHYLDACKEL_MBIAS -
[-        ] NFC…Q:METHYLSEQ:QUALIMAP_BAMQC -
[-        ] NFC…:METHYLSEQ:PRESEQ_LCEXTRAP -
[5f/7026ae] NFC…ETHYLSEQ:METHYLSEQ:MULTIQC | 1 of 1 ✔
Pulling Singularity image https://depot.galaxyproject.org/singularity/multiqc:1.21--pyhdfd78af_0 [cache /ifs/scratch/c2b2/ac_lab/shares/DNA-methylation-dataset-2/work/singularity/depot.galaxyproject.org-singularity-multiqc-1.21--pyhdfd78af_0.img]
-[nf-core/methylseq] Pipeline completed successfully-
WARN: Singularity cache directory has not been defined -- Remote image will be stored in the path: /ifs/scratch/c2b2/ac_lab/shares/DNA-methylation-dataset-2/work/singularity -- Use the environment variable NXF_SINGULARITY_CACHEDIR to specify a different location
Completed at: 24-May-2024 13:39:42
Duration    : 24m 45s
CPU hours   : 0.4
Succeeded   : 3