nf-core / methylseq

Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
https://nf-co.re/methylseq
MIT License

bwameth index creation fails #352

Status: Open. FutureFellwalker opened this issue 10 months ago

FutureFellwalker commented 10 months ago

Description of the bug

Hi.

I'm encountering an error with the methylseq pipeline when using the bwameth aligner. The pipeline runs fine with Bismark, but my mapping rate is low (~40%) and I want to compare it against bwameth.

The error seems to occur while preparing the reference genome index. I tried saving the reference genome locally and running offline, but I get the same issue. I tried both methylseq 2.4.0 and 2.5.0.

Command used and terminal output

Command input:

~/nextflow run /path/to/nf-core-methylseq_2.5.0/2_5_0/ \
--input samplesheet.csv \
--outdir nfcore-methylseq-2.5.0-bwameth \
--genome hg38 \
--save_reference \
--rrbs \
--pbat \
--aligner bwameth \
--methyl_kit \
-profile singularity \
--email user@email.com

Error:
ERROR ~ Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX'

Caused by:
  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/ngi-igenomes/igenomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa))

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Relevant files

nextflow.log

System information

Nextflow version: 23.04.3
Hardware: HPC
Executor: LSF
Version of nf-core/methylseq: 2.4.0 and 2.5.0
Container engine: Singularity

mz448 commented 10 months ago

Hey, I have successfully run the pipeline using --aligner bismark.

Issue description:

I am having the same issue reported here when I try to repeat the analysis using bwameth instead of Bismark.

Test

The following test works fine:

nextflow run nf-core/methylseq --aligner bwameth --outdir $RefDir -profile test,conda

Summary of my experimental conditions:

Command used and error message:

Pipeline Info

N E X T F L O W ~ version 23.10.0 nf-core/methylseq v2.5.0-g66c6138 Conda

nextflow run nf-core/methylseq \
--input $SampleSheet.csv \
--fasta $Genome.fasta \
--outdir $RefDir \
--bwa_meth_index $DirContainingRefFastaGenome \
--email user@email.com \
--aligner bwameth \
--comprehensive \
--max_cpus 40 \
--max_memory 400.GB \
-profile conda

Error: 
-[nf-core/methylseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX'

Caused by:
  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/local/storage/Projects/ppar_emseq/data/005_ppar_emseq_muscle/sub/P_parae_wMito+lambda+puc19.fasta))

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

I have tried including and excluding the following flags, but no combination has worked:

--save_align_intermeds \
--ignore_flags \
--save_reference \ 

I appreciate any help! :)

Maximiliano

mz448 commented 10 months ago

Mmm, I guess I found the problem. This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth.

You need to provide the 2 indexed versions of the genome:

  1. --bwa_meth_index $DirContainingRefGenome versions \

Install bwameth and use bwameth.py --reference $RefGenome.fasta; this will produce a series of indexed converted genomes in a container folder (later, you will use that container folder path as the argument to the --bwa_meth_index flag).

  2. --fasta_index $RefGenome.fasta.fai \

    Use SAMtools: samtools faidx $RefGenome.fasta will produce $RefGenome.fasta.fai, an index of the non-converted genome. See a good explanation here.
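The two-index workaround above can be checked before launching the pipeline. The following is a minimal POSIX sh sketch, not part of nf-core/methylseq: the function name bwameth_prebuilt_args is hypothetical. It assumes the bwa-meth index was built with bwameth.py index (the command the last reply in this thread settles on), which is assumed to write <fasta>.bwameth.c2t and its bwa index files (.amb, .ann, .bwt, .pac, .sa) next to the FASTA, and that samtools faidx wrote <fasta>.fai. It prints the flags only when every file is present.

```shell
#!/bin/sh
# Hypothetical helper (not part of nf-core/methylseq): verify that both
# pre-built indexes exist next to the FASTA, then print the pipeline flags
# that point at them, one token per line. Exits non-zero if anything is missing.
bwameth_prebuilt_args() (
    fasta=$1
    # Absolute directory of the FASTA; also used as the --bwa_meth_index value
    dir=$(cd "$(dirname "$fasta")" && pwd) || exit 1
    fasta="$dir/$(basename "$fasta")"
    missing=0

    # Assumed output of `bwameth.py index <fasta>`: the bisulfite-converted
    # (c2t) genome plus the bwa index files built from it
    for ext in bwameth.c2t bwameth.c2t.amb bwameth.c2t.ann \
               bwameth.c2t.bwt bwameth.c2t.pac bwameth.c2t.sa; do
        if [ ! -s "$fasta.$ext" ]; then
            echo "missing $fasta.$ext (build with: bwameth.py index $fasta)" >&2
            missing=1
        fi
    done

    # Output of `samtools faidx <fasta>`: index of the unconverted genome
    if [ ! -s "$fasta.fai" ]; then
        echo "missing $fasta.fai (build with: samtools faidx $fasta)" >&2
        missing=1
    fi

    [ "$missing" -eq 0 ] || exit 1

    printf '%s\n' --fasta "$fasta" \
        --fasta_index "$fasta.fai" \
        --bwa_meth_index "$dir"
)
```

In bash, the printed tokens can be read with mapfile -t extra < <(bwameth_prebuilt_args "$Genome.fasta") and appended to the nextflow run command as "${extra[@]}"; one token per line keeps paths containing spaces intact. Check that the array is non-empty first, since mapfile does not see the helper's exit status.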

ewels commented 7 months ago

This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth

I disagree. You've found your way around the bug by providing more inputs, meaning that the pipeline did not need to create them. However, the pipeline should automatically generate the index files itself if only given a FASTA file.

This is what we need to fix:

  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/local/storage/Projects/ppar_emseq/data/005_ppar_emseq_muscle/sub/P_parae_wMito+lambda+puc19.fasta))

Looks like there's something wrong with how the channels are being built / supplied.

jkh00 commented 6 months ago

Hi,

I also encountered the same issue when trying to run bwameth without supplying the indexes. In my case, though, supplying --fasta_index alone was enough to work around the bug (my run succeeded), which suggests that the issue arises in the step that builds the index with samtools faidx.
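The narrower workaround described above needs only the index of the unconverted genome. A minimal POSIX sh sketch, assuming samtools is on PATH; the function name fasta_index_args is hypothetical. It creates <fasta>.fai with samtools faidx only when that file is missing and prints the --fasta_index flag, leaving bwa-meth indexing to the pipeline as reported in this comment.

```shell
#!/bin/sh
# Hypothetical helper: supply only --fasta_index so the pipeline skips its own
# SAMTOOLS_FAIDX step. samtools faidx appends .fai to the full FASTA file name.
fasta_index_args() (
    fasta=$1
    fai="$fasta.fai"

    if [ ! -s "$fai" ]; then
        if ! command -v samtools >/dev/null 2>&1; then
            echo "samtools not found; cannot create $fai" >&2
            exit 1
        fi
        samtools faidx "$fasta" || exit 1
    fi

    # One token per line, ready to append to the nextflow run command
    printf '%s\n' --fasta_index "$fai"
)
```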

drothen15 commented 5 months ago

Has this been fixed? I'm still having the same error when trying to use bwameth as the aligner

oliviapetrillo commented 4 months ago

@drothen15 Not sure if a fix has been included in a release yet, but I was able to fix this locally by modifying the SAMTOOLS_FAIDX process inputs.

In the prepare_genome.nf file, when params.fasta_index is not specified, SAMTOOLS_FAIDX is called with the value channel ch_fasta placed inside the tuple. The process then receives the DataflowVariable itself where it expects a path, which is what the "Not a valid path value type" error reports. I think it would work if you either 1) removed the tuple, or 2) switched the ch_fasta input to be just file(params.fasta) instead of Channel.value(file(params.fasta)).

I resolved it by removing the tuple altogether and then updating the main.nf that defines the SAMTOOLS_FAIDX process to expect separate val and path inputs instead of a tuple of val and path. If you instead change the input passed in prepare_genome.nf to be a file rather than a value channel, there should be no need to update SAMTOOLS_FAIDX's main.nf.

One solution (that I used):

// inside prepare_genome.nf
SAMTOOLS_FAIDX([:], ch_fasta)

instead of

SAMTOOLS_FAIDX([[:], ch_fasta])

And then modifying SAMTOOLS_FAIDX:main.nf:

input: 
val(meta)
path(fasta)

instead of

input: 
tuple val(meta), path(fasta)

LucaZanella15 commented 3 months ago

Sorry, I have a question: is the input to --bwa_meth_index (i.e. $DirContainingRefGenome) the output of bwameth.py --reference $RefGenome.fasta, as mentioned in mz448's reply, or the output of bwameth.py index $RefGenome.fasta?

Thank you!

JihedC commented 3 weeks ago

I think the latter (bwameth.py index $RefGenome.fasta); that's how I got it to work.
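Following this answer, a dedicated directory for --bwa_meth_index can be prepared with bwameth.py index. This is a POSIX sh sketch under stated assumptions: build_bwameth_index_dir and the DRY_RUN variable are hypothetical names; it assumes bwameth.py and samtools are on PATH and that both tools write their index files next to the path they are given, so indexing a symlink to the FASTA inside the new directory avoids copying a multi-gigabyte genome such as hg38.

```shell
#!/bin/sh
# Hypothetical helper: build a bwa-meth index in its own directory, whose path
# is then passed to --bwa_meth_index. Set DRY_RUN=1 to print the commands
# instead of running them.
build_bwameth_index_dir() (
    fasta=$1
    outdir=$2
    name=$(basename "$fasta")
    src="$(cd "$(dirname "$fasta")" && pwd)/$name" || exit 1

    # Run a command, or only echo it when DRY_RUN=1
    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    mkdir -p "$outdir" || exit 1
    # Symlink instead of copying the (possibly very large) genome FASTA
    [ -e "$outdir/$name" ] || ln -s "$src" "$outdir/$name" || exit 1

    # Assumed: both tools write their outputs next to the path they are given
    run bwameth.py index "$outdir/$name" || exit 1   # <name>.bwameth.c2t*
    run samtools faidx "$outdir/$name" || exit 1     # <name>.fai
)
```

After a real run, the directory path is the value for --bwa_meth_index, and $outdir/<name>.fai can be given to --fasta_index.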