nf-core / methylseq

Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
https://nf-co.re/methylseq
MIT License
137 stars 136 forks source link

Add CX report support from bismark_methylation_extractor #372

Open Tintest opened 8 months ago

Tintest commented 8 months ago

Description of feature

Hello,

Would it be possible for you to add the --CX option from the bismark_methylation_extractor script as you were able to add the --comprehensive option ?

Regards.

ewels commented 8 months ago

We can certainly look into it 👍🏻

Note that it is possible to tweak the arguments supplied to tools yourself, without modifying pipeline code - through using Nextflow configs. This can let you get moving right away, without waiting for a new parameter to be added to the pipeline (which is better, but takes a while). See the docs on the topic here. Feel free to jump into the #methylseq channel on the nf-core Slack if you need any help.

Tintest commented 8 months ago

Hello,

Thank you for your quick reply. It's indeed a good way of doing things, although having the option directly in the pipeline would be a plus :)

But it'll do just fine in the meantime.

Thank you very much.

cjfields commented 6 months ago

I was a little surprised by this result at first, primarily since we have used the older version of methylseq (likely from the DSL1 line) and I believe had reports for all contexts when we requested --comprehensive. So a definite +1 if this isn't implemented, we work with a lot of non-model systems or check methylation in developmental contexts where some researchers suggest non-CpG may play a more significant role.

ewels commented 6 months ago

Should be an easy modification. The arguments for this step are set in the pipeline here:

https://github.com/nf-core/methylseq/blob/54f823e102ef3d04077cc091a5ae435519f9923a/conf/modules.config#L207-L217

In fact, --CX is already being set when --nomeseq is specified. So it's just a case of tweaking the params logic in this block in whatever way makes sense.

ewels commented 6 months ago

eg. if we always want this flag when running the pipeline with --comprehensive, then it's as simple as changing line 209:

- params.comprehensive   ? ' --comprehensive --merge_non_CpG' : '', 
+ params.comprehensive   ? ' --comprehensive --merge_non_CpG --CX' : '', 
cjfields commented 5 months ago

Thanks @ewels I think doing this within a custom config that overrides the above logic to do what I need should work. I'm wondering whether this is something that could be documented as an example in the nf-core methylseq docs, modifying or tweaking this specific step seems to come up regularly enough.

cjfields commented 4 months ago

A small update on this. I did manage to get this running by simply using a custom conf with the following in the process block:

     withName: BISMARK_METHYLATIONEXTRACTOR {
     ext.args = { [
         params.comprehensive   ? ' --comprehensive --merge_non_CpG  --CX' : '',
         params.meth_cutoff     ? " --cutoff ${params.meth_cutoff}" : '',
         params.nomeseq         ? '--CX' : '',
         params.ignore_r1 > 0   ? "--ignore ${params.ignore_r1}" : '',
         params.ignore_3prime_r1 > 0   ? "--ignore_3prime ${params.ignore_3prime_r1}" : '',
         meta.single_end ? '' : (params.no_overlap           ? ' --no_overlap'                         : '--include_overlap'),
         meta.single_end ? '' : (params.ignore_r2        > 0 ? "--ignore_r2 ${params.ignore_r2}"       : ""),
         meta.single_end ? '' : (params.ignore_3prime_r2 > 0 ? "--ignore_3prime_r2 ${params.ignore_3prime_r2}": "")
     ].join(' ').trim() }
     }

However, I'm leaving this here in case others have the same problem I had. The --CX flag as used above does seem to throw an error on some of our recent samples:

...
chr NC_012920.1 (16569 bp)
chr chrEBV (171823 bp)
chr phage_lambda (48502 bp)
chr plasmid_puc19c (2686 bp)
chr phage_T4 (168903 bp)
chr phage_Xp12 (64272 bp)

Stored sequence information of 196 chromosomes/scaffolds in total

==============================================================================
Methylation information will now be written into a genome-wide cytosine report
==============================================================================

Adding context-specific methylation summaries

>>> Writing genome-wide cytosine report to: 208_Plasma.CpG_report.txt.gz <<<

>>> Writing all cytosine context summary file to: 208_Plasma.cytosine_context_summary.txt <<<

No last chromosome was defined, something must have gone wrong while reading the data in (e.g. specified wrong file path for a gzipped coverage file?). Please check your command!

It's not consistent, but removing that extra parameter fixed the issue. It's not clear why this occurred yet so will need some debugging. Note this had several EM-Seq controls included, so maybe it had an issue with those.