nf-core / methylseq

Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
https://nf-co.re/methylseq
MIT License

Add SNP calling #65

Open ewels opened 5 years ago

ewels commented 5 years ago

It would be nice to be able to have the option of calling variants from bisulfite data.

It shouldn't be too tricky to add Bis-SNP or something similar as a new opt-in process. There may be other / better tools also?

bazyliszek commented 5 years ago

Felix Krueger mentioned four different packages for that purpose: Bis-SNP, MethylExtract, BS-SNPer and CGmapTools. BScall can also do this.

Also, on a slightly different topic, from the Wreczycka et al. (2017) paper:

"the majority of CpGs with high inter-population differences contain common genomic SNPs (minor allele frequency > 0.01) (Daca-Roszak et al., 2015). To ensure more reliable interpretation of the data we advise removing known C/T SNPs which can interfere with methylation calls."

It would also be nice to have a dictionary of these sites for human, with the possibility of removing them if desired (--remove.common_snps).

Variant calls could also be derived from matched genome sequencing data or from public databases such as dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP/dbSNP.cgi?list=sslist).

ewels commented 5 years ago

Ooh, @FelixKrueger? I wouldn't trust that guy.. 😆 Yes all sounds good - does anyone have a favourite tool?

The common SNPs feature would be nice, but I guess that's a separate issue as it doesn't require SNP calling, it's just a filtering step right? Do such lists already exist somewhere? Perhaps we can generate such a list from a VCF file in the pipeline. Then we could use the files available for multiple species already in iGenomes.
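For illustration only, here is a rough sketch of what such a filtering step could look like outside the pipeline. Nothing here is existing methylseq code: the function names `ct_snp_positions` and `filter_calls` are made up, the SNP list is taken from a plain uncompressed VCF, and the methylation calls are assumed to be tab-separated with the chromosome in column 1 and a 1-based position in column 2. It only flags C>T and G>A substitutions (the ones that look like bisulfite conversion on either strand) and does not apply any allele frequency cut-off.

```python
# Substitutions that mimic bisulfite conversion: C>T on the forward strand,
# G>A on the reverse strand. (Assumption: these are the only SNPs we drop.)
BISULFITE_LIKE = {("C", "T"), ("G", "A")}


def ct_snp_positions(vcf_lines):
    """Collect (chrom, 1-based pos) for SNPs that look like bisulfite conversion.

    Hypothetical helper: expects uncompressed VCF text lines.
    """
    sites = set()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        ref, alts = fields[3].upper(), fields[4].upper()
        # ALT may list several alleles separated by commas.
        if any((ref, alt) in BISULFITE_LIKE for alt in alts.split(",")):
            sites.add((chrom, pos))
    return sites


def filter_calls(call_lines, snp_sites):
    """Yield methylation call lines whose position is not a flagged SNP site.

    Hypothetical helper: assumes chrom in column 1 and a 1-based position
    in column 2 of each tab-separated line.
    """
    for line in call_lines:
        fields = line.rstrip("\n").split("\t")
        if (fields[0], int(fields[1])) not in snp_sites:
            yield line
```

Usage would be along the lines of `sites = ct_snp_positions(open("known_snps.vcf"))` followed by writing out `filter_calls(open("sample.cov"), sites)`, where both file names are placeholders. If the call file uses 0-based starts, the positions would need shifting by one before the lookup.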

I think that matching to WGS and external databases is perhaps beyond the scope of this pipeline for now. If the pipeline produces a VCF it shouldn't be too difficult for people to play with this anyway. We could perhaps even make a separate nf-core pipeline for doing pairwise comparison / QC of VCF files...

FelixKrueger commented 5 years ago

I agree, it might be a nice pipeline to have. The tools mentioned above were - of course (in good old bioinformatics manner) - shown to be much superior to previously published tools. We don't personally use SNP exclusion on a regular basis, so I am not sure which one is best/easiest to implement.

On a slightly different note, would anyone object if we dropped Bowtie (1) from Bismark, and added HISAT2 instead?

ewels commented 5 years ago

Sure - go for it! Alignment speed can be one of the main annoyances with Bismark so a faster tool with comparable output would be great 👍 (though does this mean that I have to update the --relaxMismatches code? 😱 )

brucemoran commented 2 years ago

Hi, was this ever implemented or is there a fork that some work was done on?