Open ewels opened 5 years ago
Hi @kokyriakidis,
TrimGalore! already trims low-quality bases, I think. Have a look at the FastQC reports from after the trimming to check the read qualities (these FastQC results are not shown in the MultiQC report; you need to look in results/trim_galore/fastqc).
Trim Galore removes poor-quality bases from the 3' end of reads, but does not do this at the 5' end (as it is never really needed). If you have some special kind of BS-seq data I'd be happy to look at your FastQC report (the HTML file), to see if I have any recommendations.
On a related note, a close colleague of mine has recently written a tool to analyse raw sequencing files and try to figure out which kind of bisulfite sequencing experiment had been performed. The tool (Charades) is still kind of in alpha mode, but you are very welcome to give it a try! https://github.com/ChristelKrueger/Charades
Regarding your second question: I haven't looked at the ENCODE-DCC pipeline myself, but you could either
The latter approach will be a lot quicker, but it might not be as accurate for repetitive regions within your genes of interest, and you will have to adjust the positions if you want to bring them back in line with other genome coordinates.
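To illustrate the position adjustment mentioned above, here is a minimal sketch. It assumes the quicker approach means aligning against an extracted gene sequence rather than the whole genome, and that you know where that sequence starts in the genome. The `Region` type, the `to_genome_position` helper, the name `gene_A` and its coordinates are all hypothetical placeholders, not part of Bismark or the pipeline.

```python
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    """Where an extracted reference sequence sits in the full genome."""
    chrom: str
    start: int  # 1-based genomic position of the first base of the extract


def to_genome_position(region: Region, local_pos: int) -> Tuple[str, int]:
    """Map a 1-based position within the extract to a 1-based genome position."""
    if local_pos < 1:
        raise ValueError("local_pos must be 1-based (>= 1)")
    # The first base of the extract (local_pos == 1) maps to region.start.
    return region.chrom, region.start + local_pos - 1


# Hypothetical example: an extract that begins at chr1:1,000,001.
regions = {"gene_A": Region(chrom="chr1", start=1_000_001)}

print(to_genome_position(regions["gene_A"], 1))    # ('chr1', 1000001)
print(to_genome_position(regions["gene_A"], 250))  # ('chr1', 1000250)
```

If the extract was taken from the reverse strand, the mapping would need to count backwards from the end of the region instead; that case is not handled here.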
@FelixKrueger Thank you very much! You can check the fastq in this link:
These are the initial files:
https://wetransfer.com/downloads/1b780edcfca5363f09e74202cca1ebd020190325114013/08a8a71624bbad7ecf6db365789c066520190325114013/affc1f
These are after TrimGalore with default settings:
https://wetransfer.com/downloads/a88486a3a7c15ac09042a291f60673b220190325114819/e96bc14c81defea887f22aa21369c0e820190325114819/c7af3f
Whoah, that looks quite wild indeed. Judging by the sequence composition, it doesn't look like a standard sequencing protocol; it rather looks like amplicon sequencing to me. Any chance this is targeting the PTPN22 gene, and nothing else? I would probably proceed with a standard trim_galore --paired file1 file2 command, followed by a default directional alignment, and see what you get. Cheers, Felix
It is targeting PTPN22 but also the lns-1 gene
Hi all, @FelixKrueger @ewels
I tried to run the pipeline with some data, but I have no idea how they were produced. I just know that they are methylation data. When FastQC ran, I saw that the reads have bad quality at the end. I wonder if Trim Galore should have an option in this pipeline to cut bases by quality (5' and 3' trimming) instead of just a fixed number of raw bases.
Also, I will have some methylation data for 3 specific genes in the next few days. I looked into Bismark but I didn't see any option to narrow down the search window for methylation given, e.g., a BED file. Searching the internet I found this approach:
Any thoughts?
Originally posted by @kokyriakidis in https://github.com/nf-core/methylseq/issues/85#issuecomment-476149257