Open plger opened 6 years ago
Trimming is possible using the `preprocessReads` function of the QuasR package.
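For illustration, a minimal sketch of what such a trimming call could look like for a single-end file. The wrapper name `trim_single_end`, its defaults and the file names in the usage comment are hypothetical; only `QuasR::preprocessReads` and its arguments `filename`, `outputFilename`, `Rpattern`, `minLength` and `nBases` come from the package itself.

```r
# Hypothetical wrapper around QuasR::preprocessReads for one single-end file.
# The adapter must be supplied by the caller; none is assumed here.
trim_single_end <- function(infile, outfile, adapter,
                            min_length = 15L, max_n = 2L) {
  if (!requireNamespace("QuasR", quietly = TRUE)) {
    stop("The QuasR package (Bioconductor) is required.")
  }
  QuasR::preprocessReads(
    filename       = infile,
    outputFilename = outfile,
    Rpattern       = adapter,    # 3' adapter to remove
    minLength      = min_length, # drop reads shorter than this after trimming
    nBases         = max_n       # drop reads with more than this many N's
  )
}

# Usage (file names and adapter are placeholders):
# stats <- trim_single_end("sample.fastq.gz", "sample_trimmed.fastq.gz",
#                          adapter = "<3' adapter sequence>")
# `stats` is a matrix of filtering statistics for the processed file.
```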
The collapsing function should give a count matrix such as this one: seqs.counts.tar.gz (this one comes from the fastq files in /IM/data/SSC/shortRNA/raw)
To reduce the number of uninformative sequences a bit, I normally exclude sequences that appear only once before merging.
I also include here the alignment file for the same data, as well as the annotation. With these three files you have what's needed to create a shortRNAexp object (using the new version of the function). alignment_and_annotation.tar.gz
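As a rough sketch of the collapsing step described above (not the package's actual function): each sample is collapsed to unique-sequence counts, sequences seen only once in that sample are dropped, and the per-sample counts are then merged into one matrix. The helper names `collapse_sample` and `merge_counts` and the toy reads are hypothetical, and applying the singleton filter per sample is only one reading of "before merging".

```r
# Collapse one sample's reads to counts, dropping sequences seen < min_count times.
collapse_sample <- function(reads, min_count = 2L) {
  tab <- table(reads)
  tab <- tab[tab >= min_count]
  setNames(as.integer(tab), names(tab))
}

# Merge a named list of per-sample count vectors into a sequence x sample matrix.
merge_counts <- function(counts_by_sample) {
  seqs <- sort(unique(unlist(lapply(counts_by_sample, names))))
  m <- matrix(0L, nrow = length(seqs), ncol = length(counts_by_sample),
              dimnames = list(seqs, names(counts_by_sample)))
  for (s in names(counts_by_sample)) {
    x <- counts_by_sample[[s]]
    m[names(x), s] <- x   # sequences absent from a sample stay at 0
  }
  m
}

# Toy input: one character vector of read sequences per sample.
reads <- list(
  sampleA = c("ACGT", "ACGT", "TTGA", "GGCC", "GGCC", "GGCC"),
  sampleB = c("ACGT", "TTGA", "TTGA")
)
counts <- merge_counts(lapply(reads, collapse_sample))
print(counts)
```

Filtering per sample keeps the intermediate tables small; filtering on the merged row sums instead would retain sequences that occur once in each of several samples.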
Hi @plger

... `fastq` files:

... `fastq` files and generating a report

I have checked several tools for checking and plotting the quality of `fastq` files. These tools include:

- ... `fastq` file and plots the results.
- ... `FastQC` tool to make plots.

In my opinion, the `Rqc` package is the one that we can use for reporting the quality of the `fastq` files. Further, some of the plotting might be adapted from the `fastqcr` package.

... `fastq` files (trimming N's, adapters and removing short reads)

I have checked several tools for the quality of `fastq` files. These tools include:

- ... `R` wrapper for adapterremoval. It provides `adapterremoval` binaries for Mac, Linux and Windows. I am not sure if we should use it (because it is using a tool not written in `R`).
- ... `ShortRead`. So, it also looks good to me.

But I think we have 3 problems here:

1. Detection of adapter sequences. A solution to this problem could be to use `plgINS::tryAdapters`. But it is not implemented for paired-end data (I didn't find it).
2. Trimming based on quality scores. A solution to this is to use the code from page 5 of the ShortRead vignette. I think it would be easy to adapt.
3. Trimming trailing and leading N's. But I think that should be taken care of when we define the quality scores for trimming. I am not sure though!

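To make the second and third points concrete, here is a sketch modelled on the filter-and-trim example in the ShortRead vignette. The function name `filter_and_trim`, the `min_len` default and the quality cutoff are assumptions for illustration, not values agreed on here. It also assumes that N calls carry low quality, so trailing N's tend to go with the quality trimming; reads that still contain an N are dropped rather than trimmed.

```r
library(ShortRead)

# Hypothetical helper: quality-trim and filter one single-end FASTQ file.
filter_and_trim <- function(infile, outfile, min_len = 15L) {
  stream <- FastqStreamer(infile)          # read the file in chunks
  on.exit(close(stream))
  while (length(fq <- yield(stream)) > 0) {
    # Trim the 3' end as soon as 2 of 5 nucleotides in a sliding window
    # have a quality at or below "4" (Phred 19 in Phred+33 encoding).
    fq <- trimTailw(fq, 2, "4", 2)
    # Drop reads that still contain an N after trimming.
    fq <- fq[nFilter()(fq)]
    # Drop reads that became too short.
    fq <- fq[width(fq) >= min_len]
    # Append the surviving reads; writeFastq gzips its output by default.
    if (length(fq) > 0) writeFastq(fq, outfile, mode = "a")
  }
  invisible(outfile)
}
```

`trimTailw` only trims the tail, so leading N's are not removed by this sketch; whether `trimEnds` or a separate step should handle them is still open.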
Now, here is what I think needs to be done:

- ... `plgINS::tryAdapters` for PE data.
- ... `preprocessReads` function from `QuasR` and/or adapt the `adapter_filter` function from `FastqCleaner`
- Quality check (with an HTML report) --> Check adapters (plot with `plgINS::plotAdapterResults`) --> Quality control --> Quality re-check (with an HTML report)

We can discuss it during our meeting!
OK, I'm not sure how cross-platform we can manage to make this one, but ideally we'd like to enable the user to go all the way from raw fastq files to the shortRNAexp object, all from R. This means: