Open MichaelFokinNZ opened 10 years ago
Hi,
Thanks for this.
Are you saying that there are category B & C cases where, after processing with NextClip, all 19 bases of the junction adaptor are still present? If so, could you send me a file of example reads (e.g. a few hundred reads) from before processing? I will then try to work out what is going on…
For de novo assembly, we would use categories A, B and C, but leave out D.
Thanks, Richard
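For anyone who wants to prepare such an example file, here is a minimal sketch in Python. It assumes uncompressed FASTQ with exactly four lines per record; the function name, the file names and the default of 300 records are illustrative and not part of NextClip.

```python
from itertools import islice


def take_fastq_records(in_path, out_path, n_records=300):
    """Copy the first n_records FASTQ records (4 lines each) to out_path.

    Assumes an uncompressed FASTQ file with exactly four lines per record.
    Returns the number of complete records written.
    """
    written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while written < n_records:
            record = list(islice(src, 4))  # read the next 4 lines
            if len(record) < 4:            # end of file or truncated record
                break
            dst.writelines(record)
            written += 1
    return written


if __name__ == "__main__":
    # Hypothetical file names; run once per mate file with the same count
    # so that read 1 and read 2 stay in step.
    for name in ("raw_R1.fastq", "raw_R2.fastq"):
        print(name, take_fastq_records(name, "example_" + name))
```

Taking the same number of records from the start of both mate files keeps the pairs matched, as long as the two raw files list their reads in the same order.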
On 12 Jun 2014, at 10:47, MikhailFokinNZ wrote:
Richard, hi! I've decided to start a new issue to share more information about undetected adaptors. I am working with 300 bp MiSeq reads, and my pipeline is raw data -> nextclip -> fastq-mcf -> blastn. The last two steps check whether any adaptors are still present, and finally I check those cases manually in Geneious. The "A" files are almost free of junction adaptors: fewer than 30 per 1M reads remain (fastq-mcf), and I haven't inspected these in detail. The "B" and "C" files look worse :( fastq-mcf detects from hundreds up to 23k adaptors per 1M reads. Note that this software can only detect adaptors at the start or end of a read, not inside the sequence, so the real number is higher. I've analysed some of these files in detail and found that:
- Only a few (dozens) duplicated adaptors are left; all of these cases have a 1-nucleotide indel at the junction site.
- There are plenty of single adaptors with a 100% hit to the read, in both terminal and internal positions :( I would say a few thousand per 1M reads, plus more partial adaptors shorter than 18 nucleotides.
- I have not analysed read pairs yet.
I have no experience with, or ideas about, whether this could affect de novo assembly, but I will try not to avoid using the B and C categories.
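The checks described in that message (exact adaptor hits at terminal versus internal positions, and duplicated adaptors carrying a 1-nucleotide indel) can be approximated with a short Python sketch. This is not how NextClip or fastq-mcf work internally. The adaptor sequence below is an assumption (the commonly cited 19-base Nextera junction adaptor), so check it against your kit, and all function names are hypothetical.

```python
# Assumption: the 19-base Nextera mate-pair junction adaptor.
JUNCTION_ADAPTOR = "CTGTCTCTTATACACATCT"


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T and N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def classify_exact_hit(read, adaptor=JUNCTION_ADAPTOR):
    """Return 'terminal', 'internal' or None for the first exact hit.

    Both the adaptor and its reverse complement are searched. A hit is
    'terminal' if it touches the start or end of the read.
    """
    for probe in (adaptor, reverse_complement(adaptor)):
        pos = read.find(probe)
        if pos == -1:
            continue
        at_start = pos == 0
        at_end = pos + len(probe) == len(read)
        return "terminal" if (at_start or at_end) else "internal"
    return None


def best_edit_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text.

    Standard dynamic programming in which the first row is all zeros,
    so that a match may start at any position in text.
    """
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i]  # aligning i pattern bases against nothing costs i
        for j, t in enumerate(text, start=1):
            curr.append(min(prev[j] + 1,              # skip a pattern base
                            curr[j - 1] + 1,          # skip a text base
                            prev[j - 1] + (p != t)))  # match or substitute
        prev = curr
    return min(prev)


if __name__ == "__main__":
    duplicated = JUNCTION_ADAPTOR + reverse_complement(JUNCTION_ADAPTOR)
    flank = "ACGT" * 3
    # A duplicated adaptor with one base deleted at the junction site.
    damaged = JUNCTION_ADAPTOR[:-1] + reverse_complement(JUNCTION_ADAPTOR)
    print(classify_exact_hit(flank + JUNCTION_ADAPTOR + flank))
    print(best_edit_distance(duplicated, flank + damaged + flank))
```

A read whose best distance to the duplicated adaptor is 1 would be a candidate for the indel cases listed above, while an exact single-adaptor hit classified as internal is the kind that an end-only trimmer cannot see.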
Hi all,
I have been using NextClip for a few different species' genomes and for several different mate interval ranges. However, I have never found cases like this, with undetected or remaining internal junction adaptors.
For your information, the latest version of FastQC (released early this month; http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) can detect Nextera junction adaptors in reads and report them in the 'Adapter Content' panel. This new function is very useful for evaluating mate-pair read properties and for confirming that adaptors have been removed after running NextClip. I hope this helps you, too.
Best regards,
Shigehiro
Thanks.
I meant to say previously: don't forget that you can adjust the match parameters with --strict_match.
Thanks, Richard
Thanks, guys! I will analyse my data more precisely (the new FastQC is awesome!) and provide you with some files.