richardmleggett / nextclip

Nextera Long Mate Pair analysis and processing tool
GNU General Public License v3.0

Unexpected result #21

Open ptranvan opened 8 years ago

ptranvan commented 8 years ago

Hello,

I used MaSuRCA for an assembly built from paired-end reads and "real mate pair" reads (after NextClip processing, with categories A, B and C concatenated). I then estimated the insert size of my mate pairs with Picard.

I got a strange result:

MEDIAN_INSERT_SIZE  READ_PAIRS  PAIR_ORIENTATION
181                 1712772     FR
2809                489932      RF
656                 724122      TANDEM

That's very strange, because the majority of my reads appear to be paired-end (FR) with an insert size of 181 bp. The real mate pairs are a small minority: 489932 pairs with an insert size of 2809 bp.

How can this be explained? In theory, 100% of the pairs should be RF.
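To put the figures reported above in proportion, here is a minimal Python sketch that computes each orientation's share of the reported read pairs. The counts are copied from the Picard output in this comment; the names `pair_counts` and `orientation_fractions` are made up for illustration.

```python
# Pair counts copied from the Picard output quoted in this comment.
pair_counts = {"FR": 1712772, "RF": 489932, "TANDEM": 724122}


def orientation_fractions(counts):
    """Return each orientation's share of all reported read pairs."""
    total = sum(counts.values())
    return {orientation: n / total for orientation, n in counts.items()}


fractions = orientation_fractions(pair_counts)
for orientation, share in fractions.items():
    print(f"{orientation}: {share:.1%}")
```

On these numbers, RF pairs make up roughly 17% of the reported pairs, while FR pairs account for more than half.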

richardmleggett commented 8 years ago

Hi Patrick,

I'm on holiday, so won't get a chance to look at this in any detail for a week or so, I'm afraid.

However, if it's still a problem, can you give me more information on what you ran - e.g. command lines for MaSuRCA and Picard?

Thanks, Richard

(In reply to Patrick Tran Van's message of 22 Aug 2016, 12:57.)


ptranvan commented 8 years ago

Hi Richard,

0) NextClip:

nextclip -i R1.fastq -j R2.fastq -o name -e

1) MaSuRCA: Default command, the config file looks like this for the mate pair:

JUMP= m1 3000 450 MP.pair1.fastq MP.pair2.fastq

The MP reads were processed with NextClip, and I concatenated categories A, B and C.

2) Then I mapped these MP reads to the assembly:

bowtie2 -x assembly.fasta -1 MP.pair1.fastq -2 MP.pair2.fastq | samtools view -bS - > bowtie2/MP.pair12.bam

3) And insert estimation:

samtools sort MP.pair12.bam -o MP.sorted.pair12.bam

picard-tools CollectInsertSizeMetrics I=MP.sorted.pair12.bam O=insert_size_3000.txt H=insert_size_3000.pdf
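For readers unfamiliar with the FR, RF and TANDEM labels in the insert-size report, the following Python sketch shows one way to classify a mapped pair by orientation. It is a simplified illustration, not Picard's actual code: it assumes ungapped alignments (so a read's end is `start + length - 1`) and 1-based leftmost start positions, and the function name `pair_orientation` is hypothetical.

```python
def pair_orientation(start1, is_reverse1, len1, start2, is_reverse2, len2):
    """Classify a mapped read pair as FR, RF or TANDEM.

    Assumptions (for illustration only): 1-based leftmost start
    positions and ungapped alignments.
    """
    # Both reads on the same strand: neither innie nor outie.
    if is_reverse1 == is_reverse2:
        return "TANDEM"

    # Identify which read maps to the forward strand.
    if is_reverse1:
        fwd_start, rev_start, rev_len = start2, start1, len1
    else:
        fwd_start, rev_start, rev_len = start1, start2, len2

    # The 5' end of a reverse-strand read is its rightmost base.
    rev_five_prime = rev_start + rev_len - 1

    # Forward read's 5' end left of the reverse read's 5' end: reads
    # point towards each other (FR). Otherwise they point away (RF).
    return "FR" if fwd_start < rev_five_prime else "RF"


# Typical paired-end layout: forward read first, reverse read downstream.
print(pair_orientation(100, False, 100, 200, True, 100))
# Typical mate-pair layout: reverse read first, forward read ~3 kb downstream.
print(pair_orientation(100, True, 100, 2800, False, 100))
```

Under this definition, a mate-pair library that has not been reverse-complemented would be expected to show mostly RF pairs, which is why a large FR fraction stands out.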

richardmleggett commented 8 years ago

Hmmm.... not sure! So, to summarise: you have about 3 million mate pairs in total and you're finding that only 0.5 million of them are aligning as you would expect?

richardmleggett commented 8 years ago

I don't know MaSuRCA, but it's not that you need to specify the mate pairs as innies, with a negative mean, is it?

ptranvan commented 7 years ago

Maybe it's the NextClip command. Do you have a recommended standard command?

I mean, does this one work well?

nextclip -i R1.fastq -j R2.fastq -o name -e

richardmleggett commented 7 years ago

Hi Patrick, the command looks ok. What did the NextClip screen output look like? Do you still have it? Were there reasonable numbers in each of the categories?

ptranvan commented 7 years ago

Hi Richard,

Maybe it's because of MaSuRCA... I tried SOAPdenovo and got more correctly oriented (RF) mate pairs.

I'm now trying ALLPATHS-LG. Do you have any experience with NextClip and ALLPATHS-LG?

richardmleggett commented 7 years ago

No, sorry, no personal experience. Would be interested to hear how you get on with it.