On 17/04/13 07:46 PM, lorendarith wrote:
After using it I found the script did not demultiplex some samples correctly, i.e. I can detect other indexes in one of the samples after demultiplexing.
This was the SampleSheet:
FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
BC1D9GACXX,4,XY,k,AGTTCC-AGTTCC,hh,N,R1,Chr,BC1D9GACXX
BC1D9GACXX,4,ch1,a,CGATGT-CGATGT,hh,N,R1,Chr,BC1D9GACXX
BC1D9GACXX,4,ch2,b,TGACCA-TGACCA,hh,N,R1,Chr,BC1D9GACXX
BC1D9GACXX,4,ch3,c,ACAGTG-ACAGTG,hh,N,R1,Chr,BC1D9GACXX
BC1D9GACXX,4,ch4,d,CAGATC-CAGATC,hh,N,R1,Chr,BC1D9GACXX
BC1D9GACXX,4,ch5,e,ATGTCA-ATGTCA,hh,N,R1,Chr,BC1D9GACXX
BC1D9GACXX,4,ch6,f,CCGTCC-CCGTCC,hh,N,R1,Chr,BC1D9GACXX
BC1D9GACXX,4,ch7,g,GTCCGC-GTCCGC,hh,N,R1,Chr,BC1D9GACXX
BC1D9GACXX,4,ch8,h,GTGAAA-GTGAAA,hh,N,R1,Chr,BC1D9GACXX
This was the outcome:
[Status]
Project_BC1D9GACXX Sample_ch6 59899 0.0284597645191%
Project_BC1D9GACXX Sample_ch7 10895196 5.1766258627%
Project_BC1D9GACXX Sample_ch4 13525783 6.4264945845%
Project_BC1D9GACXX Sample_ch5 8701182 4.13418572527%
Project_BC1D9GACXX Sample_ch3 10998429 5.22567487638%
Project_BC1D9GACXX Sample_ch1 12023228 5.71258681513%
Project_BC1D9GACXX Sample_ch8 641774 0.30492557329%
Project_BC1D9GACXX Sample_XY 148957721 70.7741642259%
Project_BC1D9GACXX Sample_ch2 3695266 1.75572881343%
Undetermined_indices Sample_lane4 970586 0.46115375892%
All All 210469064 100.00%
But after I pulled out the reads from the biggest sample, Sample_XY, and checked the index read (R3), apart from the correct index AGTTCCGT, I also found others like ATGTCAGA.
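A check like the one described above can be sketched in a few lines of Python. This is only an illustration, not part of FastDemultiplexer: the function name `tally_index_prefixes` is hypothetical, the reads are made up, and it assumes plain 4-line FASTQ records with the index at the start of each read.

```python
from collections import Counter
from itertools import islice

def tally_index_prefixes(fastq_lines, index_length=6):
    """Count the leading `index_length` bases of every read in a FASTQ stream."""
    counts = Counter()
    # In a 4-line FASTQ record the sequence is the 2nd line (offset 1, step 4).
    for sequence in islice(fastq_lines, 1, None, 4):
        counts[sequence.strip()[:index_length]] += 1
    return counts

# Tiny in-memory stand-in for the index read file of Sample_XY (made-up reads).
example_r3 = [
    "@read1", "AGTTCCGT", "+", "IIIIIIII",
    "@read2", "ATGTCAGA", "+", "IIIIIIII",
    "@read3", "AGTTCCGT", "+", "IIIIIIII",
]

observed = tally_index_prefixes(example_r3)
for index, count in observed.most_common():
    print(index, count)
```

With a real file, an open file handle can be passed instead of the list. Any prefix other than the sample's own index showing up with a large count points to reads that were assigned to the wrong sample.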
It was a dual-index run, but some lanes, like this one, did not have dual indexes, so the 2nd index (R3) is actually not usable. I tried running the script without specifying anything for the 2nd index, and also tried the workaround of linking R3 back to R2, and the outcome is always the same.
Also:
IndexSize= 5625
Index1Length= 6
AllowedMismatchesInIndex1= 3
Index2Length= 6
AllowedMismatchesInIndex2= 3
The indexes I entered were 6 bases long, and if 3 mismatches are allowed then AGTTCC can actually become ATGTCA.
There is a tool to check collisions:
https://github.com/sebhtml/FastDemultiplexer/blob/master/CheckBarcodeCollisions.py
Could this mismatch setting be the sole reason for the issue? Isn't 3 a bit too high for just 6 bases? Should longer adapter sequences be given, like 7 bases? If so, won't you lose more reads with longer indexes?
There are 4096 (4^6) sequences of length 6. Choosing AGTTCC and ATGTCA as adapters is probably not a good thing, since they differ at only 3 positions:
AGTTCC
|  ||
ATGTCA
Obviously in that case, FastDemultiplexer.py should not be using 3 mismatches.
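To make the argument concrete, here is a minimal sketch that computes pairwise Hamming distances for the indexes in the SampleSheet. It is not the logic of CheckBarcodeCollisions.py or FastDemultiplexer.py; the function names are hypothetical. It assumes that two indexes are ambiguous at `m` allowed mismatches whenever their distance is at most `2 * m`, because some read can then be within `m` mismatches of both.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length indexes differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def ambiguous_pairs(indexes, mismatches):
    """Pairs for which some read lies within `mismatches` of both indexes."""
    return [
        (a, b, hamming(a, b))
        for a, b in combinations(indexes, 2)
        if hamming(a, b) <= 2 * mismatches
    ]

def max_safe_mismatches(indexes):
    """Largest mismatch allowance that leaves no pair ambiguous."""
    closest = min(hamming(a, b) for a, b in combinations(indexes, 2))
    return (closest - 1) // 2

# First-index sequences taken from the SampleSheet in this issue.
indexes = [
    "AGTTCC", "CGATGT", "TGACCA", "ACAGTG", "CAGATC",
    "ATGTCA", "CCGTCC", "GTCCGC", "GTGAAA",
]

print(hamming("AGTTCC", "ATGTCA"))       # 3
print(max_safe_mismatches(indexes))      # 1
print(len(ambiguous_pairs(indexes, 3)))  # 36, i.e. every pair
```

Under that assumption, the closest indexes in this lane are 3 apart, so at most 1 mismatch can be tolerated. With 3 mismatches allowed, every pair of 6-base indexes is ambiguous.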
Related question:
Do you know if Illumina will eventually release a new version of CASAVA? v1.8.2 is at least 3 years old!
test case /rap/nne-790-ab/Instruments/Illumina_HiSeq_1000_Hellbound/130131_SNL131_0070_AC1MMMACXX
Hi lorendarith !
The commit 71067623ee7cff88655c45382a53cd40510c92b7 should fix this.
Thanks.
Please close the issue if the problem was fixed by the commit.