Closed iqbal-lab closed 7 years ago
These are the alleles at site 79 - you will need to scroll right to see the end, and to notice I have marked alleles 24 and 34
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCAT**TATGAAG**
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCCAAAAAGGATGAATGTAAAGGTGCATTAAAGCAT**TATGAAG**
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGAATGATTCACAACTATTAGAAATATCCAAAAAGGATGAATGTAAAGGTGCATTAAAGCAT**TATGAAG**
AGGATGTAATAAAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAACAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAGAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATGAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAA
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGGTGCATTAAAGAATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAAATTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGAATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAACAAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATGTCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAAGACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAATAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAA
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AAAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG 24
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCCAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATCAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGGTTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGATGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAATTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTGAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTTACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AAGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG 34
AGGATGTAATAAAACTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGGAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGGTTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
The next two bases after this site are AA
So our kmer TATGAAGAA overlaps many (in fact all except two) of these alleles, running past the right hand 79 marker and then finishing on the AA adjacent. I've highlighted the first few.
I think this issue is v old andmaybe dead @sm0179 ?
yes
Sorina spotted some samples were hitting/failing an assertion. Basically, if a read was mapping horizontally uniquely but not vertically uniquely, this assert was checking that at that time, nothing had been pushed onto the sites list, in the knowledge that it was going to be pushed after exiting the current function. Actually the problem can be reproduced using just the last kmer of the read, and putting it in the kmers input file.
So, what is the command?
where
OK, so what happens. First of all, if you look in the PRG file, you can see that the kmer overlaps site 79, and overlaps the end of many alleles, with AA poking out of the right hand end of the site. So, when we precalculate the sa_interval and sites for this kmer, you should see all these alleles in the sites object. But in fact somehow it reduces it all down to just alleles 24 and 34.
I guess the thing to do is run it in gdb and look at where sa_intervals and sites are erased.