Closed stephenturner closed 8 years ago
Hey Stephen,
Thanks for trying it! This would suggest to me that there is no overlap at the ends (which really deserves a better error message!). Have you tried aligning your plasmid sequence to itself using Gepard http://www.helmholtz-muenchen.de/en/icb/software/gepard/index.html? It'll create a dotplot, which should pretty clearly show any self-overlap (diagonals in the upper right and lower left corners) ...
~Joe
On Wed, Jul 8, 2015 at 10:59 AM, Stephen Turner notifications@github.com wrote:
Hi Joseph. I ran across this tool and wanted to give it a spin. I have a plasmid assembly and the dot plot tells me it's circular, but the ends are messy.
I'm trying to run apc on a single 90kb contig and I get the note that I have irregular LAST results. I tried running the commands that appear to be in the perl script, and I look at my alignment file, and this is what I see:
$ lastdb tmp ../hyb-plas120-nanopgm.fasta
$ lastal -s1 -x300 -f0 -T1 tmp ../hyb-plas120-nanopgm.fasta
LAST version 548
#
a=7 b=1 A=7 B=1 c=100000 e=40 d=15 x=300 y=9 z=300
R=10 u=0 s=1 T=1 m=10 l=1 n=10 k=1 i=8192 w=1000 t=0.910239 g=1 j=3 Q=0
tmp
Reference sequences=1 normal letters=94654
#
A C G T
A 1 -1 -1 -1
C -1 1 -1 -1
G -1 -1 1 -1
T -1 -1 -1 1
#
Coordinates are 0-based. For - strand matches, coordinates
in the reverse complement of the 2nd sequence are used.
#
score name1 start1 alnSize1 strand1 seqSize1 name2 start2 alnSize2 strand2 seqSize2 blocks
#
batch 0
94654 plasmid_120_nanopore_pgm_corrected_pilon 0 94654 + 94654 plasmid_120_nanopore_pgm_corrected_pilon 0 94654 + 94654 94654
Query sequences=1
That is, a single line, not 3, which is what the script was expecting. Any idea what's going on here? Thanks.
— Reply to this email directly or view it on GitHub https://github.com/jfass/apc/issues/2.
Joseph Fass Lead Data Analyst UC Davis Genome Center - Bioinformatics Core http://bioinformatics.ucdavis.edu/ jnfass@ucdavis.edu phone ~ 530.752.2698
Joe,
I did actually. I'm still puzzling over how to interpret (and deal with) the very large overlap, and the excess diagonals I see toward the end of the sequence (repetitive sequence?). But if I'm interpreting this correctly, the top-right and bottom-left diagonals suggest circularity, albeit with messy ends.
Now that's bizarre. I'm having trouble wrapping my head around that graph. It seems like there may be a (long) tandem duplication that's been collapsed into one copy toward the end of the contig. Does that make sense? So the diagonal that starts next to the "021_dim" on the y-axis indicates that the penultimate sequence (let's say, "tuvw" in the alphabet that covers the whole contig) overlaps the first sequence in the contig ("abcd"), but the following sequence "xyz" doesn't match "efg" ... instead it matches "uvw" again. So either some sequence has been incorrectly duplicated at the end of the contig, or a tandem duplication (of 10-20kbp) has been collapsed at the start of the contig. If I'm seeing it correctly.
How'd your plasmid name get flipped on the y-axis, btw?
What do you think? If there's a way to give a more meaningful error message, to help figure out what's going on here, I'd be happy to add it in. Actually, Keith Bradnam rewrote my awful Perl, and I have yet to apply his suggestions. He also found another tool or two ... something from AMOS, I believe, and a more recent tool called "Circulator" (which I can't find at the moment. I just stumbled across this https://github.com/sheikki/identifyCircularContigs/blob/master/identifyCircularContigs. But I have no idea if any of them will be able to figure out your situation.
~Joe
On Thu, Jul 9, 2015 at 2:19 AM, Stephen Turner notifications@github.com wrote:
Joe,
I did actually. I'm still puzzling over how to interpret (and deal with) the very large overlap, and the excess diagonals I see toward the end of the sequence (repetitive sequence?). But if I'm interpreting this correctly, the top-right and bottom-left diagonals suggest circularity, albeit with messy ends.
[image: image] https://cloud.githubusercontent.com/assets/460076/8591991/fa56f698-25f9-11e5-84bb-631ca1531c98.png
— Reply to this email directly or view it on GitHub https://github.com/jfass/apc/issues/2#issuecomment-119884111.
Joseph Fass Lead Data Analyst UC Davis Genome Center - Bioinformatics Core http://bioinformatics.ucdavis.edu/ jnfass@ucdavis.edu phone ~ 530.752.2698
Thanks for those suggestions. I actually ran into circlator (no "u") prior to apc, but with outputs undocumented it's not immediately useful. I took identifyCircularContigs for a spin, but need to dust off my perl to get this working.
But you're right, I've got some odd things happening at the end of my assembly. This was a single-contig nanopore-only assembly from celera polished with ion torrent reads using pilon. May warrant another minION run with newer flow cells.
RE: axis, just used the most recent version of gepard and it was like this out of the box.
Closing this issue because I believe apc did just what it should have when it reported irregular last results. Only suggestion: echo to screen the command used to run lastdb/lastal. Or add a -verbose option to echo to screen all the commands issued.
Hi Joseph. I ran across this tool and wanted to give it a spin. I have a plasmid assembly and the dot plot tells me it's circular, but the ends are messy.
I'm trying to run apc on a single 90kb contig and I get the note that I have irregular LAST results. I tried running the commands that appear to be in the perl script, and I look at my alignment file, and this is what I see:
That is, a single line, not 3, which is what the script was expecting. Any idea what's going on here? Thanks.