Closed LeilyR closed 7 years ago
Hi Leily, I had a look at the graph you sent, and to me it looks OK. What I did was the following:
reveal extract reveal.gfa nc000913.3.fasta --width70 > tmp.fasta
So I extracted the sequence through the graph that corresponds to the path of nc000913.3.fasta.
Now tmp.fasta seems to be exactly the same as the nc000913.3.fasta file. Also, when I align tmp.fasta with nc000913.3.fasta using:
reveal align nc000913.3.fasta tmp.fasta
It finds that they are exactly the same.
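If you want to double-check this outside of reveal, here is a minimal Python sketch that compares two FASTA files while ignoring line width and header names. The helper names `parse_fasta` and `same_sequences` are made up for this illustration and are not part of reveal.

```python
def parse_fasta(text):
    """Return {record name: sequence} with the line breaks removed."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first FASTA header")
        else:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def same_sequences(text_a, text_b):
    """Compare sequences only, ignoring headers, case and line width."""
    seqs_a = sorted(seq.upper() for seq in parse_fasta(text_a).values())
    seqs_b = sorted(seq.upper() for seq in parse_fasta(text_b).values())
    return seqs_a == seqs_b


if __name__ == "__main__":
    # File names taken from the commands above.
    with open("nc000913.3.fasta") as a, open("tmp.fasta") as b:
        print(same_sequences(a.read(), b.read()))
```

Comparing the joined sequences rather than the raw files avoids false differences caused only by a different line width or header.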
About "S 4", this refers to segment (or node) 4 in the graph.
S 4 C * ORI:Z:0;2 OFFSETS:Z:1806;1806 RC:i:2
From this line you can derive that both nc000913.3.fasta and nc010473.fasta have a C at position 1806 (zero-based).
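To check a segment against the input yourself, something like the following Python sketch could work. It assumes the fields are whitespace-separated as shown above and that OFFSETS holds zero-based start positions, as described; `parse_segment`, `segment_offsets` and `base_matches` are hypothetical helpers written for this example, not part of reveal.

```python
def parse_segment(line):
    """Split a GFA 'S' line into its name, sequence and optional tags."""
    fields = line.split()
    if not fields or fields[0] != "S":
        raise ValueError("not a GFA segment line")
    name, seq = fields[1], fields[2]
    tags = {}
    for field in fields[3:]:
        parts = field.split(":", 2)
        if len(parts) == 3:  # skips placeholders such as '*'
            tag, tag_type, value = parts
            tags[tag] = int(value) if tag_type == "i" else value
    return name, seq, tags


def segment_offsets(tags):
    """Return the OFFSETS tag as a list of integers."""
    return [int(x) for x in tags.get("OFFSETS", "").split(";") if x]


def base_matches(genome, offset, seq):
    """True if the genome carries `seq` at the zero-based `offset`."""
    return genome[offset:offset + len(seq)].upper() == seq.upper()


name, seq, tags = parse_segment("S 4 C * ORI:Z:0;2 OFFSETS:Z:1806;1806 RC:i:2")
print(name, seq, segment_offsets(tags))
```

Because the offset is zero-based, it corresponds to the 1807th base when counting from one. Looking at the 1806th base instead would give the neighbouring character, which is an easy way to see a mismatch that is not really there.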
I generally use Bandage (https://github.com/rrwick/Bandage) to visualise the alignment graph. I hope this helps. Let me know if I misunderstood something...
Cheers, Jasper
Hi! Thanks a lot for the reply. I will check it again. I might have made a mistake in cutting the sequence. Also thanks a lot for the link! Best, Leily
Hi! I used the tool to align a couple of E. coli genomes against each other. The result didn't match the original sequences I used as input. I am wondering if you know how this happened. I am sending you both the GFA file created by Reveal and the FASTA file that I used. It seems S 4 contains a C, while the base at the same position in the FASTA file is something different. Is this how it is supposed to be? Thanks a lot! Leily
nc000913.3.fasta.gz
reveal.gfa.gz