mjsull / chromatiblock

Colinear block visualisation tool
GNU General Public License v3.0

Graphing error around single core block #13

Open weizhenxu opened 3 years ago

weizhenxu commented 3 years ago

Hi:

I'm looking at gene rearrangement in the neighbourhood flanking a single core block in related plasmid sequences - when the core block is oriented in the strand=+ direction, the sequence is correctly graphed in left flank -> core -> right flank order.

However, when the orientation of the core block is reversed (strand=-), the sequence is graphed as core -> left+right flank, with the non-core blocks from both flanks overlaid on each other to the right of the core. Nothing is plotted to the left of the core block (see the second- to fourth-last rows of the attached screenshot).

Chromatiblock was installed and run in a fresh conda env (Python 3.8.6), as recommended on the GitHub page.

Thanks!
Weizhen

(Attached screenshot: Capture)

mjsull commented 3 years ago

Hi Weizhen,

Do you mind sending me the fastas you used to generate this image?

Best,

Mitch

weizhenxu commented 3 years ago

Hi:

I've attached a set of similar files and the HTML outputs from running Chromatiblock on them. Note that when the core region is in the reverse orientation, the left flank is graphed to the right of the core.

In the meantime, I've managed to work around the issue by orienting all the fastas so that each core region faces in the same direction. There are still some issues with how the core region is defined; I'll document those in a later post.

Attachment: 12rep_10k.zip
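The workaround above (flipping sequences so every core region faces the same way) can be sketched in a few lines of Python. This is a rough illustration only, not part of Chromatiblock: the function names and the `minus_ids` set are made up for the example, and which records actually need flipping has to be worked out separately (for instance from BLAST strand information). It assumes plain nucleotide FASTA containing only A/C/G/T/N; any other characters, including IUPAC ambiguity codes, would be left uncomplemented.

```python
import io

# Complement table for A/C/G/T/N only (assumption: no other IUPAC codes); case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def orient_records(handle, minus_ids, width=60):
    """Return FASTA text in which records whose ID is in minus_ids are reverse-complemented.

    minus_ids is a hypothetical, user-supplied set of sequence IDs
    (the first word of each header) whose core region is on the minus strand.
    """
    out = []
    for header, seq in read_fasta(handle):
        seq_id = header.split()[0] if header.strip() else ""
        if seq_id in minus_ids:
            seq = reverse_complement(seq)
        out.append(">" + header)
        # Wrap sequence lines at a fixed width.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Toy input: two records, of which only plasmidB is flipped.
example = ">plasmidA\nAACGT\n>plasmidB\nAACGT\n"
print(orient_records(io.StringIO(example), {"plasmidB"}))
```

Headers are left unchanged so that sequence names in the plot stay the same. With real files, one would open each FASTA, pass the handle to `orient_records`, and write the returned text to a new file.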

weizhenxu commented 3 years ago

A second problem is that I'm getting different core block sizes when trying to align sequences on a common core block. These sequences all share at least an 813 bp gene region (possibly with SNP variations), as verified by BLAST. See the attached screenshot and input/output files.
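For anyone wanting to repeat the BLAST check, here is a rough sketch of how the hit length and orientation per sequence could be tabulated. It assumes blastn was run with the shared gene as the query and tabular output with the default columns (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); in that format a minus-strand nucleotide hit has sstart greater than send. The function name and the example IDs (`core_gene`, `plasmidA`, `plasmidB`) are made up for illustration.

```python
import csv
import io


def best_hits(handle):
    """Return {subject_id: (alignment_length, strand)} for the top-bitscore hit per subject.

    Expects BLAST tabular output with the default 12 columns (-outfmt 6).
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        sseqid, length = row[1], int(row[3])
        sstart, send, bitscore = int(row[8]), int(row[9]), float(row[11])
        # For nucleotide hits, subject coordinates run backwards on the minus strand.
        strand = "-" if sstart > send else "+"
        if sseqid not in best or bitscore > best[sseqid][2]:
            best[sseqid] = (length, strand, bitscore)
    return {sid: (length, strand) for sid, (length, strand, _) in best.items()}


# Toy table with hypothetical IDs and coordinates.
table = (
    "core_gene\tplasmidA\t100.000\t813\t0\t0\t1\t813\t5001\t5813\t0.0\t1502\n"
    "core_gene\tplasmidB\t99.754\t813\t2\t0\t1\t813\t7813\t7001\t0.0\t1491\n"
)
for subject, (length, strand) in best_hits(io.StringIO(table)).items():
    print(subject, length, strand)
```

Subjects reported with strand `-` are the ones where the gene sits in the reverse orientation relative to the query.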

By definition, the core region should be conserved across all sequences and thus be the same length; instead, I'm getting truncated variants of what the program defines as the core on certain sequences. This seems to arise from sequence insertions near the core gene, which disrupt the structure of the core block. The truncated blocks are also misaligned relative to the full-length core block - all the core blocks are aligned starting from the left (i.e. like left-justify in a word processor), even when they should be aligned from the right.

I'm thinking that this could be fixed by allowing us to set the minimum core block size to less than 1000 bp - I'm already running chromatiblock at -m 100, but this doesn't seem to affect the minimum core block size. A related consequence is that some sequences with insertions next to the 813 bp core region break the alignment entirely, with the error message "No core blocks found. No regions >1000bp were found once in all genomes. Please use more closely related genomes".

(Attached screenshot)

Thanks for looking into this, and sorry for the trouble with the non-standard usage, but it would be really useful if this could work well on single-core alignments!

Cheers,
Weizhen

Attachment: TCall.zip