tanghaibao / jcvi

Python library to facilitate genome assembly, annotation, and comparative genomics
BSD 2-Clause "Simplified" License

Genes plotted in the wrong position #615

Closed Tong-Chen closed 9 months ago

Tong-Chen commented 9 months ago

I generated a demo block file, a demo bed file, and a demo layout file. The microsynteny plot is produced, but some genes are drawn in the wrong position. I checked the files and they look correct to me. Could you help check whether the problem is in the file format or in the program? Thanks!

a demo block file:

demo_block.txt

species1chr1gene1   .   species2chr1geneA   species2chr2geneB   species3scaffold1gene001
species1chr1gene1   .   species2chr1geneC   species2chr2geneB   species3scaffold1gene001
species1chr1gene2   .   .   species2chr2geneB   .
.   species1chr2gene3   species2chr1geneD   .   species3scaffold1gene002
.   .   .   species2chr2geneE   species3scaffold1gene003

a demo bed file

demo_bed.txt

chr1    1000    2000    species1chr1gene1   .   +
chr1    4000    5000    species1chr1gene2   .   -
chr2    4000    5000    species1chr2gene3   .   +
chr1    2000    3000    species2chr1geneA   .   -
chr1    4000    5000    species2chr1geneC   .   +
chr1    6000    7000    species2chr1geneD   .   +
chr2    2000    3500    species2chr2geneE   .   -
chr2    4000    5000    species2chr2geneB   .   -
scaffold1   2000    3500    species3scaffold1gene001    .   +
scaffold1   4000    4500    species3scaffold1gene002    .   -
scaffold1   5000    6500    species3scaffold1gene003    .   +

a demo layout file

demo_layout.txt

#x, y, rotation, ha, va, color, ratio, label, label_font_size (optional)
0.3, 0.5, 0, left, top, #000000, 0.8, species1 chr1, 10
0.8, 0.5, 0, center, top, #000000, 0.2, species1 chr2, 10
0.3, 0.8, 0, left, top, #000000, 0.6, species2 chr1, 10
0.7, 0.8, 0, center, top, #000000, 0.4, species2 chr2, 10
0.5, 0.2, 0, center, top, #000000, 1, species3 scaffold1, 10
#edges
e, 0, 2
e, 1, 2
e, 0, 3
e, 1, 3
e, 0, 4
e, 1, 4

Then I ran python -m jcvi.graphics.synteny demo_block.txt demo_bed.txt demo_layout.txt --format svg --genelabelsize 8. The plot was generated, but some genes were misplaced:

  1. species2chr1geneA is placed on species1 chr1
  2. species1chr1gene2 is placed on species2 chr2

According to the log, jcvi identified all 5 scaffolds and all features (including duplicates).

[12/14/23 16:00:51] INFO     `latex` not found. latex use is disabled.                                                                                                                                   base.py:613
                    INFO     Set text.usetex=False. Font styles may be inconsistent.                                                                                                                     base.py:435
[16:00:51] DEBUG    Load file `demo_bed.txt`                                                                                                                                                              base.py:34
           DEBUG    Load file `demo_block.txt`                                                                                                                                                            base.py:34
           DEBUG    Load file `demo_layout.txt`                                                                                                                                                           base.py:34
Column 0: species1chr1gene1 - species1chr1gene2 (chr1:1001-5000)
  chr1 .. 3 (3) features .. +
Column 1: species1chr2gene3 - species1chr2gene3 (chr2:4001-5000)
  chr2 .. 1 (1) features .. +
Column 2: species2chr1geneA - species2chr1geneD (chr1:2001-7000)
  chr1 .. 4 (3) features .. +
Column 3: species2chr2geneE - species2chr2geneB (chr2:2001-5000)
  chr2 .. 3 (4) features .. -
Column 4: species3scaffold1gene001 - species3scaffold1gene003 (scaffold1:2001-6500)
  scaffold1 .. 3 (4) features .. +
                    DEBUG    Matplotlib backend is: agg
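
The coordinates in the log appear to be the 1-based span of the genes in each block column (BED starts are 0-based, so 1000 is shown as 1001). The following is a minimal sketch that reproduces those spans from the demo data; `column_extent` is a hypothetical helper written for this illustration, not a jcvi function, and the column contents are copied by hand from demo_block.txt. Note that Column 0 and Column 2 both resolve to a sequence named chr1.

```python
# Demo BED records from this issue: gene -> (seqid, 0-based start, end).
BED = {
    "species1chr1gene1": ("chr1", 1000, 2000),
    "species1chr1gene2": ("chr1", 4000, 5000),
    "species1chr2gene3": ("chr2", 4000, 5000),
    "species2chr1geneA": ("chr1", 2000, 3000),
    "species2chr1geneC": ("chr1", 4000, 5000),
    "species2chr1geneD": ("chr1", 6000, 7000),
    "species2chr2geneE": ("chr2", 2000, 3500),
    "species2chr2geneB": ("chr2", 4000, 5000),
    "species3scaffold1gene001": ("scaffold1", 2000, 3500),
    "species3scaffold1gene002": ("scaffold1", 4000, 4500),
    "species3scaffold1gene003": ("scaffold1", 5000, 6500),
}

# Unique genes in each column of demo_block.txt (the "." placeholders are dropped).
COLUMNS = [
    ["species1chr1gene1", "species1chr1gene2"],
    ["species1chr2gene3"],
    ["species2chr1geneA", "species2chr1geneC", "species2chr1geneD"],
    ["species2chr2geneB", "species2chr2geneE"],
    ["species3scaffold1gene001", "species3scaffold1gene002",
     "species3scaffold1gene003"],
]


def column_extent(genes, bed):
    """Return 'seqid:start-end' covering all genes, with a 1-based start."""
    records = [bed[gene] for gene in genes]
    seqids = sorted({seqid for seqid, _, _ in records})
    start = min(start for _, start, _ in records) + 1  # 0-based -> 1-based
    end = max(end for _, _, end in records)
    return f"{','.join(seqids)}:{start}-{end}"


extents = [column_extent(genes, BED) for genes in COLUMNS]
for index, extent in enumerate(extents):
    print(f"Column {index}: {extent}")
```

The spans alone cannot tell species1 chr1 apart from species2 chr1, because the BED file carries no species column.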

Chen Tong

Tong-Chen commented 9 months ago

PS.

If I change the bed file as shown below, so that chromosome names are unique across all species, the generated plot is correct.

chr1    1000    2000    species1chr1gene1   .   +
chr1    4000    5000    species1chr1gene2   .   -
chr2    4000    5000    species1chr2gene3   .   +
chr1a   2000    3000    species2chr1geneA   .   -
chr1a   4000    5000    species2chr1geneC   .   +
chr1a   6000    7000    species2chr1geneD   .   +
chr2a   2000    3500    species2chr2geneE   .   -
chr2a   4000    5000    species2chr2geneB   .   -
scaffold1   2000    3500    species3scaffold1gene001    .   +
scaffold1   4000    4500    species3scaffold1gene002    .   -
scaffold1   5000    6500    species3scaffold1gene003    .   +
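
The renaming above can also be scripted. This is a minimal sketch, assuming that every gene name starts with its species label (as in this demo, e.g. species2chr1geneA); it uses a species prefix such as species2_chr1 rather than the chr1a style shown above. `prefix_seqids` is a hypothetical helper, not part of jcvi.

```python
import re


def prefix_seqids(bed_lines):
    """Prefix each seqid with the species label taken from the gene name."""
    renamed = []
    for line in bed_lines:
        fields = line.split()
        # Leave comments and malformed lines untouched.
        if line.startswith("#") or len(fields) < 4:
            renamed.append(line)
            continue
        # Assumption: gene names begin with "species<N>", as in the demo files.
        match = re.match(r"species\d+", fields[3])
        if match:
            fields[0] = f"{match.group(0)}_{fields[0]}"
        renamed.append("\t".join(fields))
    return renamed


demo_lines = [
    "chr1 1000 2000 species1chr1gene1 . +",
    "chr2 4000 5000 species1chr2gene3 . +",
    "chr1 2000 3000 species2chr1geneA . -",
    "chr2 4000 5000 species2chr2geneB . -",
    "scaffold1 2000 3500 species3scaffold1gene001 . +",
]
renamed_lines = prefix_seqids(demo_lines)
for line in renamed_lines:
    print(line)
```

Only the bed file changes; the block file lists gene names, which stay the same.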

tanghaibao commented 9 months ago

@Tong-Chen

Yes, you found the solution to the issue yourself. A collision in chr/contig names across species can lead to plotting errors.
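
Such collisions can be caught before plotting. The following is a minimal sketch, assuming the species can be read from the start of each gene name (true for this demo, not for bed files in general); `species_of` and `find_seqid_collisions` are hypothetical helpers, not jcvi functions.

```python
import re
from collections import defaultdict


def species_of(gene_name):
    """Assumption: demo gene names start with 'species<N>'."""
    match = re.match(r"species\d+", gene_name)
    return match.group(0) if match else "unknown"


def find_seqid_collisions(bed_lines):
    """Map each seqid used by more than one species to those species."""
    species_by_seqid = defaultdict(set)
    for line in bed_lines:
        fields = line.split()
        # Skip comments and lines without a name column.
        if line.startswith("#") or len(fields) < 4:
            continue
        seqid, gene = fields[0], fields[3]
        species_by_seqid[seqid].add(species_of(gene))
    return {
        seqid: sorted(species)
        for seqid, species in species_by_seqid.items()
        if len(species) > 1
    }


demo_bed = """\
chr1 1000 2000 species1chr1gene1 . +
chr1 4000 5000 species1chr1gene2 . -
chr2 4000 5000 species1chr2gene3 . +
chr1 2000 3000 species2chr1geneA . -
chr2 4000 5000 species2chr2geneB . -
scaffold1 2000 3500 species3scaffold1gene001 . +
"""
collisions = find_seqid_collisions(demo_bed.splitlines())
print(collisions)
```

For the original demo bed file this flags chr1 and chr2, the two names shared by species1 and species2.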