tanghaibao / jcvi

Python library to facilitate genome assembly, annotation, and comparative genomics
BSD 2-Clause "Simplified" License
760 stars 186 forks source link

Script for drawing gene model synteny? #50

Closed nextgenusfs closed 7 years ago

nextgenusfs commented 7 years ago

Lots of very useful tools in this repository, thanks for sharing! I'm wondering if buried in here somewhere is a graphics script that would draw a synteny map for gene models instead of entire scaffolds/chromosomes as done in allmaps? I'm looking for something on more of the 20-100Kb scale to show some local rearrangements/indels between two isolates of the same species. Perhaps something like you pass two bed files of gene models as well as an anchors file from jcvi.compara.catalog ortholog - and the result looks similar to the karyotype result except with individual gene models and orthologs identified between them. I'm happy to try to help come up with a solution, but my matplotlib skills are not good....

tanghaibao commented 7 years ago

@nextgenusfs yes it is possible and I have been using jcvi library to do this. It would be good to follow the instructions here. Scroll down to the end of the page for the microsynteny plots, but it would be nice to follow from the beginning so that you generate all the required files for plotting.

nextgenusfs commented 7 years ago

great @tanghaibao, that seems exactly what I need. One question, in the example and the script calls for all.bed and you seem to have done grape_peach.bed in this example:

python -m jcvi.graphics.synteny blocks grape_peach.bed blocks.layout

Is that just a simple concatenation/merge of both the bed files?

nextgenusfs commented 7 years ago

Just figured it out, I did the following and seemed to work, thanks:

python -m jcvi.formats.bed merge grape.bed peach.bed > grape_peach.bed
tanghaibao commented 7 years ago

@nextgenusfs yes - simple concatenation. In the small example we have:

$ cat grape.bed peach.bed > grape_peach.bed

When one wants to plot many genomes all at once you'll need:

$ cat grape.bed peach.bed apple.bed > all.bed

And then use the concatenated bed file for plotting. The bed file is used to retrieve the location for the genes so make sure all the gene names in the blocks file is represented in the bed FILE.

nextgenusfs commented 7 years ago

Thanks again, this will save me a lot of time.

nextgenusfs commented 7 years ago

In the scenario that you wanted to do multiple plots as you have in the macrosynteny, lets say I wanted to compare grape, peach, and orange microsynteny, where each comparison is made to peach (i.e. peach in the middle). Is this correct, or how would you combine the blocks?

#first mcscan
python -m jcvi.compara.synteny mcscan grape.bed grape.peach.lifted.anchors --iter=1 -o grape.peach.i1.blocks
#second mcsan
python -m jcvi.compara.synteny mcscan peach.bed peach.orange.lifted.anchors --iter=1 -o peach.orange.i1.blocks

Then filter for a region based on some gene names in peach for example

#pull out region between gene 10 and gene 20
awk '/gene10/ {p=1}; p; /gene20/ {p=0}' grape.peach.i1.blocks > gp.blocks
awk '/gene10/ {p=1}; p; /gene20/ {p=0}' peach.orange.i1.blocks > po.blocks

Then how would you combine the blocks?, via cat?

#combine the blocks????
cat gp.blocks po.blocks > all.blocks
#combine the bed files
cat grape.bed peach.bed orange.bed > all.bed

Layout might look like

# x,   y, rotation,   ha,     va,   color, ratio,            label
0.5, 0.7,        0, left, center,       m,     1,   grape Chr3
0.5, 0.5,        0, left, center,       r,     1,   peach chr3
0.5, 0.3,        0, left, center,       b,     1,   orange contig1
# edges
e, 0, 1
e, 1, 2

And then draw:

python -m jcvi.graphics.synteny all.blocks all.bed layout.blocks
tanghaibao commented 7 years ago

@nextgenusfs wow. You are ahead of me. I've now included a small example with multiple genomes on the same wiki page. Please let me know if you run into any issues.

zhaotao1987 commented 7 years ago

Thanks Haibao for such great tool! Just a follow-up to this post, I came across some problems when trying to plot multiple panels (>2) of synteny. Here's the error message:

python -m jcvi.graphics.synteny try-multi-blocks all-bed try-multi-blocks.layout
23:44:13 [driver] Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
23:44:13 [driver] Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
23:44:13 [base] Load file `all-bed`
23:44:31 [base] Load file `try-multi-blocks`
23:44:31 [base] Load file `try-multi-blocks.layout`
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/jcvi/graphics/synteny.py", line 364, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/jcvi/graphics/synteny.py", line 353, in main
    switch=switch, tree=tree, extra_features=opts.extra)
  File "/usr/local/lib/python2.7/dist-packages/jcvi/graphics/synteny.py", line 249, in __init__
    self.layout = lo = Layout(layoutfile)
  File "/usr/local/lib/python2.7/dist-packages/jcvi/graphics/synteny.py", line 76, in __init__
    self.append(LayoutLine(row, delimiter=delimiter))
  File "/usr/local/lib/python2.7/dist-packages/jcvi/graphics/synteny.py", line 44, in __init__
    self.x = float(args[0])
ValueError: could not convert string to float:

Block file:

ath_AT3G14450   aly_478869      Clevi.0014s0326
ath_AT3G14490   aly_341583      Clevi.0014s0330
ath_AT3G14510   aly_478873      Clevi.0014s0332
ath_AT3G14440   aly_478868      Clevi.0014s0323
ath_AT3G14415   aly_478862      Clevi.0014s0320
ath_AT3G14395   aly_478858      Clevi.0014s0317
ath_AT3G14400   aly_478859      Clevi.0014s0318
ath_AT3G14430   .       Clevi.0014s0321
ath_AT3G14460   aly_478870      Clevi.0014s0327
ath_AT3G14410   .       Clevi.0014s0319
ath_AT3G14380   aly_478855      .
ath_AT3G14560   .       .
ath_AT3G14580   .       Clevi.0014s0337
ath_AT3G14570   .       .
ath_AT3G14480   .       .
ath_AT3G14390   aly_478856      .

.layout file

# x,   y, rotation,   ha,     va,   color, ratio,            label
0.5, 0.6,        0, left, center,       m,     1,       A.thaliana Chr3
0.5, 0.4,        0, left, center, #fc8d62,     1, A.lyrata scaffold_3
0.5, 0.2,        0, left, center, #fc8d62,     1, Cleome scaffold_3
# edges
e, 0, 1
e, 1, 2

bed file should be fine..

Thanks a lot for your help.

tanghaibao commented 7 years ago

@zhaotao1987 the error message suggests try-multi-blocks.layout has some issues - first column should always be a number between 0 and 1. Would you please verify that the two lines - #x, y, ... and #edges starts with a hash #, i.e. these are treated as comments?

zhaotao1987 commented 7 years ago

@tanghaibao I just adjusted the layout of my post. The positions I wrote in .layout seems ok to me.

zhaotao1987 commented 7 years ago

@tanghaibao I changed the layout file to the following format.

# x,   y, rotation,     ha,     va, color, ratio,            label
0.5, 0.6,        0, center,    top,      ,     1,       Lyrata scaffold 3
0.5, 0.4,        0, center, top,      ,    1, A.thaliana Chr3
0.5, 0.2,        0, center, bottom,      ,    1, Cleome
# edges
e, 0, 1
e, 1, 2

Then it worked. Glad!

Yes, I realized that I made it wrong to va values in the previous setting, should be one of the top/bottom/middle..

Boer223 commented 2 years ago

I'm also come across some problems when trying to plot 2 panels microsynteny. The error message:

Traceback (most recent call last):
  File "/home/cuixb/tools/biosoft/conda3/envs/jcvi/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/cuixb/tools/biosoft/conda3/envs/jcvi/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cuixb/tools/biosoft/conda3/envs/jcvi/lib/python3.8/site-packages/jcvi/graphics/synteny.py", line 599, in <module>
    main()
  File "/home/cuixb/tools/biosoft/conda3/envs/jcvi/lib/python3.8/site-packages/jcvi/graphics/synteny.py", line 575, in main
    Synteny(
  File "/home/cuixb/tools/biosoft/conda3/envs/jcvi/lib/python3.8/site-packages/jcvi/graphics/synteny.py", line 362, in __init__
    self.layout = lo = Layout(layoutfile)
  File "/home/cuixb/tools/biosoft/conda3/envs/jcvi/lib/python3.8/site-packages/jcvi/graphics/synteny.py", line 90, in __init__
    self.append(LayoutLine(row, delimiter=delimiter))
  File "/home/cuixb/tools/biosoft/conda3/envs/jcvi/lib/python3.8/site-packages/jcvi/graphics/synteny.py", line 51, in __init__
    self.x = float(args[0])
ValueError: could not convert string to float: ''

.blocks file:

BnaA02g13830D   .
BnaA02g13840D   BnaC02g18180D
BnaA02g13850D   .
BnaA02g13860D   BnaC02g18220D
BnaA02g13870D   BnaC02g18270D
BnaA02g13880D   .
BnaA02g13890D   .
BnaA02g13900D   .
BnaA02g13910D   BnaC02g18280D
BnaA02g13920D   BnaC02g18310D
BnaA02g13930D   .
BnaA02g13940D   .
BnaA02g13950D   .
BnaA02g13960D   .
BnaA02g13970D   .
BnaA02g13980D   BnaC02g18450D
BnaA02g13990D   .
BnaA02g14000D   BnaC02g18460D
BnaA02g14010D   BnaC02g18540D
BnaA02g14020D   BnaC02g18630D
BnaA02g14030D   BnaC02g18640D
BnaA02g14040D   BnaC02g18650D
BnaA02g14050D   .
BnaA02g14060D   .
BnaA02g14070D   BnaC02g18670D
BnaA02g14080D   .
BnaA02g14090D   BnaC02g18690D

.layout file:

# x,   y, rotation,     ha,     va, color, ratio,            label
0.5, 0.6, 0, left, center, ,     1, darmor_v4 chrA02
0.5, 0.4, 0, left, center, ,     1, darmor_v4 chrC02
# edges
e, 0, 1

.bed file:

chrA01  830 1437    BnaA01g00010D   34.3135 +
chrA01  1486    2436    BnaA01g00020D   42.145  -
chrA01  2664    5455    BnaA01g00030D   346.9611    -
chrA01  8420    9623    BnaA01g00040D   51.6032 +
chrA01  14269   15906   BnaA01g00050D   169.8236    -
chrA01  18235   19430   BnaA01g00060D   73.8188 +
chrA01  20031   20463   BnaA01g00070D   30.5444 -
chrA01  24908   25673   BnaA01g00080D   70.0142 +
chrA01  26914   29919   BnaA01g00090D   329.0838    +
chrA01  30065   31758   BnaA01g00100D   196.999 -

Thanks a lot for your help!

tanghaibao commented 2 years ago

@Chipcui

The error says that in your .layout file, there is one line with the first column missing, where we are trying to extract the x-position. Instead of a number, that line starts with an empty string "". Make sure that the layout file does not have empty lines. Maybe the empty line is at the end of file.

Additionally it looks like you are using an old version of jcvi. You can upgrade it to the latest by something like pip install -U jcvi.

Boer223 commented 2 years ago

After I remove the last empty line of layout file, it worked. Thank you!

Jiangjiangzhang6 commented 4 months ago

I met the same error, but remove the last emple, but now its show the another error (jcvi) $ head aae1.blocks hemp_hap_CM010794.2_001845.1 Cannabis1G15276.t1 Cannabis2G04722.t1 . jl_CM022967.1_001364.1 fujian_Chromosome_3_001080.1 hemp_hap_CM010794.2_001842.1 Cannabis1G15276.t1 Cannabis2G04722.t1 . jl_CM022967.1_001364.1 fujian_Chromosome_3_001075.1 hemp_hap_CM010794.2_000604.1 Cannabis1G15277.t1 Cannabis2G04721.t1 cs_NC_044372.1_000681.1 jl_CM022967.1_001365.1 fujian_Chromosome_3_001078.1 hemp_hap_CM010794.2_001843.1 Cannabis1G15277.t1 Cannabis2G04721.t1 . jl_CM022967.1_001365.1 fujian_Chromosome_3_001078.1 hemp_hap_CM010794.2_000604.1 Cannabis1G15277.t1 Cannabis2G04721.t1 cs_NC_044372.1_000681.1 jl_CM022967.1_001365.1 fujian_Chromosome_3_001078.1 hemp_hap_CM010794.2_001843.1 Cannabis1G15277.t1 Cannabis2G04721.t1 . jl_CM022967.1_001365.1 fujian_Chromosome_3_001078.1 hemp_hap_CM010794.2_001845.1 Cannabis1G15276.t1 Cannabis2G04722.t1 . jl_CM022967.1_001364.1 fujian_Chromosome_3_001080.1 hemp_hap_CM010794.2_001842.1 Cannabis1G15276.t1 Cannabis2G04722.t1 . jl_CM022967.1_001364.1 fujian_Chromosome_3_001075.1 hemp_hap_CM010794.2_000604.1 Cannabis1G15277.t1 Cannabis2G04721.t1 cs_NC_044372.1_000681.1 jl_CM022967.1_001365.1 fujian_Chromosome_3_001078.1 hemp_hap_CM010794.2_001842.1 Cannabis1G15276.t1 Cannabis2G04722.t1 . jl_CM022967.1_001364.1 fujian_Chromosome_3_001075.1 (jcvi) $ head pk_hap1_hap2_cs10_jl_fujian.bed AGQN03000001.1 20135 24161 hemp_hap_AGQN03000001.1_000001.1 0 + AGQN03000001.1 27452 30136 hemp_hap_AGQN03000001.1_000002.1 0 + AGQN03000001.1 36570 40848 hemp_hap_AGQN03000001.1_000003.1 0 + AGQN03000001.1 40868 53215 hemp_hap_AGQN03000001.1_000042.1 0 - AGQN03000001.1 67667 71608 hemp_hap_AGQN03000001.1_000004.1 0 + AGQN03000001.1 91617 93766 hemp_hap_AGQN03000001.1_000005.1 0 + AGQN03000001.1 99286 102883 hemp_hap_AGQN03000001.1_000041.1 0 - AGQN03000001.1 117364 121489 hemp_hap_AGQN03000001.1_000040.1 0 - AGQN03000001.1 154951 156694 hemp_hap_AGQN03000001.1_000006.1 0 + AGQN03000001.1 171831 174222 hemp_hap_AGQN03000001.1_000039.1 0 - (jcvi) $ cat 1blocks.layout

x, y, rotation, ha, va, ratio, label

0.5, 0.8, 0, left, center, red, 1, PK Chr04 0.3, 0.7, 0, left, center, pink, 1, hap1 Chr03 0.7, 0.7, 0, left, center, blue, 1, hap2 Chr03 0.5, 0.6, 0, left, center, green, 1, Cs10 Chr03 0.5, 0.5, 0, left, center, green, 1, JL Chr03 0.5, 0.4, 0, left, center, green, 1, Yushe Chr03

edge

e, 0, 1 e, 0, 2 e, 1, 3 e, 2, 3 e, 3, 4 e, 4, 5 the error was followed [07/16/24 09:26:36] DEBUG Load file base.py:36 pk_hap1_hap2_cs10_jl_fujian.bed
[07/16/24 09:26:38] DEBUG Load file aae1.blocks base.py:36 DEBUG Load file 1blocks.layout base.py:36 Traceback (most recent call last): File "/public/home/linfeifan/miniconda3/envs/jcvi/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/public/home/linfeifan/miniconda3/envs/jcvi/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/public/home/linfeifan/miniconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/graphics/synteny.py", line 737, in main(sys.argv[1:]) File "/public/home/linfeifan/miniconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/graphics/synteny.py", line 703, in main Synteny( File "/public/home/linfeifan/miniconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/graphics/synteny.py", line 465, in init ext = bf.get_extent(i, order) File "/public/home/linfeifan/miniconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/compara/synteny.py", line 75, in get_extent si, start = min(ocol) ValueError: min() arg is an empty sequence