jtlovell / GENESPACE

Riparian plot for synteny computed elsewhere #19

Closed brunocontrerasmoreira closed 2 years ago

brunocontrerasmoreira commented 2 years ago

Hi @jtlovell, I wonder whether it would be possible to generate a riparian plot from synteny data computed with other software. If so, is there any documentation or sample data file that we can use as a guide? Thanks for your help, Bruno

jtlovell commented 2 years ago

It is not currently possible. However, I could generalize it to use pairwise blast hits with a vector stating the syntenic block of each hit. Would that be helpful? That said, if you already have pairwise blast hits, you might as well just run GENESPACE. What software are you thinking of?

brunocontrerasmoreira commented 2 years ago

Hi, I was thinking of input in PAF format, as in the R package https://cran.r-project.org/package=pafr

jtlovell commented 2 years ago

I see - just the block coordinates. Short answer: no, not in its current form. In order to color the braids by a reference chromosome (and allow you to zoom in on specific regions), GENESPACE uses the underlying blast hits against that reference genome. So you need both the coordinates and the hits. However, I have been playing with parsing PAF/bedmap files so that the known positions of blocks against a reference are used instead of the hits themselves. I could see having a simplified riparian plotter from just a block coordinate file in a release later this year. I'd also be open to mocking something up for a specific dataset you are working on. DM me on Twitter or email. John
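
For readers unfamiliar with PAF, the "block coordinates" mentioned above are the first nine of its twelve mandatory tab-separated columns (query name, length, start, end, strand, then target name, length, start, end). The following is a minimal sketch of reading them in Python. The `PafRecord` and `parse_paf_line` names and the example line are hypothetical, and this is not how GENESPACE or pafr parse PAF.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """Coordinate columns of one PAF alignment (hypothetical helper type)."""
    query: str
    query_start: int
    query_end: int
    strand: str
    target: str
    target_start: int
    target_end: int


def parse_paf_line(line: str) -> PafRecord:
    """Extract block coordinates from one PAF line.

    PAF has 12 mandatory tab-separated columns; optional tags may follow
    and are ignored here.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 PAF columns, got {len(fields)}")
    return PafRecord(
        query=fields[0],
        query_start=int(fields[2]),
        query_end=int(fields[3]),
        strand=fields[4],
        target=fields[5],
        target_start=int(fields[7]),
        target_end=int(fields[8]),
    )


# Made-up example line, for illustration only.
example = "chr1_A\t5000\t100\t900\t+\tchr1_B\t6000\t150\t950\t700\t800\t60"
rec = parse_paf_line(example)
print(rec)
```

A record like this carries only where a block starts and ends on each genome; it has none of the per-gene hits that the riparian plot uses for coloring and zooming.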

dirkjanvw commented 9 months ago

Hi! I'm very interested in this too. I have been searching through the code but haven't found anything. What is the current status of using one's own synteny data to generate a riparian plot?

xiekunwhy commented 7 months ago

I hope that all the plotting functions in GENESPACE can be made easy to use independently, so that users can run the synteny analysis themselves and obtain the data more efficiently.

jtlovell commented 7 months ago

To make a riparian plot, you need more than just a PAF ... you need all of the blast results phased against a reference genome (so that the ribbons can be calculated and colored). AFAIK, only GENESPACE gives you this, so feeding in an outside format wouldn't be useful. Eventually there will be a multi-genome PAF plotter, but this won't have the region or chromosome tracking facilities of the riparian plot.

xiekunwhy commented 7 months ago

Thanks. GENESPACE's synteny ribbon curves are really beautiful. I will dig the related functions out of GENESPACE and write some simple code to plot synteny if I have time in the near future. Synteny plotting doesn't need such a heavy package.

dirkjanvw commented 7 months ago

Aha, thanks for the reply! It would be very interesting to have a standalone script/tool that can make the beautiful riparian plots. Happy to test it once it's there.

Also, I don't think I fully understand what GENESPACE does when you say "you need more than just a paf": Suppose I have blocks (from MCScan(X), ntSynt or some custom code) where I know for each block what its position is on each genome (chromosome:start-end); or the reverse: for each position to what block it belongs. What extra information would GENESPACE need that I don't have?
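
The two views described in this question (block to positions, and position to block) can be sketched as below. The block table, genome names, and helper functions are all hypothetical and only illustrate the data the question assumes is available; they are not an input format that GENESPACE accepts.

```python
import re
from collections import defaultdict

# Matches region strings of the form chromosome:start-end.
REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Split 'chromosome:start-end' into (chromosome, start, end)."""
    m = REGION.match(region)
    if m is None:
        raise ValueError(f"not a chromosome:start-end region: {region!r}")
    start, end = int(m["start"]), int(m["end"])
    if start > end:
        raise ValueError(f"start exceeds end in {region!r}")
    return m["chrom"], start, end


# View 1: for each block, its position on each genome (made-up values).
blocks = {
    "block1": {"genomeA": "chr1:100-900", "genomeB": "chr1:150-950"},
    "block2": {"genomeA": "chr2:50-400", "genomeB": "chr5:1000-1400"},
}


def invert(blocks):
    """Build view 2: (genome, chromosome) -> sorted (start, end, block_id)."""
    index = defaultdict(list)
    for block_id, positions in blocks.items():
        for genome, region in positions.items():
            chrom, start, end = parse_region(region)
            index[(genome, chrom)].append((start, end, block_id))
    for intervals in index.values():
        intervals.sort()
    return dict(index)


def blocks_at(index, genome, chrom, pos):
    """Return the ids of all blocks covering one position (inclusive ends)."""
    return [b for s, e, b in index.get((genome, chrom), []) if s <= pos <= e]


index = invert(blocks)
print(blocks_at(index, "genomeB", "chr5", 1200))
```

Neither view records which reference chromosome or interval a block should be colored by when the reference is not one of the two genomes in the pair.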

jtlovell commented 7 months ago

@xiekunwhy good luck. Check out the GENESPACE paper, and you'll see why you can't use simple tools like MCScanX alone to find syntenic blocks. @dirkjanvw The syntenic blocks need to be phased by a reference genome to track intervals (by default whole chromosomes). This phasing problem is not simple in many situations (although it is if all chromosomes are named the same and orthologous). Either way, MCScanX is great for finding collinear hits, but its syntenic blocks cannot be trusted, except in unique single-copy regions.
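
To make the phasing idea concrete, here is a toy sketch of the easy case only: a block between two non-reference genomes (B and C) is split wherever its B-side interval crosses a boundary between reference chromosomes. It is not GENESPACE's algorithm. The tuple layouts, names, and coordinates are hypothetical, and the C-side coordinates are linearly interpolated on the assumption of same-orientation, uniformly scaled blocks, which is a crude stand-in for the blast hits that GENESPACE uses.

```python
# Blocks between the reference and genome B (made-up values):
# (ref_chrom, b_chrom, b_start, b_end)
ref_to_b = [
    ("ref_chr1", "B_chr3", 0, 1000),
    ("ref_chr2", "B_chr3", 1000, 2500),
]

# One block between genomes B and C (made-up values):
# (b_chrom, b_start, b_end, c_chrom, c_start, c_end)
b_to_c_block = ("B_chr3", 200, 1800, "C_chr7", 0, 1600)


def phase_block(block, ref_to_b):
    """Split a B-C block into pieces labeled by reference chromosome.

    Assumes the block is in the same orientation on B and C and that
    positions scale uniformly between the two genomes.
    """
    b_chrom, b_start, b_end, c_chrom, c_start, c_end = block
    if b_end <= b_start:
        raise ValueError("block must have positive length on genome B")
    scale = (c_end - c_start) / (b_end - b_start)
    pieces = []
    for ref_chrom, rb_chrom, rb_start, rb_end in ref_to_b:
        if rb_chrom != b_chrom:
            continue
        # Intersect the B-side interval with the reference-labeled interval.
        lo, hi = max(b_start, rb_start), min(b_end, rb_end)
        if lo >= hi:
            continue
        # Project the intersection onto genome C by linear interpolation.
        c_lo = c_start + round((lo - b_start) * scale)
        c_hi = c_start + round((hi - b_start) * scale)
        pieces.append((ref_chrom, b_chrom, lo, hi, c_chrom, c_lo, c_hi))
    return pieces


pieces = phase_block(b_to_c_block, ref_to_b)
for piece in pieces:
    print(piece)
```

Even this simple case needs the reference-to-B blocks in addition to the B-to-C block. Inversions, duplicated regions, and chromosomes without a one-to-one ortholog break the interpolation assumption, which is where the phasing problem described above stops being simple.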