rhysf / Synima

Synteny Imager
MIT License

Questions regarding the synteny plots from the same data #33

Open mudithekanayake opened 2 years ago

mudithekanayake commented 2 years ago

Hi @rhysf,

First of all, thank you very much for this awesome tool. Some of the results in the synteny output are confusing to me, and I would be grateful if you could help me understand them. I have attached 4 synteny plots. Plot 1 shows the synteny between two species (Ndig and Emu). Plot 2 shows the synteny between two different assemblies of Ndig (Flye and Nextdenovo) and Emu. Ndig in plot 1 is the same genome as Flye in plot 2. In plot 3, I have changed the order of the species (Flye in the middle). Plot 4 shows only the synteny between Flye and Nextdenovo (two genomes for the same species from two different assemblers).

Question1: Plot 1 shows the synteny between Flye (Ndig) and Emu, and I see a lot of synteny between them (many lines). But when I add another Ndig genome (Nextdenovo) without changing the other data, I do not see much synteny between Flye and Emu. The number of lines is significantly different. What is the reason for this observation?

Question2: Plot 3 and plot 4 both show the synteny between the two genomes of the same species from two different assemblers. They should be very similar to each other. Do you know why we cannot see much synteny between these two genomes, which are supposed to be similar?

Question3: Although plot 3 and plot 4 both show synteny between Flye and Nextdenovo, they look different (both the number of lines and the patterns of lines). Is this because of an effect from the 3rd genome (Emu) in the 2nd plot?

Question4: In plot 1 and plot 2, I can see that the contigs in Emu are ordered, but in all other species the contigs are not ordered. Do you know the reason for this? Is there a way to order the other species as well? (I wondered whether it is because the genome is ordered by contig size. But when I made my initial plot for Ndig and Emu with the default parameters, the contigs in Emu were not ordered, so I do not think that is the reason.)

I am really sorry for asking too many questions. I am new to synteny analysis and I am still learning. So, I would be grateful to you if you can help me to solve these questions. Thank you.

config.txt1.pdf config.txt2.pdf config.txt3.pdf config.txt4.pdf

rhysf commented 2 years ago

Question1: Synima only shows synteny between pairs of genome assemblies. Therefore, if you change the order, or add different assemblies, you will plot different evidence of synteny. If you want all of the syntenic information of DAGchainer, then you could plot them all pairwise.
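As a rough illustration of why adding a genome or changing the order changes what is drawn, here is a minimal Python sketch. It assumes (this is an assumption, not something confirmed in this thread) that a stacked plot only connects genomes that sit next to each other in the chosen order. The genome names come from the question; the function names and the particular order are hypothetical.

```python
from itertools import combinations


def adjacent_pairs(order):
    """Pairs of neighbouring genomes in a stacked plot.

    Assumption: only genomes next to each other in `order` are connected.
    """
    return list(zip(order, order[1:]))


def all_pairs(genomes):
    """Every pairwise comparison, for plotting each pair separately."""
    return list(combinations(genomes, 2))


# Hypothetical order for a three-genome plot.
order = ["Emu", "Nextdenovo", "Flye"]

print("drawn in one stacked plot:", adjacent_pairs(order))
print("needed to see everything: ", all_pairs(order))
```

Under that assumption, Emu and Flye are never compared directly in this order, so a separate two-genome plot per pair is the way to see all of the syntenic information.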

Question2: Synima identifies synteny based on the gene annotation that is passed to it (assuming you have used the orthology prediction pipeline included, and in the way suggested). If you are finding low levels of synteny, this may be due to poor annotation quality. You can also try running DAGchainer with a reduced threshold, e.g. 2 or 3 chains of orthologs.
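To show what a lower threshold does, here is a small Python sketch. It reads the threshold as the minimum number of ortholog pairs a chain must contain, which is an interpretation of the advice above. It does not call DAGchainer or Synima; the chains, gene names and `filter_chains` function are all hypothetical.

```python
def filter_chains(chains, min_pairs):
    """Keep only chains with at least `min_pairs` ortholog pairs."""
    return {name: pairs for name, pairs in chains.items() if len(pairs) >= min_pairs}


# Hypothetical chains: each is a list of (gene in genome A, gene in genome B).
chains = {
    "chain_1": [(f"A_g{i}", f"B_g{i}") for i in range(1, 7)],    # 6 pairs
    "chain_2": [(f"A_g{i}", f"B_g{i}") for i in range(20, 23)],  # 3 pairs
    "chain_3": [(f"A_g{i}", f"B_g{i}") for i in range(40, 42)],  # 2 pairs
}

# A stricter threshold keeps fewer chains; relaxing it keeps more.
for threshold in (5, 3, 2):
    kept = filter_chains(chains, threshold)
    print(f"min pairs = {threshold}: {sorted(kept)}")
```

With fragmented assemblies or sparse annotation, many true syntenic blocks may contain only a few annotated orthologs, so short chains like `chain_2` and `chain_3` would only appear once the threshold is relaxed.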

Question3: Yes, see answer to Q1.

Question4: You can set the order of the contigs manually from the command line or, better still, in the config file. You can set the order of the species either way too. The genome assembly is not ordered according to length.

Synima orders the contigs by default (otherwise the plots would be a lot less clear) - and this starts from the bottom genome assembly up. However, you can control this default behavior by specifying the order of the species and the contigs - most easily done in the config file.

mudithekanayake commented 2 years ago

Hi @rhysf Thank you very much for the explanations. I get two different results for the same genomes on two occasions. Since it gives the pairwise synteny, shouldn't it give similar outputs on both occasions? Also, is there a flag or an option in Synima that I can use for ordering the contigs?

rhysf commented 2 years ago

What do you mean by "I get two different results for the same genomes on two occasions"? What have you changed, and what do you mean by occasions? Have I not already answered this above?

And yes, in the config file there are 'contigorder' entries that state the order decided by Synima, and you can change them if you wish.
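If a length-sorted order is wanted, the contig names can be listed by length directly from the assembly FASTA. The Python sketch below only produces that list; how it is written into the config file is not shown, since the exact syntax should be taken from the config file that Synima generates. The function names and the example records are hypothetical.

```python
def contig_lengths(fasta_lines):
    """Return {contig name: sequence length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def order_by_length(lengths):
    """Contig names from longest to shortest (ties broken by name)."""
    return sorted(lengths, key=lambda contig: (-lengths[contig], contig))


# Hypothetical in-memory FASTA; for a real file, pass an open file handle.
example = [
    ">contig_1 some description",
    "ACGT",
    "AC",
    ">contig_2",
    "ACGTACGTAC",
    ">contig_3",
    "ACG",
]

lengths = contig_lengths(example)
print(order_by_length(lengths))
```

For a real assembly, `contig_lengths(open("assembly.fasta"))` would work the same way, where `assembly.fasta` is a placeholder path.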

mudithekanayake commented 2 years ago

Hi @rhysf

I meant the occasion I mentioned earlier, in which I saw different synteny for the same set of genomes. In config.txt1, the synteny between Emu and Ndig is different from the synteny between Emu and Flye in config.txt2 (Ndig and Flye are the same genome).

However today I redid the synteny analysis for Emu, Flye and Nextdenovo. Following are the two plots I got.

config.txt3.pdf config.txt4.pdf

The data is the same for the two plots. The only difference is that, for config.txt4, I reordered the Flye and Nextdenovo genomes by contig length before running the synteny analysis. Now the synteny looks completely opposite of what it was in config.txt3. I would be grateful if you can help me to figure this out.

rhysf commented 2 years ago

Hi,

The files you uploaded previously (config1 and config2) have 2 genomes and 3 genomes respectively. Everything that Synima is plotting is specified in the align.coords and .coords files. If you have different genomes, or different genome order, then the plot is going to look different.

"Now the synteny looks completely opposite of what it was in config.txt3."

Yes, that is unusual. Are you sure that the align.coords and .coords files are the same for both runs (or given the same location in the config file)? Are you also sure that you have sufficient memory and that Synima.pl is running to completion without error both times?

If they are, then I may need to try to recreate the plots to see what is going on, as changing the contig order should not change the synteny plotted. If you think this is the case and are happy to share your data with me, then you can email me each of the genome.fasta files, aligncoords and aligncoords.spans to:

r dot farrer at broadinstitute dot org

I think that should be all I need. It would also be helpful to have the 2 config files that generated your different figures.

Best, Rhys
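One quick way to check whether the align.coords and .coords files really are identical between the two runs is to compare checksums. This is a minimal Python sketch using only the standard library; the function names and the paths in the usage comment are placeholders, not paths from this thread.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have byte-for-byte identical content."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage (placeholder paths, replace with the files named in each config):
#   same_content("run3/input.aligncoords", "run4/input.aligncoords")
```

If the checksums differ, the two plots were drawn from different synteny evidence. Note that reordering the genomes before running the analysis may itself change these files, which would be worth knowing before comparing the plots.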

mudithekanayake commented 2 years ago

Hello @rhysf

I did the two runs separately, the first time using the unordered Flye and Nextdenovo genomes and the second time using the ordered Flye and Nextdenovo genomes. I am running all my analyses on the university high performance clusters (HPC), so the memory should be sufficient. Synima ran without any errors both times.

I am happy to share my data and I will email the files. Thank you very much for your kind help. Really appreciate it.

mudithekanayake commented 2 years ago

Hello @rhysf

It says address not found when I try to send an email. Is the email address correct?

rhysf commented 2 years ago

Hi @mudithekanayake

Yes, it is. If it's not working, you can share your email address and I'll email you.

mudithekanayake commented 2 years ago

Hello @rhysf

Yeah, sure. This is my email. mudith@iastate.edu