jtlovell / GENESPACE

Understanding non-syntenic ortholog #112

Closed ericgonzalezs closed 1 year ago

ericgonzalezs commented 1 year ago

Hi,

I generated the pangenome table with the function query_pangenes. I am using version v1.2.3.

The header of the table is: pgID interpChr interpOrd og repGene genome chr start end genome1 genome2 ...

I noticed that the same gene can appear multiple times in the table as a non-syntenic ortholog. I also noticed that the og number can differ between rows for the same gene when it appears multiple times as a non-syntenic ortholog. Is it normal to have the same gene in multiple orthogroups?

I also noticed that some rows are duplicated in the pangenome table. For example, I can have something like this:

1516 Chr1 324 79778 g00016731 genome1 Chr1 94757491 94760892 g00016731
6378 Chr1 1296.56 79778 g00016731 genome1 Chr1 94757491 94760892 g00016731

In this example, both rows describe the same gene and all other fields are identical; only the pgID and interpOrd differ. Do you know why this is happening?

Many thanks,

Eric

jtlovell commented 1 year ago

Hi Eric. Just to put this in fixed width:

1516 Chr1 324 79778 g00016731 genome1 Chr1 94757491 94760892 g00016731
6378 Chr1 1296.56 79778 g00016731 genome1 Chr1 94757491 94760892 g00016731

What happened here is likely not a bug. Apparently this genome has two syntenic positions to itself on the left and right arms of this chromosome (chr1). Was there a whole-genome duplication (WGD), or maybe an assembly problem? Do the dotplots reflect this? If not, then maybe we should troubleshoot.

John
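
To see how many genes are affected before checking the dotplots, one option is to group the table by gene and list every gene that sits under more than one pgID. The snippet below is a minimal sketch in plain Python, not part of GENESPACE. It assumes the table has been exported as whitespace-delimited text and that the columns follow the header quoted above. The helper names parse_rows and multi_position_genes are hypothetical, and the only data used are the two rows from this thread.

```python
from collections import defaultdict

# Column names taken from the table header quoted in the question.
# Assumption: the 10th field of each quoted row is the "genome1" column.
COLUMNS = ["pgID", "interpChr", "interpOrd", "og", "repGene",
           "genome", "chr", "start", "end", "genome1"]

# The two rows quoted in this thread.
ROWS = """\
1516 Chr1 324 79778 g00016731 genome1 Chr1 94757491 94760892 g00016731
6378 Chr1 1296.56 79778 g00016731 genome1 Chr1 94757491 94760892 g00016731
"""


def parse_rows(text):
    """Turn whitespace-delimited rows into a list of dicts keyed by COLUMNS."""
    return [dict(zip(COLUMNS, line.split()))
            for line in text.splitlines() if line.strip()]


def multi_position_genes(rows):
    """Return genes that occur under more than one pgID, keyed by (genome, repGene)."""
    by_gene = defaultdict(list)
    for row in rows:
        by_gene[(row["genome"], row["repGene"])].append(row)
    return {gene: hits for gene, hits in by_gene.items()
            if len({h["pgID"] for h in hits}) > 1}


rows = parse_rows(ROWS)
for (genome, gene), hits in multi_position_genes(rows).items():
    # Each placement is one interpolated position for the same gene.
    placements = [(h["pgID"], h["interpChr"], float(h["interpOrd"])) for h in hits]
    print(genome, gene, placements)
```

If many genes from the same chromosome show up with two placements, that would be consistent with the self-synteny explanation above. A handful of isolated cases might be worth troubleshooting separately.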