qiao-xin / DupGen_finder

A pipeline used to identify different modes of duplicated gene pairs

the number of genes in 'xxx.xxx.pairs' was not consistent with the number in 'xxx.xxx.genes' #15

Open ruih-ruih opened 1 year ago

ruih-ruih commented 1 year ago

Dear Dr. Qiao,

Among the output files of DupGen_finder, I found that the number of genes in the file 'xxx.wgd.pairs' (after deduplication) is equal to the number in the file 'xxx.wgd.genes'.

But why is the number of genes in the file 'xxx.proximal.pairs' (after deduplication) larger than the number in 'xxx.proximal.genes' (4051 vs. 3425)?

Could you explain this?

Thanks a lot.

qiao-xin commented 1 year ago

When generating the *.genes files, DupGen_finder avoids writing the same gene ID into more than one of them. That is, if a gene ID has been written to Ath.wgd.genes, it will not be assigned to the *.genes file of any other mode, such as Ath.tandem.genes or Ath.proximal.genes or … The priority used for assigning duplicated genes is: WGD > tandem > proximal > transposed > dispersed.

Over long evolutionary periods, WGD-derived genes may also undergo single-gene duplication, so the same gene can be involved in two or more types of gene duplication. For example, a gene ID contained in Ath.wgd.pairs may also be found in Ath.tandem.pairs, Ath.proximal.pairs or other *.pairs files, but because of the priority above this (WGD-derived) gene ID will not be written to *.tandem.genes or *.proximal.genes.
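To illustrate why a *.pairs file can contain more distinct gene IDs than the matching *.genes file, here is a minimal Python sketch of the priority rule described above. It is not DupGen_finder's actual code: the function name, the in-memory data layout and the gene IDs (`g1` ... `g7`) are assumptions made for the example.

```python
# Priority stated above: WGD > tandem > proximal > transposed > dispersed.
PRIORITY = ["wgd", "tandem", "proximal", "transposed", "dispersed"]


def assign_genes_by_priority(pairs_by_mode):
    """Assign each gene ID to the highest-priority mode in which it appears.

    pairs_by_mode maps a mode name to a list of (gene1, gene2) tuples,
    standing in for the contents of the corresponding *.pairs file.
    Returns a dict mapping each mode to the set of gene IDs that would
    end up in its *.genes file under the priority rule.
    """
    assigned = set()  # gene IDs already claimed by a higher-priority mode
    genes_by_mode = {}
    for mode in PRIORITY:
        genes_in_pairs = {g for pair in pairs_by_mode.get(mode, []) for g in pair}
        genes_by_mode[mode] = genes_in_pairs - assigned
        assigned |= genes_by_mode[mode]
    return genes_by_mode


# Toy data with hypothetical gene IDs.
toy_pairs = {
    "wgd": [("g1", "g2")],
    "tandem": [("g2", "g3")],
    "proximal": [("g1", "g4"), ("g3", "g5"), ("g6", "g7")],
}

genes = assign_genes_by_priority(toy_pairs)
proximal_in_pairs = {g for pair in toy_pairs["proximal"] for g in pair}

# Distinct genes in the proximal pairs vs. genes assigned to the proximal mode.
print(len(proximal_in_pairs), len(genes["proximal"]))
```

In this toy case the proximal pairs mention six distinct genes, but `g1` was already assigned to WGD and `g3` to tandem, so only four remain for the proximal set. WGD has the highest priority, so its two counts always agree, which matches the observation in the question.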

ruih-ruih commented 1 year ago

Thank you for the quick reply. In other words, the number of genes will be the same in the two data sets (*.genes-unique and *.pairs-unique files) generated by DupGen_finder-unique? @qiao-xin

ColinR01 commented 2 months ago

@qiao-xin : I found duplicated entries in the *.pairs files: image

ColinR01 commented 2 months ago

@qiao-xin : image

qiao-xin commented 2 months ago

I am not sure whether the duplication is already present in the *.blast file. Please check.
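One way to run that check is sketched below in Python. It assumes the *.blast file is tab-separated with the query ID in the first column and the subject ID in the second (as in BLAST tabular output); the function name and the toy lines are hypothetical. Note that BLAST tabular output can list the same query/subject pair more than once when several HSPs are reported, so repeated pairs here would not necessarily indicate an error.

```python
from collections import Counter


def repeated_hits(lines):
    """Return {(query, subject): count} for pairs that occur more than once.

    Assumes tab-separated lines with the query ID in column 1 and the
    subject ID in column 2; blank lines and '#' comment lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a hit line in the assumed format
        counts[(fields[0], fields[1])] += 1
    return {pair: n for pair, n in counts.items() if n > 1}


# Toy input with hypothetical gene IDs; only the first two columns matter here.
toy_blast = [
    "g1\tg2\t98.5",
    "g1\tg2\t91.0",
    "g2\tg1\t98.5",
    "g3\tg4\t87.2",
]

dups = repeated_hits(toy_blast)
print(dups)
```

For a real file, the same function can be given an open file handle, e.g. `with open("Ath.blast") as fh: repeated_hits(fh)`, where `Ath.blast` is only a placeholder name.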