zhangrengang / SOI

Robust identification of orthologous synteny with the Orthology Index
GNU General Public License v3.0

A request for a new feature #1

Open shengxinzhuan opened 1 week ago

shengxinzhuan commented 1 week ago

Hello, rengang! This is an excellent tool: it provides a very convenient command line for what used to be a complex process. However, while using it I found that it only supports outputting a binary species tree, rather than a subgenome tree like the one wgdi offers for explaining phenomena related to reticulate evolution. The collinear block information provided by wgdi should contain relationships such as 1:2 or 1:1. Could you provide a feature that integrates this information and then outputs the result of subgenome block tree construction? This would be very useful for studying ancient hybrid polyploidization. Although wgdi offers a similar function, its reliance on manual operation makes it very inefficient and can lead to discrepancies between results because of subjective judgments. If an automated feature could be implemented in orthoindex, that would be very exciting! Wishing you all the best! Alfred Hou

zhangrengang commented 1 week ago

Thanks for your request. It is a good idea, but sorry, at present I have no idea how to integrate multiple lines of evidence (synteny, orthology and phylogeny) to phase subgenomes in an automated way. For now I can only output a macro-synteny (not subgenome-scale) phylogeny (as used here and here) in an automated way. If you wish, I can release the code, but it is not suitable for many situations either.

Anyway, orthoindex can make subgenome phasing with wgdi more efficient:

  1. Orthoindex shows clearer orthology relationships than Ks for subgenome grouping (see here).
  2. Orthoindex can robustly identify orthologous syntenic blocks and so remove out-paralogous synteny (see here). Its input (for soi filter) is the output of wgdi -icl, and its output keeps the same format as the input. Thus, it is easy to integrate into the wgdi pipeline.
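
To illustrate the idea behind point 2, here is a minimal Python sketch, not the actual SOI code. It assumes that the Orthology Index of a syntenic block is the fraction of its gene pairs that are also pre-inferred orthologs, and that blocks below a cutoff are discarded as out-paralogous. The function names, the toy gene IDs and the 0.6 cutoff are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

GenePair = Tuple[str, str]


def orthology_index(block: List[GenePair], orthologs: Set[FrozenSet[str]]) -> float:
    """Fraction of gene pairs in a syntenic block that are known orthologs.

    Assumption: this mirrors the general idea of an orthology index;
    it is not taken from the SOI source code.
    """
    if not block:
        return 0.0
    hits = sum(1 for a, b in block if frozenset((a, b)) in orthologs)
    return hits / len(block)


def filter_blocks(
    blocks: Dict[str, List[GenePair]],
    orthologs: Set[FrozenSet[str]],
    cutoff: float = 0.6,  # hypothetical threshold, for illustration only
) -> Dict[str, List[GenePair]]:
    """Keep only blocks whose index reaches the cutoff (same structure in and out)."""
    return {
        name: pairs
        for name, pairs in blocks.items()
        if orthology_index(pairs, orthologs) >= cutoff
    }


# Toy data: orthologous pairs as they might come from an orthology inference tool.
orthologs = {frozenset(p) for p in [("A1", "B1"), ("A2", "B2"), ("A3", "B3")]}

# Toy syntenic blocks as they might be parsed from a collinearity file.
blocks = {
    "block1": [("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")],  # 3 of 4 pairs orthologous
    "block2": [("A1", "B7"), ("A2", "B8"), ("A3", "B3")],  # 1 of 3 pairs orthologous
}

kept = filter_blocks(blocks, orthologs)
print(sorted(kept))
```

Because the filtered result has the same structure as the input, a step like this can sit between block detection and any downstream analysis without changing the rest of the pipeline.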

Alternatively, if your genome is a neo-allopolyploid, Subphaser can be used in an automated way.

shengxinzhuan commented 1 week ago

Thanks, rengang! Even if it's just achieving results based on macro-synteny, that would be an exciting outcome. This way, only further manual refinement would be needed to obtain the results I'm expecting, thus reducing the impact of human manipulation. Could you please release this code when you have a moment? Many thanks.

zhangrengang commented 1 week ago

Please find the usage here. Note that the code is not fully tested by other users, so there may be some errors when you run it at first. Just report the errors to me.

shengxinzhuan commented 1 week ago

Okay, I will take it to test my dataset and will let you know if there are any errors. Thanks!