tanghaibao / jcvi

Python library to facilitate genome assembly, annotation, and comparative genomics
BSD 2-Clause "Simplified" License
756 stars, 186 forks

about outparalogs #370

Closed mo716 closed 3 years ago

mo716 commented 3 years ago

Hi! I hope you are doing well! I really like the Python version of the MCScan pipeline (https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)) that you have developed. I would, however, like to ask:

Are outparalogs automatically excluded from the orthologs identified by MCScan?

My question arises from the fact that some genes could have duplicated before a speciation event. I am working with Brassica napus; hence my interest in knowing more about how orthologs are defined in the MCScan Python pipeline. Thank you very much in advance for any answer :)

tanghaibao commented 3 years ago

@mo716

This is controlled by the magic option --cscore.

The wiki already has an example for grape vs. peach: when cscore is close to 1 (e.g. 0.99), we find pretty much only direct orthologs (split at the time of species divergence). As you relax the cscore, more outparalogs are allowed. The default value is 0.7, which allows for some recent outparalogs (with scores at least 70% of those of direct orthologs). Occasionally, I relax it to 0.5 to look for really ancient events.
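
To make the effect of the cutoff concrete, here is a minimal sketch of C-score filtering. It is not the jcvi implementation: it assumes the C-score of a hit is its score divided by the best score seen for either gene in the pair, and the function name `filter_by_cscore`, the gene names, and the scores are all made up for illustration.

```python
from collections import defaultdict


def filter_by_cscore(hits, cutoff=0.7):
    """Keep hits whose C-score is at least `cutoff`.

    hits: iterable of (query_gene, subject_gene, score) tuples.
    Assumed definition (illustration only):
        cscore = score / max(best score of query, best score of subject)
    """
    hits = list(hits)

    # Best score observed for each query gene and each subject gene.
    best_query = defaultdict(float)
    best_subject = defaultdict(float)
    for query, subject, score in hits:
        best_query[query] = max(best_query[query], score)
        best_subject[subject] = max(best_subject[subject], score)

    kept = []
    for query, subject, score in hits:
        cscore = score / max(best_query[query], best_subject[subject])
        if cscore >= cutoff:
            kept.append((query, subject, score))
    return kept


# Hypothetical hits: one grape gene against three peach genes.
hits = [
    ("grapeA", "peachA", 1000.0),  # best hit, candidate direct ortholog
    ("grapeA", "peachB", 750.0),   # 75% of the best score
    ("grapeA", "peachC", 550.0),   # 55% of the best score
]

for cutoff in (0.99, 0.7, 0.5):
    kept = filter_by_cscore(hits, cutoff)
    print(cutoff, [subject for _, subject, _ in kept])
```

In this toy data, a cutoff of 0.99 keeps only the best hit, 0.7 also admits the 75% hit, and 0.5 admits all three, which mirrors how relaxing --cscore lets progressively older duplicates through.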

mo716 commented 3 years ago

Hi @tanghaibao! Thank you very much for the reply. As I understand it now, one can use the c-score to control how likely it is that the reported orthologs include outparalogs. Plus, the tip about using a c-score of 0.5 for ancient events seems very helpful as well. Thank you very much! :)