gtonkinhill / panaroo

An updated pipeline for pangenome investigation
MIT License

merge pangenome graphs for different species #175

Closed by joglekarp 2 years ago

joglekarp commented 2 years ago

Hi

I am interested in merging individual pan-genomes generated for three species (with distinct ANI) within a genus. Is there a recommendation about whether it is okay to split paralogs in order to run panaroo-merge successfully?

Thanks

gtonkinhill commented 2 years ago

Hi,

It should be okay to split paralogs. One thing to note when merging graphs from different species is that Panaroo's ability to resolve very diverse gene clusters relies on gene synteny. Thus, if the gene synteny differs substantially between your species, some of the more diverse gene families may not be resolved.

joglekarp commented 2 years ago

Hi,

Thank you, this is very helpful. As a novice, I want to be sure I understand the process: is the graph merge based on gene synteny rather than sequence identity? Also, can the --search_radius parameter be used during merging to resolve some of the "more diverse gene families" that you warn about? thx

gtonkinhill commented 2 years ago

Hi,

Not quite. It uses both synteny and sequence identity, but very diverse genes that appear in very different locations in the two species may not be clustered together.

Adjusting the --search_radius might help a little, but making it too large increases the chance that genes are incorrectly clustered together. A better option might be to relax the initial clustering threshold using the --threshold option.

I would suggest running the merge with the default settings initially and then inspecting the resulting graph in Cytoscape to see if it looks reasonable. You could then test how sensitive the results are to changes in the parameter settings.
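A minimal sketch of that workflow: a default run first, then a small sensitivity sweep. It only builds and prints the command lines, so nothing runs until each one has been checked. The --threshold and --search_radius options are the ones named in this thread. The -d (input directories) and -o (output directory) flags, the directory names, and the example values are assumptions for illustration, so please confirm them against `panaroo-merge --help` for your installed version.

```python
import shlex
from itertools import product


def build_merge_commands(graph_dirs, out_root, thresholds=(None,), search_radii=(None,)):
    """Return one panaroo-merge argument list per parameter combination.

    A value of None leaves the option off, so the tool's own default applies.
    The -d and -o flags are assumptions; check `panaroo-merge --help`.
    """
    commands = []
    for threshold, radius in product(thresholds, search_radii):
        thr_label = "default" if threshold is None else str(threshold)
        rad_label = "default" if radius is None else str(radius)
        out_dir = f"{out_root}/thr-{thr_label}_rad-{rad_label}"

        cmd = ["panaroo-merge", "-d", *graph_dirs, "-o", out_dir]
        if threshold is not None:
            cmd += ["--threshold", str(threshold)]
        if radius is not None:
            cmd += ["--search_radius", str(radius)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical per-species Panaroo output directories.
    species_dirs = ["species_A_panaroo", "species_B_panaroo", "species_C_panaroo"]

    # The first combination (None, None) is the default run; the rest vary one
    # or both parameters. The values here are placeholders, not recommendations.
    for cmd in build_merge_commands(
        species_dirs,
        "merge_runs",
        thresholds=(None, 0.9),
        search_radii=(None, 10000),
    ):
        print(shlex.join(cmd))
```

Writing each run to its own output directory keeps the resulting graphs side by side, so they can be opened in Cytoscape and compared against the default run.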

joglekarp commented 2 years ago

Thank you