nclark-lab / RERconverge

Analysis of convergence between organismal traits and DNA/protein sequences
GNU General Public License v3.0

Can RERconverge handle gene trees with duplicates? #64

Open ghost opened 2 years ago

ghost commented 2 years ago

Hi, I'm working with a dataset where many genes have duplicates. I'm feeding in a master species tree to serve as the 'background', but it seems that RERconverge expects only one gene per species. Can you let me know if RERconverge can handle duplicates? Or does it expect duplicates to be represented in the master tree? If so, can you recommend a potential workaround for this?

nclark-lab commented 2 years ago

Hello, RERconverge can only handle orthologous sets of sequences. It expects only one gene per species. Recent duplicates cannot currently be used. A potential workaround is to prune out paralogous sequences if the number of offending alignments is small, but that can be hard to scale genome-wide. The most conservative move is to not analyze gene families with duplication events within your tree, or at least to leave out those species with the duplicate, but that can reduce power. It's always a judgement call, and it depends on your final goal. If you're screening, then it's ok to be more lax. If you want to make final conclusions from the data, it's best to prune.
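For illustration only, here is a minimal Python sketch of the two conservative filters described above: skipping a whole gene family when any species has more than one copy, or dropping only the species that carry duplicates. It is not part of RERconverge and does not touch trees or alignments. It only decides which tip labels to keep. The `species|gene_id` label format, the function names, and the example families are all assumptions made for this sketch.

```python
from collections import Counter


def species_of(tip_label, sep="|"):
    """Return the species part of a label like 'human|A1' (hypothetical format)."""
    return tip_label.split(sep, 1)[0]


def filter_family(tip_labels, mode="drop_species", sep="|"):
    """Keep at most one gene per species in a single gene family.

    mode="drop_family":  return None if any species has more than one gene.
    mode="drop_species": remove every gene from species with more than one copy.
    """
    counts = Counter(species_of(t, sep) for t in tip_labels)
    duplicated = {sp for sp, n in counts.items() if n > 1}
    if not duplicated:
        # Already one gene per species, nothing to filter.
        return list(tip_labels)
    if mode == "drop_family":
        return None
    if mode == "drop_species":
        return [t for t in tip_labels if species_of(t, sep) not in duplicated]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical example data: geneB has two human copies.
families = {
    "geneA": ["human|A1", "mouse|A1", "dog|A1"],
    "geneB": ["human|B1", "human|B2", "mouse|B1", "dog|B1"],
}

strict = {name: filter_family(tips, "drop_family") for name, tips in families.items()}
lenient = {name: filter_family(tips, "drop_species") for name, tips in families.items()}
```

With these inputs, `strict["geneB"]` is `None` (the family is skipped entirely), while `lenient["geneB"]` keeps only the mouse and dog sequences. The retained labels would then be used to subset the alignments before gene trees are estimated. Choosing which paralog to keep, rather than dropping the species, needs orthology information that this sketch does not attempt to infer.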

ghost commented 2 years ago

Thank you for clarifying that Nathan!

All the best,

Maggie

(Sent by email in reply to Nathan Clark's comment of Tue, Nov 23, 2021, 7:16 PM.)