gymrek-lab / TRTools

Toolkit for genome-wide analysis of tandem repeats
https://trtools.readthedocs.io/
MIT License

Logistic regression in AssociaTR #178

Open asuresh2 opened 1 year ago

asuresh2 commented 1 year ago

Hello!

Is there a plan for binary outcome/logistic regression support for AssociaTR?

Thanks

maegsul commented 1 month ago

Hi, I also wanted to ask whether there are any plans to implement binary phenotypes for associaTR soon. It would be a really great addition to associaTR!

Thanks!

gymreklab commented 1 month ago

Hi @maegsul, associaTR currently does not support logistic regression. However, we have recently added the annotaTR tool: https://github.com/gymrek-lab/TRTools/tree/master/trtools/annotaTR which can be used to convert TR genotype files to pgen, which you can then use as input to plink for association testing. This should allow you to do either linear or logistic regression.

maegsul commented 1 month ago

Thanks a lot @gymreklab! I actually noticed earlier this week that my other issue #209 was marked as completed with the release of the new annotaTR tool, and I have been testing it since then. It works mostly fine for me. I am only getting some plink/pgen-related errors, which are probably better raised under issue #209 as a follow-up, since they are more relevant there.

Even if the plink-related problems are solved, I still think that having a logistic regression mode in associaTR would be really great, for several reasons. Testing associations in repeat length units, rather than on plink's 0-2 scale, is already a great upgrade. associaTR is also as efficient as plink, and it could be even faster because we wouldn't need to convert to pgen first. Also, plink2 is still in development and we have come across multiple inconsistencies in association testing in some of its releases, whereas associaTR's linear regression results fully match what I obtained with R's glm, for instance. Finally, I find associaTR's TR-specific output very informative and useful.

I understand it might not be a priority to implement logistic regression in associaTR, but I just wanted to indicate my (and likely others') interest in it. If I have time (and the required expertise, of course!) I can also try to come up with an implementation suggestion; in that case I will let you know as well.
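To make the request concrete, here is a minimal sketch of what a per-locus logistic test on repeat length could look like. It is not associaTR code and not a proposal for its API: the function names, the simulated data, and the choice of a plain Newton-Raphson fit with a Wald z statistic and no covariates are all assumptions made for illustration.

```python
import math
import random


def score_and_info(b0, b1, x, y):
    """Gradient and information-matrix entries for P(y=1) = sigmoid(b0 + b1*x)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (slope, standard error, Wald z) for x."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = score_and_info(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard error of the slope from the inverse information at the estimate
    _, _, h00, h01, h11 = score_and_info(b0, b1, x, y)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b1, se, b1 / se


# Simulated data (hypothetical): summed repeat length of two alleles per sample
rng = random.Random(1)
lengths = [rng.randint(5, 25) + rng.randint(5, 25) for _ in range(500)]
# Binary phenotype whose log-odds increase by 0.1 per repeat unit
status = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-(-3.0 + 0.1 * x))) else 0
    for x in lengths
]

beta, se, z = fit_logistic(lengths, status)
print("log-odds change per repeat unit:", beta)
print("standard error:", se, "Wald z:", z)
```

The slope here is directly in per-repeat-unit terms, which is the main thing I would want from such a mode. A real implementation would also need covariates and handling of non-converging fits.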

Thanks once again for developing TRTools! :)

gymreklab commented 1 month ago

Hi @maegsul, thanks for testing this out! I saw you also just submitted a post on #209 so I will take a look at that shortly.

Note that as long as TR-trait associations are treated as linear, the 0-2 scale limitation shouldn't actually change the p-values of the association tests.

That being said, I agree it would be great to have this functionality in associaTR as well. This is definitely on our TODO list, but it might take a while to implement since we are adding some other new features to TRTools first, which is why we are suggesting plink2 in the meantime. I'll update this issue once we have news on this.

maegsul commented 1 month ago

Great, thanks @gymreklab!

Indeed, 0-2 scaling would not impact the p-values, but anyone interested in reporting the effect per unit increase in repeat length would need to convert the beta obtained from the 0-2 scaled association test back to repeat units; otherwise it should all be the same, as you indicated!
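For anyone following along, this can be checked with a small simulation. The sketch below assumes the 0-2 scale is a min-max rescaling of the summed allele lengths, `(length - 2*min_len) / (max_len - min_len)`; that formula, the variable names, and the simulated data are assumptions for illustration, not taken from annotaTR or plink. Because the rescaling is linear, the t statistic (and therefore the p-value, at the same degrees of freedom) is unchanged, and dividing the scaled beta by `max_len - min_len` recovers the per-unit beta.

```python
import math
import random


def simple_ols(x, y):
    """Return (slope, standard error, t statistic) for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


rng = random.Random(0)
min_len, max_len = 5, 25  # hypothetical shortest and longest allele at this locus

# Summed repeat length of the two alleles for each sample
lengths = [rng.randint(min_len, max_len) + rng.randint(min_len, max_len)
           for _ in range(500)]
# Quantitative trait with a true effect of 0.1 per repeat unit
trait = [0.1 * x + rng.gauss(0, 1) for x in lengths]

# Assumed 0-2 rescaling of the summed length
scaled = [(x - 2 * min_len) / (max_len - min_len) for x in lengths]

beta_len, se_len, t_len = simple_ols(lengths, trait)
beta_scaled, se_scaled, t_scaled = simple_ols(scaled, trait)

# Convert the beta from the 0-2 scale back to a per-repeat-unit effect
beta_back = beta_scaled / (max_len - min_len)

print("t statistic, length units:", t_len)
print("t statistic, 0-2 scale:   ", t_scaled)
print("beta per repeat unit:     ", beta_len)
print("beta converted back:      ", beta_back)
```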

Great to hear also that you plan to implement logistic regression in the future, and thank you once again!