reimandlab / ActivePathways

Integrative pathway enrichment analysis of multivariate omics data
101 stars 23 forks source link

Update statistical_tests.r #19

Closed MEladawi closed 1 year ago

MEladawi commented 1 year ago

corrected the hypergeometric test statistics per the following sources: https://seqqc.wordpress.com/2019/07/25/how-to-use-phyper-in-r/ https://montilab.github.io/BS831/articles/docs/HyperEnrichment.html

reimand0 commented 1 year ago

Hi Mahmoud,

Thank you so much for considering our code. You point out a more elegant solution that does not involve manually building association tables. However, I believe that there is a flaw in your code and our original code is still valid.

Namely, the below is incorrect. k = length(genelist)

It should be the following, because we take a sublist from the top of the gene list to determine the enrichment at each loop of the ranked HG test (i.e., not the entire gene list). k = length(genelist[1:which.in[i]])

This is the same as k = which.in[i]

When implementing this change on top of your code, it looks like our initial code provides the same P-values for enrichment, and an additional implementation with fisher.test() also provides the same P-values.

I hope this makes sense. Thanks again. Please let me know if you have any questions or comments here.

all the best, Jüri

On Thu, 10 Aug 2023 at 12:14, Mahmoud Eladawi @.***> wrote:

corrected the hypergeometric test statistics per the following sources: https://seqqc.wordpress.com/2019/07/25/how-to-use-phyper-in-r/ https://montilab.github.io/BS831/articles/docs/HyperEnrichment.html

You can view, comment on, or merge this pull request online at:

https://github.com/reimandlab/ActivePathways/pull/19 Commit Summary

File Changes

(1 file https://github.com/reimandlab/ActivePathways/pull/19/files)

Patch Links:

— Reply to this email directly, view it on GitHub https://github.com/reimandlab/ActivePathways/pull/19, or unsubscribe https://github.com/notifications/unsubscribe-auth/AETAF7MSFQSOPOO763CSQGTXUUCGPANCNFSM6AAAAAA3LXUHYQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

MEladawi commented 1 year ago

Hello, Jüri:

Thank you so much for elaboration. Correct, apologies for this, was working on multiple versions and mixed the k value in both.

I agree that they both produce the same p.value, but this is for the first list only. For subsequent lists, the count matrix sometimes becomes negative. That was the reason I wanted to simplify the code to this version.

Thank you, Mahmoud

reimand0 commented 1 year ago

Thanks, very interesting. Could you provide a small example with these negative counts?

On Thu, 10 Aug 2023 at 14:20, Mahmoud Eladawi @.***> wrote:

Hello, Jüri:

Thank you so much for elaboration. Correct, apologies for this, was working on multiple versions and mixed the k value in both.

I agree that they both produce the same p.value, but this is for the first list only. For subsequent lists. the count matrix sometimes becomes negative. That was the reason I wanted to simplify the code to this version.

Thank you, Mahmoud

— Reply to this email directly, view it on GitHub https://github.com/reimandlab/ActivePathways/pull/19#issuecomment-1673694954, or unsubscribe https://github.com/notifications/unsubscribe-auth/AETAF7JH3CA46GIX3FYXNVDXUUQ7LANCNFSM6AAAAAA3LXUHYQ . You are receiving this because you commented.Message ID: @.***>

MEladawi commented 1 year ago

orderedHypergeometric_params.csv Sure, please run your code with the parameters in the attached csv file.

MEladawi commented 1 year ago

Hmm.. I think the issue happens when a gene in the annotaions does not exist in the background. I reverted to the original code and added a line to fix this.

reimand0 commented 1 year ago

yes, I thought so too but did not get a chance to test yet. These operations with the background are done upstream elsewhere in the package.

Actually, I find the most intuitive way to run this test as follows.

assoc_table = table(background %in% genelist[1:which.in[i]], background %in% annotations) scores[i] <- fisher.test(assoc_table, alt = "g")$p.value

I will test it a bit more and include it in a next version of the package as it makes the code clearer overall.

On Thu, 10 Aug 2023 at 14:49, Mahmoud Eladawi @.***> wrote:

Hmm.. I think the issue happens when a gene in the annotaions does not exist in the background. I reverted to the original code and added a line to fix this.

— Reply to this email directly, view it on GitHub https://github.com/reimandlab/ActivePathways/pull/19#issuecomment-1673736138, or unsubscribe https://github.com/notifications/unsubscribe-auth/AETAF7NODS2E3T6J5P3YW6LXUUUKJANCNFSM6AAAAAA3LXUHYQ . You are receiving this because you commented.Message ID: @.***>

reimand0 commented 1 year ago

Thank you for contributing!

On Thu, 10 Aug 2023 at 15:02, Mahmoud Eladawi @.***> wrote:

Closed #19 https://github.com/reimandlab/ActivePathways/pull/19.

— Reply to this email directly, view it on GitHub https://github.com/reimandlab/ActivePathways/pull/19#event-10062936725, or unsubscribe https://github.com/notifications/unsubscribe-auth/AETAF7LAAPTCP6KMGCLAA23XUUV2TANCNFSM6AAAAAA3LXUHYQ . You are receiving this because you commented.Message ID: @.***>

MEladawi commented 1 year ago

You are welcome!