Questions about details for reproduction

HelloWorldLTY commented 3 weeks ago

Hi, thanks for your great work. I am trying to reproduce this project on my own computer, but I seem to be missing some information: 1. the versions of the packages used in this project (for torch_geometric, is it based on PyG 2.0?), and 2. for the PRS score of each cell, is there an example dataset I could use to check the format? Thanks.

szhang1112 commented 2 weeks ago

Thanks for your interest! We tested our model on pyg 1.7.2 and torch 1.9.0, and I have updated the package versions in the README. Regarding your second question, the restrictions on UKBB individual-level data make it difficult to share them publicly. We strictly followed the steps and data format in this tutorial for computing PRS. Let me know if you have other questions.
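For readers who want to confirm that their environment matches the versions quoted above, a minimal sketch follows. It is not part of the repository: the helper name `check_versions` is hypothetical, and the distribution names `torch` and `torch-geometric` are assumed to be the names under which the packages were installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in the reply above; the distribution names are assumptions.
EXPECTED = {"torch": "1.9.0", "torch-geometric": "1.7.2"}


def check_versions(expected=EXPECTED, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        # Ignore local build tags such as "+cu111" when comparing.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches
```

Calling `check_versions()` returns an empty dict when both packages match, and otherwise maps each mismatched package to its expected and installed version.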

HelloWorldLTY commented 2 weeks ago

Thanks a lot. I understand the rules on UKBB data usage. Since I do not know the patient partition you used for training/testing, I believe it will be a bit hard to reproduce the results.

Moreover, I have a further question about the performance of the baselines in Figure 3. It seems that C+T performs better than the other methods, including the more advanced method LDPred2. Do you think this is caused by the optimization process of the baseline models, or are there other reasons that explain the difference? Thanks.

szhang1112 commented 2 weeks ago

Yes, it is not easy: the sample IDs are even different across different UKB applications. I could probably consider generating a synthetic dataset for a demo in the future.

C+T worked better than the other PRS methods in 2 out of 3 datasets, and in the remaining case (HCM) LDpred worked better. We optimized the C+T hyperparameters, e.g. the P-value and R2 cutoffs, on the validation set to choose the best model, meaning that C+T with other settings could perform very poorly.

Let me know if you have other questions.
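To illustrate what choosing the P-value and R2 cutoffs on a validation set can look like, here is a minimal sketch. It is not the authors' pipeline. It assumes a precomputed pairwise LD matrix `r2_matrix`, uses a simplified greedy clumping in place of a dedicated tool, and picks AUC as the selection metric, which the thread does not specify. All function and variable names are hypothetical.

```python
import itertools

import numpy as np


def clump(pvals, r2_matrix, p_cut, r2_cut):
    """Greedy clumping: visit SNPs by ascending p-value and keep a SNP only
    if it passes p_cut and is not in LD (r2 >= r2_cut) with a kept SNP."""
    kept = []
    for j in np.argsort(pvals):
        if pvals[j] > p_cut:
            break  # all remaining SNPs have larger p-values
        if all(r2_matrix[j, k] < r2_cut for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied scores are not averaged."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def select_ct(betas, pvals, r2_matrix, geno_val, y_val, p_grid, r2_grid):
    """Return (best_auc, p_cut, r2_cut, snp_indices) over the cutoff grid,
    scored on validation genotypes geno_val (samples x SNPs)."""
    best = None
    for p_cut, r2_cut in itertools.product(p_grid, r2_grid):
        idx = clump(pvals, r2_matrix, p_cut, r2_cut)
        if idx.size == 0:
            continue  # no SNP passes this p-value cutoff
        prs = geno_val[:, idx] @ betas[idx]  # weighted allele-count sum
        score = auc(prs, y_val)
        if best is None or score > best[0]:
            best = (score, p_cut, r2_cut, idx)
    return best
```

Because the cutoffs are picked to maximize validation performance, the reported C+T model is the best point on the grid, and other grid points can score much lower.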


szhang1112 / scPRS
