qlu-lab / POP-TOOLS

Valid and Powerful Machine-Learning-Assisted Genetic Association Studies
GNU General Public License v3.0

Pop-Gwas not working #1

Closed lisaeick closed 5 months ago

lisaeick commented 7 months ago

Dear POP-GWAS team, I tried your method in FinnGen for a binary trait (CHD): we predict the trait and then run a continuous GWAS on the probability values that the model outputs. The POP-GWAS output is very inflated and very noisy. We exchanged the LDSC reference file for a FinnGen-specific one, and the inflation improved, but the noisiness remains. Any ideas what went wrong? POP_GWAS_BIN_EUR

jmiao24 commented 7 months ago

Hi lisaeick,

Thank you for your interest in POP-GWAS and for trying it out!

There are two scenarios where the current POP-GWAS may produce noisy results:

  1. When there is sample overlap (or overlap of related samples) between labeled data and unlabeled data.
  2. When there is selection bias in the labeled samples (i.e., those with observed CHD) relative to the unlabeled samples. This makes gwas-yhat-unlab differ from gwas-yhat-lab.
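
To illustrate scenario 2, here is a minimal sketch of one way to compare the two predicted-phenotype GWAS SNP by SNP. It is not part of POP-GWAS and not the sensitivity analysis referred to in this comment; the function names and toy numbers are hypothetical. It assumes the labeled and unlabeled GWAS are independent, so if scenario 1 applies the z-scores below will be miscalibrated.

```python
import math

def difference_z(beta_lab, se_lab, beta_unlab, se_unlab):
    """z-score for the difference between two effect estimates.

    Assumes the two estimates come from independent samples.
    """
    return (beta_lab - beta_unlab) / math.sqrt(se_lab**2 + se_unlab**2)

def mean_chi2(pairs):
    """Average squared difference z-score across SNPs.

    Values well above 1 suggest that the two GWAS disagree more than
    sampling noise alone would explain.
    """
    zs = [difference_z(*p) for p in pairs]
    return sum(z * z for z in zs) / len(zs)

# Toy input: (beta_lab, se_lab, beta_unlab, se_unlab) per SNP, made up for illustration.
pairs = [
    (0.02, 0.01, 0.02, 0.01),
    (0.05, 0.01, 0.01, 0.01),
]
print(mean_chi2(pairs))
```

With a random split and no selection bias, the average should stay close to 1 across genome-wide SNPs.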

Can you run the following sensitivity analysis to verify this?

It would be great if you could also share the summary statistics with us via e-mail, so we could reproduce your results and help in investigating the issue further. However, we understand if sharing this data is not feasible.

Best, Jiacheng

lisaeick commented 7 months ago

Hey Jiacheng,

Thanks for your fast reply, and sorry for taking so long to answer; with our download restrictions and Easter, it took me some time to get the sumstats.

If you know a way to share them, let me know (they are big: 2.2 GB).

I prepared the data exclusively for POP-GWAS and split them randomly into labeled and unlabeled sets, so there is no selection bias. Of course there is relatedness within the samples, since it's one cohort, but we made sure to avoid sample overlap.

I downloaded the raw sumstats and attached the rsIDs, to rule out that one of my preprocessing steps is the source of the error. As said, please let me know how you would like to receive them and I am happy to share the sumstats.

Best Greetings Lisa




jmiao24 commented 7 months ago

Hi Lisa,

Thank you for your email and for preparing the summary statistics.

Regarding sharing the large dataset, we can use a cloud storage service such as Google Drive, Dropbox, OneDrive, or Box. You can upload the data to a shared folder and send me the access link directly by e-mail (not through the GitHub issue). After that, I will look into POP-GWAS.

Best, Jiacheng

lisaeick commented 7 months ago

Hi Jiacheng,

Can you please provide an e-mail address, other than this GitHub thread, so I can share the OneDrive link?

Best Lisa



jmiao24 commented 7 months ago

My e-mail is jiacheng.miao@wisc.edu

Best, Jiacheng

lisaeick commented 7 months ago

Hey Jiacheng, I sent you a sharing message via OneDrive. Please let me know if any problems occur. Thanks for your fast and friendly replies. Best greetings, Lisa



jmiao24 commented 6 months ago

Hi Lisa,

Thank you for your patience. I have made two updates to POP-GWAS to resolve this issue:

  1. removed SNPs with duplicate IDs.
  2. added a flag in POP-GWAS to use sample overlap correction.

For 1, the original GWAS summary statistics had many SNPs with duplicate IDs, which would have corrupted the calculations.
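
As a rough illustration of what removing duplicate IDs can look like (a sketch only, not the POP-GWAS implementation), the snippet below drops every SNP whose ID occurs more than once. The `SNP` column name, the helper `drop_duplicate_snps`, and the choice to discard all copies rather than keep the first are assumptions made for the example.

```python
from collections import Counter

def drop_duplicate_snps(rows, id_key="SNP"):
    """Return the rows whose ID occurs exactly once, preserving input order."""
    counts = Counter(row[id_key] for row in rows)
    return [row for row in rows if counts[row[id_key]] == 1]

# Toy summary statistics; column names and values are illustrative only.
sumstats = [
    {"SNP": "rs1", "BETA": 0.02, "SE": 0.01},
    {"SNP": "rs2", "BETA": -0.01, "SE": 0.01},
    {"SNP": "rs2", "BETA": 0.03, "SE": 0.02},  # duplicate ID
    {"SNP": "rs3", "BETA": 0.00, "SE": 0.01},
]

deduped = drop_duplicate_snps(sumstats)
print([row["SNP"] for row in deduped])
```

Dropping all copies avoids having to guess which of the conflicting rows is the right one; keeping the first occurrence is an alternative if the duplicates are known to be exact repeats.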

For 2, although no individuals overlap between the labeled and unlabeled data, the non-zero intercept (0.15) of the bivariate LDSC between the input GWAS indicates residual correlation, so the GWAS performed in these two samples cannot be considered truly independent. I have added a version of POP-GWAS that addresses this issue.
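
For context, the standard cross-trait LD score regression model (stated here from the general LDSC literature, not from the POP-GWAS code) relates the product of z-scores for SNP j in the two GWAS to its LD score. The symbols are the usual ones: N_1 and N_2 are the two sample sizes, N_s the number of shared samples, M the number of SNPs, \ell_j the LD score, \rho_g the genetic covariance, and \rho the phenotypic correlation among shared samples.

```latex
\mathbb{E}\left[ z_{1j}\, z_{2j} \right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j
  + \frac{\rho\, N_s}{\sqrt{N_1 N_2}}
```

The second term is the intercept. Under this model it is zero when N_s = 0, so an estimate of 0.15 points to dependence between the two GWAS that the model attributes to shared samples. One plausible source in this case, given the relatedness within the cohort mentioned earlier in the thread, is related individuals falling on both sides of the split.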

All you need to do is add --sample-overlap to the POP-GWAS script. You may also need to update your POP-GWAS dependencies to the latest version.

A Manhattan plot (without MAF cutoff) using the updated POP-GWAS is attached: Manhattan_no_qc

If the MAF > 0.01 cutoff is applied, the Manhattan plot is Manhattan_with_qc
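
A minimal sketch of such a MAF cutoff, assuming each row carries an effect-allele frequency in a hypothetical `EAF` column (the column name and the helper `filter_by_maf` are illustrative, not taken from POP-GWAS):

```python
def filter_by_maf(rows, threshold=0.01, freq_key="EAF"):
    """Keep rows whose minor allele frequency is strictly above the threshold."""
    kept = []
    for row in rows:
        freq = row[freq_key]
        # The minor allele is whichever allele is rarer.
        maf = min(freq, 1.0 - freq)
        if maf > threshold:
            kept.append(row)
    return kept

# Toy rows; values are made up for illustration.
rows = [
    {"SNP": "rs1", "EAF": 0.005},  # rare effect allele, dropped
    {"SNP": "rs2", "EAF": 0.5},
    {"SNP": "rs3", "EAF": 0.995},  # rare other allele, dropped
    {"SNP": "rs4", "EAF": 0.2},
]

common = filter_by_maf(rows)
print([row["SNP"] for row in common])
```

Taking the minimum of the frequency and its complement makes the filter independent of which allele was coded as the effect allele.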

I have also emailed you the scripts to reproduce my results. Thank you for identifying the issues.

Best, Jiacheng