Closed by DSuveges 1 year ago
GWAS Catalog is working on integrating both REGENERON summary statistics and AstraZeneca PheWAS Portal data. This way we can delegate the phenotype mapping task to them.
At the moment they already have the associations for REGENERON here, so we have been able to generate and validate evidence. In the regeneron_gwascat_evidence_mapping sheet of the table linked below, I have analysed the quality of the mappings present in the final evidence.
https://docs.google.com/spreadsheets/d/16C6ScZBB8FjYjX_K5eGpPO6OcecwSShl5b2EMJWvdG4/edit?usp=sharing
Those mappings that I would change are marked with a FALSE in the review column. I have also classified their level of error by colour:
Of the 470 mappings, 31 (about 6.6%) have been marked FALSE: 16 blank, 9 yellow and 6 red. My overall impression is very good.
Note that a few recurrent terms are not present in the EFO OTAR Slim, so the evidence mapped to them will fail. We should check whether they are of any value.
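A minimal sketch of how the out-of-slim terms could be counted, to help decide whether they are worth adding. It assumes the evidence is available as a list of dicts keyed by `diseaseFromSourceMappedId`; the function names, the slim set and the IDs below are illustrative placeholders, not real EFO terms or our actual pipeline code.

```python
from collections import Counter


def split_by_slim(evidence, slim_ids):
    """Separate evidence whose mapped term is in the slim from the rest."""
    kept, failed = [], []
    for record in evidence:
        in_slim = record.get("diseaseFromSourceMappedId") in slim_ids
        (kept if in_slim else failed).append(record)
    return kept, failed


def recurrent_missing_terms(failed):
    """Count how often each out-of-slim term appears, most frequent first."""
    counts = Counter(r.get("diseaseFromSourceMappedId") for r in failed)
    return counts.most_common()


# Illustrative data only: the IDs are placeholders, not real EFO terms.
slim_ids = {"EFO_A", "EFO_B"}
evidence = [
    {"targetFromSourceId": "GENE1", "diseaseFromSourceMappedId": "EFO_A"},
    {"targetFromSourceId": "GENE2", "diseaseFromSourceMappedId": "EFO_X"},
    {"targetFromSourceId": "GENE3", "diseaseFromSourceMappedId": "EFO_X"},
    {"targetFromSourceId": "GENE4", "diseaseFromSourceMappedId": "EFO_Y"},
]

kept, failed = split_by_slim(evidence, slim_ids)
missing = recurrent_missing_terms(failed)
```

Terms at the top of `missing` are the ones that would recover the most evidence if they were added to the slim.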
As far as the AZ mappings go, I had a chat last Friday with Santhi, the curator in the GWAS Catalog responsible for these mappings, and she has confirmed that they expect to have the whole set ready by the end of this week, so I expect to have AZ evidence integrated for 22.04.
Their rhythm is frankly incredible. This is an overview of their process:
GWAS Catalog has finished their curation of the AZ traits. They have provided us with their working Excel file so that we can integrate it while waiting for them to make them available through their platform. So the next step is to ingest them directly from their Downloads page as we do with REGENERON.
Location of the file: gs://otar000-evidence_input/GeneBurden/data_files/AZ_Traits.xlsx
We have already generated evidence from them, see #121
After a meeting with GWAS Catalog to discuss the gene burden mappings: some particular mappings that I thought were not accurate were discussed and they will be updating them. As a reminder, we have 6k evidence strings from AZ that we are dropping due to unmapped disease, mostly metabolomics-related traits. We are missing these because they haven't processed all the traits available in the PheWAS Portal, only the ones reported in the publication, which is based on a smaller sample. Therefore we will be able to recover them by 22.06 ^^
AZ's summary stats are about to be included in the GWAS Catalog - they are just waiting for some EFOs to be added.
Santhi has shared with us a spreadsheet where she found some inconsistencies between our mappings and their mappings. https://app.zenhub.com/files/143733948/0fe3ea85-aed3-41e4-bc3d-396438c8737c/download
In my opinion, these are few (20) and minor, so we shouldn't worry about them.
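For reference, a minimal sketch of how such a comparison could be reproduced on our side. It assumes both sets of mappings can be loaded as plain trait-to-term dicts; the function name, trait names and IDs are made up for illustration.

```python
def mapping_disagreements(ours, theirs):
    """Return traits mapped by both sources but to different terms."""
    shared = ours.keys() & theirs.keys()
    return {
        trait: (ours[trait], theirs[trait])
        for trait in sorted(shared)
        if ours[trait] != theirs[trait]
    }


# Illustrative data only: placeholder traits and IDs.
ours = {"trait 1": "EFO_A", "trait 2": "EFO_B", "trait 3": "EFO_C"}
theirs = {"trait 1": "EFO_A", "trait 2": "EFO_Z", "trait 4": "EFO_D"}

diff = mapping_disagreements(ours, theirs)
```

Traits present in only one source are deliberately ignored here, since they are a coverage question rather than a disagreement.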
When these data are out we should:
1) Change the AZ parser so that we pull the mappings from the GWAS Catalog instead of our manual curation repository.
2) Remove the AZ traits from the manual curation table (my preference), although we can simply decide to keep them.
3) Make sure that the unmapped quantitative traits mentioned above are also included in the Catalog.
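A rough sketch of what the parser change and the coverage check could look like, assuming the GWAS Catalog mappings end up available as a trait-to-term dict. `resolve_mappings` and the sample data are hypothetical and do not reflect the current AZ parser.

```python
def resolve_mappings(traits, catalog_mappings):
    """Map each trait using GWAS Catalog mappings only; collect the unmapped ones."""
    mapped, unmapped = {}, []
    for trait in traits:
        term = catalog_mappings.get(trait)
        if term is None:
            unmapped.append(trait)
        else:
            mapped[trait] = term
    return mapped, unmapped


# Illustrative data only: placeholder traits and IDs.
catalog_mappings = {"trait 1": "EFO_A", "trait 2": "EFO_B"}
az_traits = ["trait 1", "trait 2", "metabolite trait 3"]

mapped, unmapped = resolve_mappings(az_traits, catalog_mappings)
```

Anything left in `unmapped` after the switch would be a trait the Catalog has not yet covered, which is what the last point above asks us to verify.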
@ireneisdoomed can we close this?
I think so, I was waiting for the data to be available in GWASCatalog, but it hasn't happened.
TBC