monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

G2D Data Source Analysis #707

Open sagehrke opened 12 months ago

sagehrke commented 12 months ago

A detailed Gene to Disease Monarch Ingest data source analysis is needed. @madanucd please enter the information here that you are working on.

madanucd commented 11 months ago

image

madanucd commented 7 months ago

For the latest developments in G2D analysis, please refer to the following link: G2D Analysis Update

Approximately 1400 G2D associations from GENCC have been excluded from Monarch/HPOA. These associations come from a range of submitting sources that are not currently covered by the HPOA or Monarch databases.
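
To make the comparison concrete, here is a minimal sketch of how such a list of excluded associations could be derived as a set difference. It is not the code from the analysis notebook: the function name `excluded_associations` and the placeholder IDs are assumptions for illustration. A real comparison would likely need gene and disease identifiers normalized to a common namespace first.

```python
from typing import Iterable, List, Tuple

Association = Tuple[str, str]  # (gene ID, disease ID)


def excluded_associations(
    candidate: Iterable[Association],
    reference: Iterable[Association],
) -> List[Association]:
    """Return associations present in `candidate` but absent from `reference`."""
    return sorted(set(candidate) - set(reference))


# Placeholder IDs for illustration only; these are not real curations.
gencc_pairs = [
    ("GENE:1", "DISEASE:A"),
    ("GENE:2", "DISEASE:B"),
    ("GENE:3", "DISEASE:C"),
]
monarch_hpoa_pairs = [
    ("GENE:1", "DISEASE:A"),
    ("GENE:4", "DISEASE:D"),
]

missing = excluded_associations(gencc_pairs, monarch_hpoa_pairs)
print(len(missing), missing)
```

Comparing on (gene, disease) pairs rather than on genes or diseases alone keeps an association "missing" even when both of its endpoints appear elsewhere in the reference set.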

madanucd commented 6 months ago

Please find the latest updates regarding the G2D data source analysis:

• The notebook dedicated to G2D data source analysis, which parses 5 data sources and generates visualizations, is available at the following link: Notebook Link.
• The slides from the previous presentation have recently been updated to include an UpSet plot, providing a comprehensive view of the data.

Additionally, the list of excluded G2D associations from GENCC can be accessed through the following link: Excluded Associations from GENCC.

sagehrke commented 6 months ago

@julesjacobsen @cmungall @kevinschaper @monicacecilia FYI 👀 ⬆️

kevinschaper commented 6 months ago

I'm hoping we can get to a consensus/context for what to do next with our GENCC analysis. Given that we want one source of truth for G2D within Monarch, I'm curious if we're considering eventually bringing GENCC into the HPOA pipeline?

cc:@iimpulse

iimpulse commented 6 months ago

Let's figure out what the right approach is. It will be trivial to bring some/all of these guys in. Can someone set up a meeting to review GenCC and the level of filtering we want to use?

julesjacobsen commented 5 months ago

One likely blocker is that the GenCC file appears to be created from submissions from the data sources (e.g. ClinGen, PanelApp). These submissions appear to be unscheduled and infrequent, so the data in the GenCC file is out of date compared to the underlying data sources. Consequently, it would probably make more sense to take these annotations from the original data sources directly. Some of these aren't available other than through the GenCC, so we'll end up with a choice of ClinGen, the PanelApps, G2P or whatever.

kevinschaper commented 5 months ago

@julesjacobsen Would it make sense to just restrict a gencc ingest to the subset that we can only get from them, and live with the synchronization being a little behind?
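
A rough sketch of what that restriction might look like, assuming each GenCC record carries a submitter name. The record layout, the `gencc_only` helper, and the contents of `DIRECTLY_AVAILABLE` are hypothetical here, not the actual ingest code or the real GenCC file schema.

```python
from typing import Dict, Iterable, List, Set

# Assumption: submitters we could ingest from the original source instead of GenCC.
DIRECTLY_AVAILABLE: Set[str] = {"ClinGen", "PanelApp", "G2P"}


def gencc_only(
    records: Iterable[Dict[str, str]],
    directly_available: Set[str] = DIRECTLY_AVAILABLE,
) -> List[Dict[str, str]]:
    """Keep only records whose submitter has no direct ingest available."""
    return [r for r in records if r["submitter"] not in directly_available]


# Placeholder records for illustration only.
records = [
    {"gene": "GENE:1", "disease": "DISEASE:A", "submitter": "ClinGen"},
    {"gene": "GENE:2", "disease": "DISEASE:B", "submitter": "Submitter X"},
    {"gene": "GENE:3", "disease": "DISEASE:C", "submitter": "PanelApp"},
    {"gene": "GENE:4", "disease": "DISEASE:D", "submitter": "Submitter Y"},
]

subset = gencc_only(records)
print([r["submitter"] for r in subset])
```

Keeping the set of directly available submitters as a parameter means the same filter could also be used to drop a whole class of submitters by passing a different set.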

madanucd commented 2 months ago

I agree with @julesjacobsen that the timing of releases is crucial for these sources. Although GENCC provides submitted data for each association, it might be from older releases.

GENCC classifies sources into the following categories:

As suggested by Jules, we may consider excluding data from diagnostic labs.

All gene-to-disease (G2D) source analyses can now be found in our GitHub source-data-analysis repository:

Currently, @iimpulse is addressing concerns about Mendelian conditions with multiple gene associations. I will close this ticket once we receive his clarification.

For the latest results and updates on G2D analysis, please refer to the GitHub repository.