hetio / hetionet

Hetionet: an integrative network of disease
https://neo4j.het.io

Reproducing the creation of hetionet #34

Closed olszewskip closed 3 years ago

olszewskip commented 3 years ago

Hi! Not sure if this is the right place to ask this, but here goes:

I've read through https://think-lab.github.io/p/rephetio/. Hetionet seems extremely impressive and useful. I need something very similar, if not identical, but with an emphasis on diagnosing rare diseases. I would also strongly prefer to be able to automatically update or add new data to my database, e.g. to include new GWAS findings, tailor the specificity of the disease terms to my needs, or maybe add other node types like genetic variants. Hence I'm wondering: how hard would it be for a group of a couple of people to reproduce something like Hetionet from scratch, possibly in little steps? I see that https://think-lab.github.io/p/rephetio/#methods has detailed information about what steps were taken, and also quite a number of links to files hosted on Zenodo. Would you say that all the information is there, or should I also look elsewhere? Was the main "mode of operation" to download text files from the internet, parse/preprocess/unify/join the data using python scripts, and then inject into Neo4j?

Apologies for the vague question. Many thanks for any suggestions! :)

dhimmel commented 3 years ago

Sounds like you're most interested in https://github.com/dhimmel/integrate. This repo does the following:

"download text files from the internet, parse/preprocess/unify/join the data using python scripts, and then inject into Neo4j?"

Particularly, the integrate.ipynb notebook will be of interest.

Note that most datasets don't come directly from the upstream resource, but rather from an intermediate repo that performs pre-processing. In total, there are dozens of repositories that work together to create Hetionet, but the creation is all orchestrated in the dhimmel/integrate repo.
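To give a rough feel for the parse/unify/join/inject pattern described above, here is a minimal sketch in Python. It is not the code from dhimmel/integrate: the table contents, the disease identifier, the `ASSOCIATES` relationship name, the `identifier` property, and all function names are hypothetical, chosen only for illustration. Instead of connecting to a database, it produces Cypher queries and parameters that a Neo4j driver session could then run.

```python
import csv
import io

# Hypothetical pre-processed tables, standing in for files that would be
# downloaded from intermediate repositories. The disease row is made up.
GENES_TSV = "gene_id\tsymbol\n1\tA1BG\n2\tA2M\n"
DISEASES_TSV = "disease_id\tname\nDOID:0000000\texample disease\n"
ASSOCIATIONS_TSV = "disease_id\tgene_id\nDOID:0000000\t2\nDOID:0000000\t999\n"


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def build_graph(genes, diseases, associations):
    """Unify the tables into nodes and edges, dropping edges to unknown nodes."""
    nodes = {}
    for row in genes:
        nodes[("Gene", row["gene_id"])] = {"name": row["symbol"]}
    for row in diseases:
        nodes[("Disease", row["disease_id"])] = {"name": row["name"]}
    edges = []
    for row in associations:
        source = ("Disease", row["disease_id"])
        target = ("Gene", row["gene_id"])
        if source in nodes and target in nodes:
            edges.append((source, "ASSOCIATES", target))
    return nodes, edges


def to_cypher(nodes, edges):
    """Yield (query, parameters) pairs suitable for a Neo4j driver session."""
    for (label, identifier), props in nodes.items():
        query = f"MERGE (n:{label} {{identifier: $id}}) SET n.name = $name"
        yield query, {"id": identifier, "name": props["name"]}
    for (s_label, s_id), rel, (t_label, t_id) in edges:
        query = (
            f"MATCH (s:{s_label} {{identifier: $source}}), "
            f"(t:{t_label} {{identifier: $target}}) "
            f"MERGE (s)-[:{rel}]->(t)"
        )
        yield query, {"source": s_id, "target": t_id}


nodes, edges = build_graph(
    read_tsv(GENES_TSV), read_tsv(DISEASES_TSV), read_tsv(ASSOCIATIONS_TSV)
)
statements = list(to_cypher(nodes, edges))
```

Using `MERGE` rather than `CREATE` keeps the load idempotent, so re-running it after adding a new dataset would not duplicate existing nodes or relationships. That may matter if the goal is to update the database incrementally.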

olszewskip commented 3 years ago

Awesome! Thank you.