No information on how BioKG is constructed

snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning

https://ogb.stanford.edu

MIT License

No information on how BioKG is constructed #111

Closed cthoyt closed 3 years ago

cthoyt commented 3 years ago

Neither the code in this repo nor the associated paper explains where the data in OGB-BioKG comes from. Can you please point me towards the scripts used to construct this KG?

weihua916 commented 3 years ago

Hi! Thanks for your interest!

Marinka (@marinkaz) created the dataset. She could explain a bit more about how it was constructed. Thanks!

cthoyt commented 3 years ago

For a bit more context: we've wrapped the OGB heterogeneous/directed graph loaders as benchmark datasets for PyKEEN (https://github.com/pykeen/pykeen/), which can be used either to run existing KGE models or to develop your own, and we would like to improve the documentation about the provenance of these datasets.

cthoyt commented 3 years ago

Any updates on this?

weihua916 commented 3 years ago

Unfortunately, I cannot handle this. Could you please reach out to Marinka directly to get the information?

weihua916 commented 3 years ago

I will close this for now. Please correspond with Marinka directly.

cthoyt commented 3 years ago

Is there a particular way I should reach out to @marinkaz that isn't via a GitHub issue? If you don't think this is an appropriate forum for giving feedback, it might be good to write up some contributing guidelines and update the README in this repository to better inform potential users.

weihua916 commented 3 years ago

Here it is: https://dbmi.hms.harvard.edu/people/marinka-zitnik

sophiakrix commented 2 years ago

Hi there! Is there any official update on this that can be made public? Would appreciate it a lot!

DimitrisAlivas commented 2 years ago

Hello OGB team,

First of all, thank you for the work in collecting and publishing benchmarks for graph ML.

I have the same question as the original post. Are there any updates on this? Correct me if I'm wrong, but given that this dataset is characterized as a "Benchmark", I think there is a need to address this.

Thank you in advance!