open-reaction-database / ord-data

Official data repository for the Open Reaction Database
https://open-reaction-database.org
Creative Commons Attribution Share Alike 4.0 International

How to transform the data to JSON file #170

Closed xianruizhong closed 1 year ago

xianruizhong commented 1 year ago

Hi, I'm a bit confused about the .pb file structure. I opened a file with Python and saw a long binary string. Is it possible to retrieve a single reaction's information by some criterion, for example by reaction ID? And can the data be transformed into a more structured format like JSON?

skearnes commented 1 year ago

Hi, the *.pb files are serialized protocol buffers, a structured data format that follows the ORD schema. You can load a dataset into Python as follows:

from ord_schema import message_helpers
from ord_schema.proto import dataset_pb2

# `filename` is the path to a dataset file downloaded from ord-data.
dataset = message_helpers.load_message(filename, dataset_pb2.Dataset)

Once the dataset is loaded into Python, you can extract whatever fields you'd like from the protocol buffers. Here's an example that uses a dataset to train a machine learning model: https://github.com/open-reaction-database/ord-schema/blob/main/examples/applications/Perera_Science_Granda_Nature_Suzuki/Granda_Perera_ml_example.ipynb.
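
For the two specific requests in the original question (lookup by reaction ID and conversion to JSON), a minimal sketch might look like the following. It assumes the loaded `Dataset` message exposes a repeated `reactions` field whose entries carry a `reaction_id`, and it relies on `json_format.MessageToJson` from the protobuf Python package. The helper names `find_reaction` and `message_to_json` are hypothetical, introduced only for illustration; they are not part of ord-schema.

```python
def find_reaction(dataset, reaction_id):
    """Return the first reaction whose reaction_id matches, or None.

    `dataset` is assumed to be a loaded Dataset message (or any object
    with a `reactions` sequence whose items have a `reaction_id`).
    """
    for reaction in dataset.reactions:
        if reaction.reaction_id == reaction_id:
            return reaction
    return None


def message_to_json(message):
    """Serialize a protocol buffer message (Dataset or Reaction) to a JSON string."""
    # Imported lazily so find_reaction can be used without protobuf installed.
    from google.protobuf import json_format

    return json_format.MessageToJson(message)


# Hypothetical usage, given a `dataset` loaded with load_message as above
# and a reaction ID of interest stored in `some_reaction_id`:
#
#   reaction = find_reaction(dataset, some_reaction_id)
#   if reaction is not None:
#       print(message_to_json(reaction))
```

A linear scan is enough for a single lookup; for many lookups against the same dataset, building a dictionary keyed by `reaction_id` once would likely be a better fit.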

CakeCrusher commented 9 months ago

@skearnes I am trying to display a row in a readable format, but I can't seem to get it to work. Could you help with that? I tried your suggestion, but it raised an error.

You can see it here: https://colab.research.google.com/drive/1xIWk3hYF7FtRA58AFLsqy-X80yk3z9fi?usp=sharing

Thanks