Closed (xianruizhong closed 1 year ago)
Hi, the *.pb files are serialized protocol buffers, a structured data format that follows the ORD schema. You can load a dataset into Python as follows:
from ord_schema import message_helpers
from ord_schema.proto import dataset_pb2
# Hypothetical path; point this at your own *.pb file.
filename = "dataset.pb"
dataset = message_helpers.load_message(filename, dataset_pb2.Dataset)
Once you have the dataset loaded into Python, you can extract whatever fields you'd like from the protocol buffers. Here's an example that uses a dataset to train a machine learning model: https://github.com/open-reaction-database/ord-schema/blob/main/examples/applications/Perera_Science_Granda_Nature_Suzuki/Granda_Perera_ml_example.ipynb.
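As a rough sketch of what that field extraction could look like: a loaded Dataset message has a repeated `reactions` field, and each Reaction carries a `reaction_id`, so you can build a dictionary keyed by ID and render any single reaction as JSON with protobuf's `json_format`. The helper names below (`index_by_reaction_id`, `load_dataset`, `reaction_to_json`) and the "ord-aaa"/"ord-bbb" IDs are made up for illustration and are not part of ord-schema. The demo at the bottom uses stand-in objects rather than real ORD data.

```python
from types import SimpleNamespace  # only needed for the stand-in demo below


def index_by_reaction_id(reactions):
    """Map reaction_id -> reaction for any objects exposing .reaction_id."""
    return {reaction.reaction_id: reaction for reaction in reactions}


def load_dataset(filename):
    """Load an ORD dataset from disk; requires the ord-schema package."""
    # Imported lazily so the indexing helper works without ord-schema installed.
    from ord_schema import message_helpers
    from ord_schema.proto import dataset_pb2

    return message_helpers.load_message(filename, dataset_pb2.Dataset)


def reaction_to_json(reaction):
    """Render a single protobuf message as a human-readable JSON string."""
    from google.protobuf import json_format

    return json_format.MessageToJson(reaction)


# Stand-in objects (not real ORD data) to show the lookup pattern.
fake_reactions = [
    SimpleNamespace(reaction_id="ord-aaa"),
    SimpleNamespace(reaction_id="ord-bbb"),
]
by_id = index_by_reaction_id(fake_reactions)
print(by_id["ord-bbb"].reaction_id)
```

With a real file you would presumably call `by_id = index_by_reaction_id(load_dataset(filename).reactions)` and then `print(reaction_to_json(by_id[some_id]))` to view one reaction in a readable form.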
@skearnes I am trying to display a row in a readable format, but I can't seem to get it to work. Could you help with that? I tried your suggestion, but it raised an error.
You can see it here: https://colab.research.google.com/drive/1xIWk3hYF7FtRA58AFLsqy-X80yk3z9fi?usp=sharing
Thanks
Hi, I'm a bit confused about the .pb file structure. I opened it with Python and saw a long binary string. Is it possible to get each reaction's information by some criterion, for example by reaction ID? And can it be transformed into a more structured format such as JSON?