open-reaction-database / ord-schema

Schema for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

Add Example for Converting .pb to JSON #705

Closed brucejwittmann closed 5 months ago

brucejwittmann commented 11 months ago

I'm not very familiar with .pb files (and I would assume a decent number of other people who want to use this resource aren't either). I saw in the documentation in a few places that it is easy to convert a .pb file to a .json file, but I cannot find any examples in the repository that show how to do that. It would be great if a quick example could be added to the documentation.

connorcoley commented 11 months ago

Here's a quick code snippet courtesy of @FanwangM

from ord_schema.message_helpers import load_message, write_message
from ord_schema.proto import dataset_pb2

dataset = load_message('input_fname.pb.gz', dataset_pb2.Dataset)
write_message(dataset, 'output_fname.pbtxt')

The message can be loaded from a .pb or .pb.gz file; this assumes that a full dataset is being loaded. The ".pbtxt" extension is very human-readable (I view it as analogous to YAML), but you can switch the output extension to ".json" if you want proper JSON.
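To make the extension swap concrete, here is a minimal sketch that wraps the two calls above in a small converter. The helper names `json_name_for` and `convert_to_json` are hypothetical (they are not part of ord-schema), and the sketch assumes, as described above, that `write_message` picks the output format from the file extension.

```python
from pathlib import Path


def json_name_for(src: str) -> str:
    """Return src with its .pb or .pb.gz suffix replaced by .json."""
    name = Path(src).name
    for suffix in (".pb.gz", ".pb"):
        if name.endswith(suffix):
            return str(Path(src).with_name(name[: -len(suffix)] + ".json"))
    raise ValueError(f"expected a .pb or .pb.gz file, got {src!r}")


def convert_to_json(src: str) -> str:
    """Load a full Dataset from src and write it alongside as JSON."""
    # Imported lazily so json_name_for is usable without ord-schema installed.
    from ord_schema.message_helpers import load_message, write_message
    from ord_schema.proto import dataset_pb2

    dst = json_name_for(src)
    dataset = load_message(src, dataset_pb2.Dataset)
    # Assumption: the ".json" extension selects JSON output.
    write_message(dataset, dst)
    return dst
```

Calling `convert_to_json("input_fname.pb.gz")` would then be expected to produce `input_fname.json` in the same directory.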

We will get this added to the documentation!

FanwangM commented 11 months ago

Here is a short snippet to convert *.pb files to JSON.

# import requirements
import json

from google.protobuf.json_format import MessageToJson
from ord_schema.message_helpers import load_message
from ord_schema.proto import dataset_pb2

# load a full dataset (.pb or .pb.gz)
dataset = load_message(
    "sample_file.pb.gz",
    dataset_pb2.Dataset,
)

# take one reaction message from the dataset for example
rxn = dataset.reactions[0]

# MessageToJson returns a JSON-formatted string; json.loads parses it into a dict.
# Only non-default options are passed here, because some newer protobuf releases
# no longer accept the including_default_value_fields argument.
rxn_json = json.loads(
    MessageToJson(
        rxn,
        preserving_proto_field_name=True,
        indent=2,
    )
)

print(rxn_json)

Using the MessageToJson function (https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html) works, but it only returns a string representation. The json module is then needed to parse that string into JSON data (a Python dict).
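The string-versus-data distinction can be shown with the standard library alone. This is a minimal sketch: `rxn_str` is a hypothetical stand-in for the string that MessageToJson returns, with illustrative values rather than real dataset content, and `save_json` is a helper name invented for this example.

```python
import json

# Hypothetical stand-in for MessageToJson output; values are illustrative only.
rxn_str = '{"reaction_id": "ord-example", "notes": {"is_heterogeneous": true}}'

# JSON text -> Python dict that can be indexed and modified.
rxn_data = json.loads(rxn_str)


def save_json(data: dict, path: str) -> None:
    """Write parsed JSON data to a file with readable indentation."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
```

If the only goal is a .json file on disk, the string from MessageToJson can also be written to a file directly; parsing is needed only when the data is to be handled in Python.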

Hope this helps a little. @brucejwittmann

FanwangM commented 11 months ago

Where do you think would be a good place to add this to the documentation? Rather than only adding it somewhere in the documentation, I think it would be beneficial to have a function in message_helpers.py that converts protobuf messages into a JSON object. What do you think? @connorcoley

connorcoley commented 11 months ago

Including it as part of the README for ord-data would make the most sense to me.

brucejwittmann commented 11 months ago

Thank you both for your help! For what it's worth, I was digging around in message_helpers.py to see if I could find a message_to_json function. That seems like a natural place to put it.

bdeadman commented 7 months ago

I will review this and action the PR.

bdeadman commented 5 months ago

Documentation was added to the ord-data README in ord-data#179 by @FanwangM. Closing the issue as it is completed.