Closed brucejwittmann closed 5 months ago
Here's a quick code snippet, courtesy of @FanwangM:
from ord_schema.message_helpers import load_message, write_message
from ord_schema.proto import dataset_pb2
dataset = load_message('input_fname.pb.gz', dataset_pb2.Dataset)
write_message(dataset, 'output_fname.pbtxt')
The message can be loaded from a .pb or .pb.gz file; this assumes that a full dataset is being loaded. The ".pbtxt" output is very human-readable (I view it as analogous to YAML), but the output extension can also be switched to ".json" if you want proper JSON.
We will get this added to the documentation!
Here is a short snippet to convert *.pb files to JSON.
# import requirements
import json
from ord_schema.message_helpers import load_message
from ord_schema.proto import dataset_pb2
from google.protobuf.json_format import MessageToJson

# load the full dataset (.pb or .pb.gz)
dataset = load_message(
    "sample_file.pb.gz",
    dataset_pb2.Dataset,
)
# take one reaction message from the dataset for example
rxn = dataset.reactions[0]
# MessageToJson returns a JSON-formatted string; json.loads parses it into a dict.
# Arguments left at their defaults are omitted. In particular,
# including_default_value_fields was renamed in newer protobuf releases,
# so passing it explicitly can fail there.
rxn_json = json.loads(
    MessageToJson(
        message=rxn,
        preserving_proto_field_name=True,
    )
)
print(rxn_json)
Using the MessageToJson function (https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html) on its own would work, but it only returns a string representation. We need the json module to convert that string into JSON data (a Python dict).
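A possible shortcut, sketched under a couple of assumptions: the same json_format module also provides MessageToDict, which returns a Python dict directly and skips the string round trip. The helper name message_to_dict below is hypothetical (it is not part of ord_schema), and protobuf's built-in Struct message is used as a stand-in so the sketch runs without an ORD dataset file. With ORD data you would pass dataset.reactions[0] instead.

```python
import json

from google.protobuf import struct_pb2
from google.protobuf.json_format import MessageToDict, MessageToJson


def message_to_dict(message):
    """Hypothetical helper: convert any protobuf message to a plain dict."""
    return MessageToDict(message, preserving_proto_field_name=True)


# Stand-in message (not an ORD reaction) so the example is self-contained.
demo = struct_pb2.Struct()
demo.update({"name": "example", "yield_percent": 87.5})

# Direct conversion to a dict.
as_dict = message_to_dict(demo)

# The string-based route from the snippet above, for comparison.
via_json = json.loads(MessageToJson(demo, preserving_proto_field_name=True))

print(as_dict == via_json)
```

Both routes should give the same dict; MessageToDict just avoids serializing to a string and parsing it back.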
Hope this helps a little. @brucejwittmann
Where do you think would be a good place to add this to the documentation? Instead of adding it directly somewhere in the documentation, I think it could be beneficial to have a function in message_helpers.py that converts protobuf messages into a JSON object. What do you think? @connorcoley
Including it as part of the README for ord-data would make the most sense to me.
Thank you both for your help! For what it's worth, I was digging around in message_helpers.py to see if I could find a message_to_json function. That seems like a natural place to put it.
I will review this and action the PR.
Documentation was added to the ord-data README in ord-data#179 by @FanwangM. Closing the issue as it is completed.
I'm not very familiar with .pb files (and I would assume a decent number of other people who want to use this resource aren't either). I saw in the documentation in a few places that it is easy to convert a .pb file to a .json file, but I cannot find any examples in the repository that show how to do that. It would be great if a quick example could be added to the documentation.