googleapis / python-datastore

Datastore: Expose an API for working directly with Entity Protobuf objects. #7

Open Kami opened 5 years ago

Kami commented 5 years ago

Right now the Datastore Python client library provides a high-level convenience API for working with native Python types by utilizing a special Entity class which behaves like a dictionary.

Underneath, the client library converts this Entity and its native Python values into an Entity Protobuf object, which is what the Google Datastore API expects and works with.
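To make that conversion step concrete, here is a minimal sketch of how native Python values map onto the fields of a Datastore Value message. It uses plain dicts as a stand-in for the real protobuf classes, so it is not the library's implementation. The function names `to_value_dict` and `entity_to_dict` are made up for illustration; only the field names (`string_value`, `integer_value`, `properties`, and so on) come from the Datastore v1 API.

```python
from typing import Any, Dict


def to_value_dict(value: Any) -> Dict[str, Any]:
    """Map a native Python value to a dict keyed by the Value field name.

    Hypothetical helper: a dict-based stand-in for building a Value protobuf.
    """
    if value is None:
        return {"null_value": None}
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"boolean_value": value}
    if isinstance(value, int):
        return {"integer_value": value}
    if isinstance(value, float):
        return {"double_value": value}
    if isinstance(value, str):
        return {"string_value": value}
    if isinstance(value, bytes):
        return {"blob_value": value}
    if isinstance(value, (list, tuple)):
        return {"array_value": {"values": [to_value_dict(v) for v in value]}}
    if isinstance(value, dict):
        # Nested dicts are treated as embedded entities.
        return {"entity_value": entity_to_dict(value)}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def entity_to_dict(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a dict-like entity into a dict shaped like an Entity message."""
    return {"properties": {name: to_value_dict(v) for name, v in entity.items()}}


if __name__ == "__main__":
    print(entity_to_dict({"name": "alice", "age": 30, "active": True}))
```

The point of the sketch is that every property has to be inspected and re-wrapped one by one, which is the work that gets repeated when an application already holds an Entity Protobuf object.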

This works great when you only work with Python and don't care about the actual object schema, but it breaks down if you want to build a cross-language ORM with a strict schema and access Datastore from multiple programming languages.

There are multiple possible approaches when building a language-agnostic ORM and entity model schema for Datastore, but in the end, you need code that translates your ORM objects into the Entity Protobuf objects that the Datastore gRPC API works with.

In our scenario, we decided to use Protobuf as the schema for our database models (that is, the objects / entities stored in Datastore). To accomplish that, we developed libraries which translate arbitrary Protobuf message objects to Entity Protobuf objects.

Here is an example of such a translator library for Python: https://github.com/Kami/python-protobuf-cloud-datastore-entity-translator

To store this Entity Protobuf object inside the datastore, we have two options:

1. Utilize the low-level gRPC / Protobuf based Python client

Utilizing the low-level gRPC based Python client (https://github.com/GoogleCloudPlatform/google-cloud-datastore/tree/master/python) is not ideal, because it requires a lot of glue code in our app.

We need to take care of things such as transactions, rollbacks, etc.

Basically, we need to reinvent the wheel and build a library which is very similar to this high-level client library, but works with Entity Protobuf objects instead of Entity classes.

2. Utilize the high-level client and waste CPU cycles on unnecessary conversion round-trips

Another option is to directly utilize this high-level client library.

The problem is that this library doesn't expose primitives for working directly with Entity Protobuf objects.

This means we need to do something along these lines:

# 1. Get code path
...
entity = client.get(key)

# This call is redundant and we could get rid of it if we could access Entity Protobuf object directly
entity_pb = datastore.helpers.entity_to_protobuf(entity)

my_model_pb = entity_pb_to_model_pb(my_model_pb2.MyModelPB, entity_pb)

# 2. Put code path
...
entity_pb = model_pb_to_entity_pb(my_model_pb)

# This call is redundant, because we already have Entity Protobuf object which we could use directly
entity = datastore.helpers.entity_from_protobuf(entity_pb)
client.put(entity)

To make life easier for us and for other people with similar problems (if you search around the internet, you will see that more people have run into this), I think it makes sense to expose public methods in this high-level client library which work directly with Entity Protobuf objects.

This should be relatively straightforward since the client library already works with Entity Protobuf objects in the background.

It will mostly just require some code shuffling around / refactoring and adding new methods for working directly with Entity Protobuf objects. It would also mean very little additional code to maintain since most of the primitives are already in place today.

Having said that, I propose adding the following new methods which would work directly with Entity Protobuf objects:

I'm happy to implement those changes.

In fact, I already have a very hack-ish version locally. I plan to push it later and open a pull request so we can start a more concrete discussion about the actual code changes.

econtal commented 4 years ago

This would also be great for using plain Python integers that do not fit in a signed int64 (such as a uint64 value).

Currently _set_protobuf_value (called in entity_to_protobuf) breaks with ValueError: Value out of range.

As a user, I could wrap such values in a custom class so that they are handled as an actual uint64 instead of going through the default integer conversion.
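As a rough illustration of that wrapper idea, here is a minimal sketch. It assumes one particular storage strategy that is not stated above: reinterpreting the 64 bits of the unsigned value as a signed int64 (two's complement), since the Datastore `integer_value` field is a signed 64-bit integer. The `UInt64` class and `fits_int64` helper are hypothetical names and not part of the client library.

```python
from dataclasses import dataclass

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def fits_int64(value: int) -> bool:
    """Return True if the value can be stored in a signed 64-bit field."""
    return INT64_MIN <= value <= INT64_MAX


@dataclass(frozen=True)
class UInt64:
    """Hypothetical wrapper marking an int as an unsigned 64-bit value."""

    value: int

    def __post_init__(self) -> None:
        if not 0 <= self.value <= UINT64_MAX:
            raise ValueError("Value out of range for uint64")

    def to_int64(self) -> int:
        """Reinterpret the same 64 bits as a signed integer."""
        return self.value - 2**64 if self.value > INT64_MAX else self.value

    @classmethod
    def from_int64(cls, stored: int) -> "UInt64":
        """Inverse of to_int64: recover the unsigned value after reading."""
        return cls(stored + 2**64 if stored < 0 else stored)


if __name__ == "__main__":
    big = UInt64(2**64 - 1)
    print(fits_int64(big.value))       # the raw value does not fit
    print(fits_int64(big.to_int64()))  # the reinterpreted value does
```

One trade-off of this approach is that values above the int64 maximum are stored as negative numbers, so ordering and inequality filters on the stored property no longer match the unsigned ordering.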

@Kami would you mind sharing some hints around your "very hack-ish local version"? 😄