Discussion: Google Datastore higher order deserialisation/serialisation

I just wanted to document some of the challenges I have been facing with the codegen based on the JSON schema for Google Datastore, and with making items usable. At the moment the codegen deserializes at the protocol level to an intermediate state, bounded by the datatypes that Google Datastore supports, but it does not actually provide "usable"/"logical" objects in JavaScript.

For example, a fundamental type is Value, which the codegen turns into something like this:

interface Value {
  arrayValue?: ArrayValue;
  blobValue?: Uint8Array; // auto-decoded from `string`
  booleanValue?: boolean;
  doubleValue?: number;
  entityValue?: Entity;
  excludeFromIndexes?: boolean;
  geoPointValue?: LatLng;
  integerValue?: bigint;  // auto-decoded from `string`
  keyValue?: Key;
  meaning?: number;
  nullValue?: "NULL_VALUE";
  stringValue?: string;
  timestampValue?: Date; // auto-decoded from `string`
}

This is a real pain to deal with in a logical, type-safe way. I hand-crafted a more usable type in this fashion:

interface ValueBase {
  excludeFromIndexes?: boolean;
  meaning?: number;
}

interface ValueArray extends ValueBase {
  arrayValue: ArrayValue;
}

interface ValueBlob extends ValueBase {
  blobValue: string;
}

interface ValueBoolean extends ValueBase {
  booleanValue: boolean;
}

interface ValueDouble extends ValueBase {
  doubleValue: number;
}

interface ValueEntity extends ValueBase {
  entityValue: Entity;
}

interface ValueGeoPoint extends ValueBase {
  geoPointValue: LatLng;
}

interface ValueInteger extends ValueBase {
  integerValue: string;
}

interface ValueKey extends ValueBase {
  keyValue: Key;
}

interface ValueNull extends ValueBase {
  nullValue: "NULL_VALUE";
}

interface ValueString extends ValueBase {
  stringValue: string;
}

interface ValueTimestamp extends ValueBase {
  timestampValue: string;
}

type Value =
  | ValueArray
  | ValueBlob
  | ValueBoolean
  | ValueDouble
  | ValueEntity
  | ValueGeoPoint
  | ValueInteger
  | ValueKey
  | ValueNull
  | ValueString
  | ValueTimestamp;

This provides a much more user-friendly set of types that makes it really easy to write type guards and gives a much better code completion experience. It also makes it a lot easier to write a single-pass deserialization from the higher-order Datastore type to a JavaScript type (the value the codegen deserializes isn't really friendly to use in JavaScript). So I have this deserialisation function:

function datastoreValueToValue(value: types.Value): unknown {
  if (isValueArray(value)) {
    return value.arrayValue.values.map(datastoreValueToValue);
  }
  if (isValueBlob(value)) {
    return base64.decode(value.blobValue);
  }
  if (isValueBoolean(value)) {
    return value.booleanValue;
  }
  if (isValueDouble(value)) {
    return value.doubleValue;
  }
  if (isValueEntity(value)) {
    return entityToObject(value.entityValue);
  }
  if (isValueGeoPoint(value)) {
    return value.geoPointValue;
  }
  if (isValueInteger(value)) {
    return stringAsInteger(value.integerValue);
  }
  if (isValueKey(value)) {
    return value.keyValue;
  }
  if (isValueNull(value)) {
    return null;
  }
  if (isValueString(value)) {
    return value.stringValue;
  }
  if (isValueTimestamp(value)) {
    return new Date(value.timestampValue);
  }
}
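
The guards called above (isValueString and friends) are not shown in this post. A minimal sketch of how they could look, assuming each variant is identified purely by the presence of its one required property; the types are trimmed to three variants and `summarize` is a hypothetical helper that only demonstrates the narrowing:

```typescript
// Trimmed-down copies of the hand-crafted types so the sketch is self-contained.
interface ValueBase {
  excludeFromIndexes?: boolean;
  meaning?: number;
}
interface ValueString extends ValueBase {
  stringValue: string;
}
interface ValueInteger extends ValueBase {
  integerValue: string;
}
interface ValueNull extends ValueBase {
  nullValue: "NULL_VALUE";
}
type Value = ValueString | ValueInteger | ValueNull;

// Each guard checks for the single property that discriminates its variant.
function isValueString(value: Value): value is ValueString {
  return "stringValue" in value;
}
function isValueInteger(value: Value): value is ValueInteger {
  return "integerValue" in value;
}
function isValueNull(value: Value): value is ValueNull {
  return "nullValue" in value;
}

// Hypothetical helper: inside each branch the compiler knows the exact variant.
function summarize(value: Value): string {
  if (isValueString(value)) return `string:${value.stringValue}`;
  if (isValueInteger(value)) return `integer:${value.integerValue}`;
  return "null";
}
```

Because every variant has exactly one required property, the `in` check is enough to tell them apart, which is what the all-optional generated interface cannot offer.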

Also, when trying to serialise JavaScript objects to Datastore values, there are all sorts of validation rules that are not expressible in the schema. For example, for a stringValue:

When exclude_from_indexes is false (it is indexed), may have at most 1500 bytes. Otherwise, may be set to at most 1,000,000 bytes.

These are all things that make sense to handle in the abstraction before sending a request over the wire and getting a rejection from the API.
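
As a rough sketch of what that kind of pre-flight check could look like for the stringValue limits quoted above: `stringToDatastoreValue` and the constant names are hypothetical, and the sketch assumes the limits are measured in UTF-8 encoded bytes.

```typescript
// Limits quoted above for stringValue (assumed here to be UTF-8 bytes).
const MAX_INDEXED_STRING_BYTES = 1500;
const MAX_UNINDEXED_STRING_BYTES = 1000000;

interface ValueString {
  stringValue: string;
  excludeFromIndexes?: boolean;
  meaning?: number;
}

// Hypothetical helper: builds a string value and throws locally,
// before anything is sent to the API, if the string is too large.
function stringToDatastoreValue(
  text: string,
  excludeFromIndexes = false,
): ValueString {
  const byteLength = new TextEncoder().encode(text).length;
  const limit = excludeFromIndexes
    ? MAX_UNINDEXED_STRING_BYTES
    : MAX_INDEXED_STRING_BYTES;
  if (byteLength > limit) {
    throw new RangeError(
      `stringValue is ${byteLength} bytes, but the limit is ${limit} bytes ` +
        `when excludeFromIndexes is ${excludeFromIndexes}`,
    );
  }
  return { stringValue: text, excludeFromIndexes };
}
```

Counting encoded bytes rather than `text.length` matters for non-ASCII text: a string of 751 "é" characters is already 1502 bytes in UTF-8.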

In addition, the codegen generates a Datastore class whose APIs are not the most usable. For example, the "runQuery" API:

class Datastore {
  async projectsRunQuery(projectId: string, req: RunQueryRequest): Promise<RunQueryResponse>;
}

The project_id is part of the service account JSON and is tied to the instance of the Datastore, which are things that can't be expressed in the schema. Also, the RunQueryRequest is a composite object that doesn't really make sense from a usage perspective, so this is what the hand-crafted version looks like:

class Datastore {
  async runQuery(query: Query): Promise<RunQueryResponse>;
  async runGqlQuery(gqlQuery: GqlQuery): Promise<RunQueryResponse>;
}
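
One way such a wrapper could be put together, as a sketch only: the project ID is bound once at construction and each method fills in the matching field of the composite request. The `GeneratedDatastore` interface and the stub types below are stand-ins for the generated code, not its real definitions.

```typescript
// Stand-ins for the generated types; the real ones carry more fields.
interface Query {
  kind?: { name: string }[];
}
interface GqlQuery {
  queryString: string;
}
interface RunQueryRequest {
  query?: Query;
  gqlQuery?: GqlQuery;
}
interface RunQueryResponse {
  batch?: unknown;
}

// Shape of the generated client, reduced to the one method used here.
interface GeneratedDatastore {
  projectsRunQuery(
    projectId: string,
    req: RunQueryRequest,
  ): Promise<RunQueryResponse>;
}

class Datastore {
  constructor(
    private readonly client: GeneratedDatastore,
    // Assumed to be read from the service account JSON by the caller.
    private readonly projectId: string,
  ) {}

  // Wraps a structured query in the composite request.
  runQuery(query: Query): Promise<RunQueryResponse> {
    return this.client.projectsRunQuery(this.projectId, { query });
  }

  // Wraps a GQL query in the composite request.
  runGqlQuery(gqlQuery: GqlQuery): Promise<RunQueryResponse> {
    return this.client.projectsRunQuery(this.projectId, { gqlQuery });
  }
}
```

Callers then never see the project ID or the RunQueryRequest envelope, and a fake `GeneratedDatastore` can be passed in for tests.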
