Proof of concept credit metadata

aaronc commented 3 years ago

The current ecocredit design specifies metadata as plain bytes, which lets us iterate on the actual schema off-chain.

I propose the following JSON structures as simple proof-of-concept metadata.

For credit classes:

{
  "name":"Regen Network Carbon+ Grasslands",
  "type":"carbon"
}

For batches:

{
  "polygon": { /* GeoJSON */ },
  // the start and end dates for the carbon drawdown this credit corresponds to:
  "start_date":"2019-01-01", 
  "end_date":"2019-12-31"
}

From what I understand this would give us some bare minimum data. Then in Postgres we could use PostGIS to index batches based on polygon and dates (/cc @robert-zaremba)
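As a rough illustration of how the two structures above could be turned into the bytes stored on-chain, here is a minimal Python sketch. The function names, the restriction to Polygon/MultiPolygon geometries, and the placeholder coordinates are assumptions made for the example, not part of the ecocredit module.

```python
import json
from datetime import date


def encode_class_metadata(name: str, credit_type: str) -> bytes:
    """Serialize credit class metadata to compact UTF-8 JSON bytes."""
    doc = {"name": name, "type": credit_type}
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


def encode_batch_metadata(polygon: dict, start_date: date, end_date: date) -> bytes:
    """Serialize batch metadata to compact UTF-8 JSON bytes.

    Assumption: only GeoJSON Polygon and MultiPolygon geometries are accepted.
    """
    if polygon.get("type") not in ("Polygon", "MultiPolygon"):
        raise ValueError("polygon must be a GeoJSON Polygon or MultiPolygon")
    if end_date < start_date:
        raise ValueError("end_date must not precede start_date")
    doc = {
        "polygon": polygon,
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat(),
    }
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


# Placeholder geometry, purely illustrative.
example_polygon = {
    "type": "Polygon",
    "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]],
}
class_bytes = encode_class_metadata("Regen Network Carbon+ Grasslands", "carbon")
batch_bytes = encode_batch_metadata(example_polygon, date(2019, 1, 1), date(2019, 12, 31))
```

The resulting byte strings are what would go into the metadata fields; an off-chain indexer can parse them back with `json.loads`.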

Any improvements you could suggest, @blushi?

robert-zaremba commented 3 years ago

For storing metadata I would prefer not to use JSON. As discussed elsewhere, we will need to limit the metadata space anyway. I would use FlatBuffers or Cap’n Proto.

aaronc commented 3 years ago

> For storing metadata I would prefer not to use JSON. As discussed elsewhere, we will need to limit the metadata space anyway.
>
> I would use FlatBuffers or Cap’n Proto.

Why not just stick with protobuf then?

robert-zaremba commented 3 years ago

Yes, probably even better. My point is to have a preferred tool that we use in documents, tutorials, etc. This will help drive adoption.

In the end we will need a minimal schema to make any sense of the data in PostgreSQL. So for the PoC: let's store it as protobuf; the emitter will use the same schema to decode the data and send it to Postgres. In the future we can:

- extend the credit_class and batch_class schema so we will have only one type
- put the rest in the metadata with a schema-less serialization (could be protobuf or something dynamic like msgpack)

aaronc commented 3 years ago

Okay, protobuf, JSON or whatever, I don't think it matters too much for the PoC. My reasoning around JSON is that GeoJSON is pretty well-defined and that's the bulk of the payload. If we try to do protobuf we'll end up needing to choose some other geo-encoding, maybe EWKB. I don't really care too much at this stage. I'm mainly trying to see whether we can get something up and running that shows the credits we'll be issuing soon on the devnet.

In the longer term, I've always thought we should strongly aim for a format that aligns with the RDF data model. Ideally that would be some binary serialization on-chain and not just JSON-LD text. I actually coded a PoC of this data model and serialization format a year or two ago but it will take more work for me to feel like it's complete so right now I'm just aiming to get something we can play with...

robert-zaremba commented 3 years ago

In Postgres we store the data in a proper geospatial type, not as GeoJSON. There are functions to convert GeoJSON to the Postgres data format. Instead of polygon we can have a location: GeoJSON field, which can be a point, a polygon or a box. When processing it we can save it to a proper table column (point, polygon, ...). With that we will be able to do proper queries.
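A minimal Python sketch of the routing step described above: inspect the GeoJSON `type` of the `location` field and pick the table column it should be saved to. The column names are hypothetical, and only Point and Polygon are handled here. On the database side, PostGIS offers `ST_GeomFromGeoJSON` for the actual conversion; this sketch stops short of touching the database.

```python
import json
from typing import Tuple

# Hypothetical column names, one per supported geometry type.
COLUMN_BY_GEOMETRY = {
    "Point": "location_point",
    "Polygon": "location_polygon",
}


def route_location(location: dict) -> Tuple[str, str]:
    """Return (column name, GeoJSON text) for a GeoJSON geometry object."""
    geom_type = location.get("type")
    if geom_type not in COLUMN_BY_GEOMETRY:
        raise ValueError(f"unsupported geometry type: {geom_type!r}")
    return COLUMN_BY_GEOMETRY[geom_type], json.dumps(location)


column, geojson_text = route_location({"type": "Point", "coordinates": [12.5, 41.9]})
```

The returned GeoJSON text could then be passed as a query parameter to an insert statement that wraps it in the conversion function.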

aaronc commented 3 years ago

Postgres uses EWKB which we can consider as well.

Generally we will only be dealing with polygons or multi-polygons. I can't think of a use case where we'd use points or lines for credits.

blushi commented 3 years ago

> Generally we will only be dealing with polygons or multi-polygons. I can't think of a use case where we'd use points or lines for credits.

Yeah, I agree: polygons or multi-polygons represent the main use cases afaik.

> In the future we can:
>
> extend the credit_class and batch_class schema so we will have only one type

@robert-zaremba I'm not sure I get what you mean by that?

robert-zaremba commented 3 years ago

@aaronc I was thinking about a point as well. My motivation was to experiment with 2 data types, and polygon and point are 2 basic primitives to represent a location.

robert-zaremba commented 3 years ago

BTW: we don't have multi polygons. We can represent it as an array of polygons.
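For illustration, a GeoJSON MultiPolygon can be flattened into an array of Polygon objects, because its `coordinates` member is itself an array of Polygon coordinate arrays. A minimal Python sketch (the function name is hypothetical):

```python
from typing import List


def split_multipolygon(geometry: dict) -> List[dict]:
    """Return a list of GeoJSON Polygon objects for a Polygon or MultiPolygon."""
    geom_type = geometry.get("type")
    if geom_type == "Polygon":
        return [geometry]
    if geom_type == "MultiPolygon":
        # Each entry of a MultiPolygon's coordinates is one polygon's list of rings.
        return [{"type": "Polygon", "coordinates": rings} for rings in geometry["coordinates"]]
    raise ValueError(f"unsupported geometry type: {geom_type!r}")


multi = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]],
        [[[5.0, 5.0], [6.0, 5.0], [6.0, 6.0], [5.0, 5.0]]],
    ],
}
polygons = split_multipolygon(multi)
```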

robert-zaremba commented 3 years ago

Note to self on EWKT: https://docs.snowflake.com/en/sql-reference/data-types-geospatial.html

robert-zaremba commented 3 years ago

> > extend the credit_class and batch_class schema so we will have only one type
>
> @robert-zaremba I'm not sure I get what you mean by that?

@blushi today all this data is schema-less, stored in a byte array. We just say: create an object with name and type, and serialize it using protobuf. Then, when processing, we can try a few formats in turn (protobuf, JSON, msgpack, ...) in case a user didn't follow the instructions. But once we clarify the base required arguments, it will be better to add them to a message type and have a proper schema.
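The "try a few formats" fallback could look roughly like the Python sketch below. Only a JSON decoder is included because it needs nothing beyond the standard library; protobuf or msgpack decoders would be appended to the same list. All names are hypothetical, and each decoder is assumed to signal failure by raising `ValueError`.

```python
import json
from typing import Callable, List, Tuple

Decoder = Callable[[bytes], dict]


def decode_json(raw: bytes) -> dict:
    """Decode UTF-8 JSON and require a top-level object."""
    doc = json.loads(raw.decode("utf-8"))
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")
    return doc


# Ordered list of (format name, decoder); earlier entries are tried first.
DEFAULT_DECODERS: List[Tuple[str, Decoder]] = [("json", decode_json)]


def decode_metadata(raw: bytes, decoders: List[Tuple[str, Decoder]] = DEFAULT_DECODERS) -> Tuple[str, dict]:
    """Return (format name, decoded object) from the first decoder that succeeds."""
    errors = []
    for name, decode in decoders:
        try:
            return name, decode(raw)
        except ValueError as exc:  # also covers JSONDecodeError and UnicodeDecodeError
            errors.append(f"{name}: {exc}")
    raise ValueError("no decoder accepted the metadata: " + "; ".join(errors))


fmt, doc = decode_metadata(b'{"name":"Regen Network Carbon+ Grasslands","type":"carbon"}')
```

Once the required fields move into a proper message type, this guessing step is no longer needed for them and only applies to any remaining free-form metadata.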

robert-zaremba commented 3 years ago

@aaronc So, for the PoC, is there any preference for a geolocation format (GeoJSON, EWKT, ...) in the request message (MsgCreateBatchRequest)?

aaronc commented 3 years ago

> @aaronc So, for the PoC, is there any preference for a geolocation format (GeoJSON, EWKT, ...) in the request message (MsgCreateBatchRequest)?

I would say GeoJSON is best for the PoC because it has better client-side support. @blushi?

When we settle on a standardized schema we will probably want some efficient binary representation like EWKB or a custom protobuf type. I don't remember whether EWKB is well supported in JavaScript.
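To give a feel for the binary option, here is a minimal Python sketch that encodes a 2D GeoJSON Polygon as little-endian EWKB with an SRID, following the layout PostGIS uses: a byte-order marker, the geometry type with the SRID flag set, the SRID, then ring and point counts followed by coordinates as doubles. The function name is hypothetical, and SRID 4326 is assumed because GeoJSON coordinates are WGS 84 longitude/latitude.

```python
import struct

WKB_POLYGON = 3               # WKB geometry type code for Polygon
EWKB_SRID_FLAG = 0x20000000   # EWKB flag bit: an SRID follows the type field
WGS84_SRID = 4326             # assumed SRID for GeoJSON coordinates


def polygon_to_ewkb(polygon: dict, srid: int = WGS84_SRID) -> bytes:
    """Encode a 2D GeoJSON Polygon as little-endian EWKB bytes."""
    if polygon.get("type") != "Polygon":
        raise ValueError("only GeoJSON Polygon is supported in this sketch")
    rings = polygon["coordinates"]
    # Header: byte order (1 = little-endian), type with SRID flag, SRID.
    parts = [struct.pack("<BII", 1, WKB_POLYGON | EWKB_SRID_FLAG, srid)]
    parts.append(struct.pack("<I", len(rings)))
    for ring in rings:
        parts.append(struct.pack("<I", len(ring)))
        for point in ring:
            # Only x (longitude) and y (latitude) are written; any altitude is ignored.
            parts.append(struct.pack("<dd", float(point[0]), float(point[1])))
    return b"".join(parts)


square = {
    "type": "Polygon",
    "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]],
}
ewkb = polygon_to_ewkb(square)
```

Each point costs a fixed 16 bytes in this encoding, whereas in GeoJSON text the size depends on how many decimal digits are written out.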

blushi commented 3 years ago

> I would say GeoJSON is best for the PoC because it has better client-side support. @blushi?

I'd say so too. It can be used as is with Mapbox, for instance, which is what we have been using so far for the registry.

regen-network / regen-ledger

Proof of concept credit metadata #178