stellar / stellar-etl

Stellar ETL will enable real-time analytics on the Stellar network
Apache License 2.0

Figure out the serialization methods of other blockchain ETLs #40

Closed Isaiah-Turner closed 4 years ago

Isaiah-Turner commented 4 years ago

What

Look at existing blockchain ETL projects and figure out the serialization methods they use to output data. Also, figure out what additional configuration files or libraries they use for serialization.

Why

We need a serialization method for the Stellar ETL, and by looking at existing ETLs we can get an idea of the best method for us.

Isaiah-Turner commented 4 years ago

The Ethereum ETL supports two serialization methods: it exports as CSV or JSON. The Bitcoin ETL also serializes data in CSV or JSON formats, and additionally allows streaming data to the console or to a Google Pub/Sub topic. Other ETLs like IoTeX and Tezos similarly export in CSV or JSON.

All of them use the base blockchain ETL's composite_item_exporter, which can create a JsonLinesItemExporter or a CsvItemExporter. Since this functionality is provided in Python, the same language used to develop the ETLs listed above, it makes sense that they are uniform: developers can reuse the existing exporter script instead of writing their own. Since our ETL is written in Go, we will have to write our own serialization methods. For the sake of consistency we should probably support output in CSV and JSON; the encoding/csv and encoding/json packages will be helpful. We could also offer another format, like XDR, in which case the stellar/go-xdr package would be helpful.
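As a rough sketch of what those two output paths could look like in Go, here is a toy example serializing a record to both formats with encoding/json and encoding/csv. The Ledger type and its fields are hypothetical stand-ins, not the project's actual schema:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
)

// Ledger is a hypothetical record type standing in for one of the ETL's
// outputs; the real field set would mirror a BigQuery table.
type Ledger struct {
	Sequence uint32 `json:"sequence"`
	Hash     string `json:"hash"`
}

// toJSON serializes a ledger as a single JSON Lines record.
func toJSON(l Ledger) (string, error) {
	b, err := json.Marshal(l)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// toCSV serializes a ledger as one CSV row.
func toCSV(l Ledger) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{fmt.Sprint(l.Sequence), l.Hash}); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	l := Ledger{Sequence: 12345, Hash: "abc123"}
	j, _ := toJSON(l)
	c, _ := toCSV(l)
	fmt.Println(j) // {"sequence":12345,"hash":"abc123"}
	fmt.Print(c)   // 12345,abc123
}
```

In a real exporter these would stream rows through a csv.Writer or json.Encoder wrapped around a file rather than building strings, but the struct-tag-driven approach is the same.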

debnil commented 4 years ago

Looks great! Agreed that we should enable both CSV and JSON serialization. If we define structs well, it shouldn't be too much extra work. It will be a good win for usability.

As we've previously discussed, this doesn't change the goal. We'll need to define custom structs, where each struct represents the schema of a BigQuery table. Each will be roughly analogous to a table in the current BigQuery dataset, though we'll have to make some changes.

The question then becomes the method. Though I like XDR in principle, it may require some extra tooling that's slightly outside the scope of this project. I don't like making new tooling, but I also dislike hand-rolling structs. Curious if other folks have suggestions.

Isaiah-Turner commented 4 years ago

It looks like we will be making a schema.go file in the transform folder that contains all the struct definitions that we need. This will reduce the need for external tooling that would convert a struct defined in some other format into a Go struct.
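A minimal sketch of what a struct in that schema.go could look like, assuming a JSON Lines export path. The Transaction type, its fields, and the marshalTransaction helper are all illustrative guesses, not the project's final schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Transaction is an illustrative schema struct: each field maps to a
// column in a hypothetical BigQuery transactions table, with the JSON
// tags spelling the column names.
type Transaction struct {
	TransactionHash string `json:"transaction_hash"`
	LedgerSequence  uint32 `json:"ledger_sequence"`
	FeeCharged      int64  `json:"fee_charged"`
	Successful      bool   `json:"successful"`
}

// marshalTransaction renders one transaction as a JSON Lines record,
// ready to append to an export file.
func marshalTransaction(t Transaction) (string, error) {
	b, err := json.Marshal(t)
	return string(b), err
}

func main() {
	t := Transaction{
		TransactionHash: "deadbeef",
		LedgerSequence:  100,
		FeeCharged:      100,
		Successful:      true,
	}
	s, err := marshalTransaction(t)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Keeping every struct in one schema.go file means the BigQuery column mapping lives entirely in struct tags, with no external schema-to-Go code generation step.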