GlenDC opened this issue 5 years ago (status: open).
For the technology I think https://github.com/graphql-go/graphql will be the way to go.
A very useful resource to bootstrap myself into GraphQL was https://graphql.github.io/learn/
Over the course of this issue, it would also be useful to apply some of the best practices from https://graphql.github.io/learn/best-practices/:
Added an initial, still purely theoretical, attempt at defining a schema, using only static queries for now:
I think it should be enough to start on a first quick implementation, with the goal of having, as soon as possible, a small web frontend that lets me play with queries and iterate on this initial attempt.
https://github.com/99designs/gqlgen looks like it has potential. It generates the server code directly from the GraphQL schema, which lets us focus on the business logic and keeps the code in sync with our API definition.
As of https://github.com/threefoldtech/rivine/commit/70e12295de0119cd8f0b52ef0a0637ba99f1fdd0, the first phase of the GraphQL explorer is starting to be feature complete, at least at the database level.
Objects (wallets, contracts, blocks, transactions and outputs) can be looked up and their information requested. Everything seems to work and resolves lazily.
What are the next steps?
1. We need regular meetings with @robvanmieghem, @LeeSmet, @DylanVerstraete and anyone else who is interested (e.g. @zaibon), starting with a first meeting today. There I can show the current state, we can have a Q&A, and you can give feedback on where to go from there.
2. I need to iterate on the API. It is already quite complete, but there is no pagination for big lists yet and no filtering yet. None of the other advanced query parameters I could add to fields exist yet either. I will start on this today.
3. @LeeSmet will help me set up a TFChain standard network explorer node running the current version of the GraphQL explorer, so we can also test it in a real environment. It will be redeployed each time I make substantial updates. This will be done in parallel with task (2).
4. GraphQL also supports updates over (web)sockets, called subscriptions. I think they have been around since 2015, so I would expect them to be quite established by now (or so I hope). It would be good to already provide 3 types of subscriptions: to a wallet, to a contract, and to new (latest) blocks.
5. There are still "important" improvements that can be made to the current MVP implementation, despite the work already done on it, and that is without even talking about supporting extensions yet:
   Currently the GraphQL explorer does not support unconfirmed data (transaction pool data) yet. This can easily be resolved by providing an `explorer.DB` interface implementation that subscribes to the transaction pool and wraps another explorer DB. Suppose the wrapped DB returns `ErrNotFound` for an object that could theoretically be in the pool. The wrapper would then check its current pool state and, if the object is there, return it with the unconfirmed flag set to true. From then on, all objects that can be unconfirmed would also have an optional `Unconfirmed` Bool field. This task is easy, so trivial that it is less urgent for me. It is only required once we want to start using this module in production.
For task (5) I might have an idea. Currently a wallet contains some identification data, aggregated data (such as the balance) and, mostly, linked data (block IDs, transaction IDs, coin output IDs, block stake output IDs). In the most storage-cheap approach, we could instead link all wallets used in a block from that block itself (by data ID). This would make updates and storage very cheap. It would, however, make querying for outputs, transactions and blocks a lot more expensive. Blocks would still be kind of OK, as they can be indexed directly. For transactions, though, we would need to resolve each block transaction ID. For outputs, we would additionally need to resolve each output to know whether it is linked to that address. Perhaps that is OK, perhaps not. It is a trade-off that needs to be investigated, I guess.
I propose that, once we make time for this task, we set up some quick standalone examples that we can benchmark. That way we know what gives the best results in terms of storage as well as update speed and query speed.
For task (3), @LeeSmet tried to set up a standard tfchain daemon, for which I provided a branch, but we seem to have an issue on the data storage side: it takes about 1 GB of disk space per 1000 blocks, which is not a good trade-off.
So I am fairly sure that I'll need to look into optimising this heavily starting Monday next week.
Schema documentation has also been added, and our playground supports it by default. Should we want to display it in a standalone way, we can also use third-party tools to generate documentation from it.
Example in our playground:
Over the last two days I have been optimising, both in terms of speed and in terms of storage.
With wallet data included, we currently require about 260 MB per 1000 blocks, roughly a factor 4 less than the initial implementation. Still a lot, though.
We also already sync 12% faster (more or less).
Both optimisations have been mostly achieved by using msgpack instead of rivbin for the explorer graphql db storage.
There is still a lot of room for optimisation, though. The biggest win should come from not relying on the query feature for unlocking outputs; I expect that alone to help a lot (or so my theory goes). Besides that, batching the applied/reverted blocks in bigger groups might also help, but I am not sure about that.
We're still using StormDB for now. If there are other embedded DB suggestions that I should try to use instead, feel free to hit me up.
My findings on GraphQL implementations for frontend applications:
Apollo GraphQL: https://github.com/apollographql
It can be used with React and Vue. For the explorer frontend we will focus on the Vue implementation.
Apollo GraphQL for Vue.js: https://github.com/vuejs/vue-apollo
It is straightforward to integrate into our existing explorer frontend project: https://github.com/threefoldtech/rivine-chain-explorer
Apollo GraphQL for Vue.js works with Vue and TypeScript, which fits our needs. It also has local state and cache management functionality: https://apollo.vuejs.org/guide/local-state.html#local-state
For a preview of how this will work: https://apollo.vuejs.org/guide/apollo/#usage-in-vue-components
The GraphQL explorer DB now syncs pretty fast. On my laptop, with wallet updates, it does about 20 blocks per second, a lot faster than what we started with. Disk space is still an issue; we will definitely need to handle the aggregated wallet data differently.
Without wallet updates we can go up to 60 blocks per second on my laptop.
The API is also cleaned up: I removed the reference point thing and kept it at height only. A block can be fetched by height using the `blockAt` query. You can also get one counting from the end using a negative index: `-1` is the last block (same as omitting the height), `-2` the second-to-last block, etc.
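The negative-index semantics of `blockAt` could be resolved server-side along the lines of the minimal sketch below. `resolveBlockHeight` is a hypothetical helper, not the actual rivine code. The sketch assumes the latest block sits at height `chainHeight` and the genesis block at height 0.

```go
package main

import (
	"errors"
	"fmt"
)

// errHeightOutOfRange is returned when the requested height maps to no existing block.
var errHeightOutOfRange = errors.New("block height out of range")

// resolveBlockHeight maps the optional height argument of a blockAt-style query to an
// absolute height. chainHeight is the height of the latest block (genesis = 0).
// A nil height and -1 both mean the latest block, -2 the second-to-last, and so on.
func resolveBlockHeight(height *int64, chainHeight uint64) (uint64, error) {
	if height == nil {
		return chainHeight, nil
	}
	h := *height
	if h >= 0 {
		if uint64(h) > chainHeight {
			return 0, errHeightOutOfRange
		}
		return uint64(h), nil
	}
	// Negative index: count back from the end, -1 being the latest block.
	back := uint64(-h) - 1
	if back > chainHeight {
		return 0, errHeightOutOfRange
	}
	return chainHeight - back, nil
}

func main() {
	minusTwo := int64(-2)
	h, err := resolveBlockHeight(&minusTwo, 10)
	fmt.Println(h, err) // 9 <nil>
}
```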
The previous commit needed a fix: we have to commit to disk regularly, as otherwise the RAM simply fills up. So now, during the initial sync, we commit to disk every 1000 blocks. This seems to resolve most of our RAM blow-up issues, except that the aggregated wallets are still a pain in the ***. We probably need to stop aggregating the identifier references in the wallet. Whatever replaces it should keep our wallets from growing (more or less) linearly with the blockchain height.
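The periodic-commit pattern described above is captured in the minimal sketch below. It assumes a hypothetical in-memory `store` that only counts buffered blocks until `Commit` persists them; the real explorer commits through its own DB layer. The 1000-block interval is the one mentioned above, and `store`, `syncer`, `ApplyBlock` and `Finish` are illustrative names.

```go
package main

import "fmt"

// commitEvery is the number of applied blocks after which buffered writes are
// committed to disk during the initial sync.
const commitEvery = 1000

// store is a hypothetical stand-in for the explorer DB. It only counts blocks:
// pending ones are buffered in memory, persisted ones have been committed.
type store struct {
	pending   int // blocks buffered since the last commit
	persisted int // blocks written to "disk"
	commits   int // number of commits performed
}

// Commit persists all pending blocks; it is a no-op when nothing is pending.
func (s *store) Commit() {
	if s.pending == 0 {
		return
	}
	s.persisted += s.pending
	s.pending = 0
	s.commits++
}

// syncer applies blocks and commits every commitEvery blocks, so memory use is
// bounded by one batch instead of growing with the chain height.
type syncer struct {
	db      *store
	applied int
}

func (s *syncer) ApplyBlock() {
	s.db.pending++ // buffer this block's writes
	s.applied++
	if s.applied%commitEvery == 0 {
		s.db.Commit()
	}
}

// Finish flushes whatever remains once the initial sync is done.
func (s *syncer) Finish() { s.db.Commit() }

func main() {
	s := &syncer{db: &store{}}
	for i := 0; i < 2500; i++ {
		s.ApplyBlock()
	}
	fmt.Println(s.db.commits, s.db.pending) // 2 500
	s.Finish()
	fmt.Println(s.db.commits, s.db.persisted) // 3 2500
}
```

The final `Finish` call matters: without it, the tail of the sync (here the last 500 blocks) would never be committed.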
You can find links to documentation and examples of the current `feat/graphql-phase1` implementation at: https://gist.github.com/GlenDC/13e60383dd82a682a0af4d770f0873f5.
Multiple blocks can now be fetched at once using the `blocks` query. Examples of it have been added as example files and are linked in the main file of the gist-linked documentation above.
You can also use filters to filter on height and timestamp, as well as define your own limits. This implementation works well so far and uses cursor-based pagination. For the user, the cursor is an opaque string, to make it as easy as possible to use.
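An opaque cursor can be as simple as an encoded internal position. The sketch below assumes blocks are paginated by height and base64-encodes a hypothetical `block:<height>` string. `encodeCursor`, `decodeCursor` and `pageBlocks` are illustrative names, not the explorer's actual implementation.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

const cursorPrefix = "block:" // hypothetical internal format, hidden from clients

var errInvalidCursor = errors.New("invalid cursor")

// encodeCursor turns an internal position (a block height) into an opaque string.
func encodeCursor(height uint64) string {
	return base64.RawURLEncoding.EncodeToString([]byte(cursorPrefix + strconv.FormatUint(height, 10)))
}

// decodeCursor reverses encodeCursor, rejecting anything it did not produce.
func decodeCursor(cursor string) (uint64, error) {
	raw, err := base64.RawURLEncoding.DecodeString(cursor)
	if err != nil || !strings.HasPrefix(string(raw), cursorPrefix) {
		return 0, errInvalidCursor
	}
	height, err := strconv.ParseUint(strings.TrimPrefix(string(raw), cursorPrefix), 10, 64)
	if err != nil {
		return 0, errInvalidCursor
	}
	return height, nil
}

// pageBlocks returns up to limit block heights of a chain whose latest block is at
// chainHeight, starting right after the cursor (or at genesis for an empty cursor).
// The returned next cursor is empty when there are no more blocks.
func pageBlocks(chainHeight uint64, cursor string, limit int) ([]uint64, string, error) {
	start := uint64(0)
	if cursor != "" {
		last, err := decodeCursor(cursor)
		if err != nil {
			return nil, "", err
		}
		if last >= chainHeight {
			return nil, "", nil // cursor already points at the latest block
		}
		start = last + 1
	}
	var page []uint64
	for h := start; h <= chainHeight && len(page) < limit; h++ {
		page = append(page, h)
	}
	next := ""
	if len(page) > 0 && page[len(page)-1] < chainHeight {
		next = encodeCursor(page[len(page)-1])
	}
	return page, next, nil
}

func main() {
	page, next, _ := pageBlocks(4, "", 2)
	fmt.Println(page) // [0 1]
	page, next, _ = pageBlocks(4, next, 2)
	fmt.Println(page) // [2 3]
	page, next, _ = pageBlocks(4, next, 2)
	fmt.Println(page, next == "") // [4] true
}
```

Clients only ever pass the cursor back unchanged, so the encoding can evolve later (e.g. to include a filter or a timestamp) without breaking them.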
Right now it's not possible to retrieve the transaction ID of an output's child input. The picture explains it:
The parent transaction of an input can now also be retrieved (and thus also its ID).
I'm going to wrap up this task for now. It is not complete, but there may be urgent tasks that take priority over it, given the time it has already taken.
What we have:
485,000 blocks require about 3 GB of disk space. This uses msgpack, as it is a lot faster to marshal/unmarshal and requires less space. We can probably reduce this even further with a single line of code, by using the optimized encoding option in the msgpack library.

What we do not have:
So far I am convinced of my choice of GraphQL; I think it is what we want/need. I certainly do not know of a better technology currently available to us that allows us to do what we want. More about this later.
As a database I currently use StormDB (v2), which uses BoltDB. While it does seem to work for now, I am not convinced it is the ideal solution. v3 promises to be a lot better, even allowing a backend other than BoltDB, so perhaps that resolves all my doubts about StormDB as a choice. Even so, feel free to suggest a better database if you know of one that is efficient and supports all the querying and indexing we need. My only wish would be that it can be embedded, as I find that easier for the user. Beyond that I have no strong opinion on the choice of backend. It is in any case completely unrelated to my strong belief that GraphQL is the way to go for our chain web APIs.
This GraphQL proposal (and the MVP implementation in this first, unfinished phase) was made with good intentions and careful thought, yet not communicated clearly enough. Thanks to @DylanVerstraete, we now also have a clear comparison, from a developer-as-a-user perspective, of what it is like to support the old API (REST, explorer module) versus the new API (GraphQL). Dylan tried both in his modern explorer. With the old API you need to write an entire parsing module that understands how it all works, which is a lot of work for a developer to get right. On top of that you get a lot of data. Sure, you can add pagination, but that only helps you cope with huge responses, while in fact you did not need most of that data in the first place. With the new API he has almost no work and no parsing to do. ReactJS/VueJS also have ready-made GraphQL client support (e.g. Apollo), so you can even hook the API directly up to your frontend components. The new API offers the same features as the old API, and many more, while being easy to use and requiring almost no work from the user.
So why did I choose GraphQL in this proposal?
What you want from an API is that it is defined by a schema, so that it can serve as a contract. That contract should not be broken: once an API is defined as your contract, it should stay stable. On top of that, your documentation, server backends and clients should be generated from that contract. That way you do not lose time fixing bugs and trying to maintain all these things manually. It is a battle you will always lose, or one that costs a lot of unneeded resources to do right.
GraphQL goes further than these base requirements of mine. It is typed. That gives a lot of validation out of the box in the generated code, makes things a lot clearer for the user, and maps nicely onto generated client code. As GraphQL gives you only what you ask for, you also do not need to learn the responses. They have exactly the same shape as your queries, so once you know your queries, you know your responses. And since users only receive what they ask for, you can safely add new properties to types. Existing users will not suddenly receive a lot of extra data they do not need.
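Below is a toy Go sketch of field selection. It illustrates why responses mirror queries and why adding a field does not affect existing clients. `Selection` and `project` are hypothetical and greatly simplified (no types, arguments or validation). This is not how graphql-go or gqlgen actually execute queries.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Selection is a simplified selection set: each requested field maps to the
// sub-selection for nested objects, or to nil for scalar (leaf) fields.
type Selection map[string]Selection

// project returns only the requested fields of obj, so the result has the same shape
// as the query. Fields the query does not mention are never returned, which is why
// new fields can be added to a type without changing what existing clients receive.
func project(obj map[string]any, sel Selection) map[string]any {
	out := make(map[string]any, len(sel))
	for field, sub := range sel {
		v, ok := obj[field]
		if !ok {
			continue // a real GraphQL server would reject unknown fields during validation
		}
		if nested, isObj := v.(map[string]any); isObj && sub != nil {
			out[field] = project(nested, sub)
		} else {
			out[field] = v
		}
	}
	return out
}

func main() {
	// Hypothetical block data; "newField" stands for a property added to the type later.
	block := map[string]any{
		"height":   42,
		"newField": "added later",
		"parent":   map[string]any{"height": 41, "newField": "added later"},
	}
	// Corresponds to the query: { height parent { height } }
	resp := project(block, Selection{"height": nil, "parent": {"height": nil}})
	out, _ := json.Marshal(resp)
	fmt.Println(string(out)) // {"height":42,"parent":{"height":41}}
}
```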
@robvanmieghem proposes that I use BCDB as a backend, as a test, which would also resolve issue https://github.com/threefoldtech/jumpscaleX_threebot/issues/43.
Thus, I'll branch off the `feat/graphql-phase1` branch into a new branch, which I'll call `feat/graphql-bcdb`, where I'll use BCDB instead of StormDB.

The explorer crashed while syncing tfchain, with the following output:
```
(tfchain-graphql) $ ./tfchain-graphql -Mcgq -v
Loading...
Binding API Address and serving the API...
Setting up root HTTP API handler...
Loading gateway (1/3)...
Loading consensus set (2/3)...
Loading graphql explorer (3/3)...
goroutine 60 [running]:
runtime/debug.Stack(0x4664f7, 0x0, 0xc000805ed8)
	/usr/lib/go/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
	/usr/lib/go/src/runtime/debug/stack.go:16 +0x22
github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/build.Critical(0xc000805fa0, 0x2, 0x2)
	/home/lee/go/src/github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/build/critical.go:15 +0xaa
github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/modules/explorergraphql.(*Explorer).InitialProcessConsensusChanges(0xc0000b2fc0, 0xc00007f380)
	/home/lee/go/src/github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/modules/explorergraphql/explorer.go:104 +0xc4
created by github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/modules/consensus.(*ConsensusSet).initializeSubscribe.func1
	/home/lee/go/src/github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/modules/consensus/subscribe.go:140 +0x606
Critical error: Explorer.ProcessConsensusChange failed failed to apply block: failed to get extension data from txn 58de24b753c9b4d1ec9e7994858a8ca848252542f658f39ef44d097fbf2a14fb (#2) of block 78c160e7d0bec959d416657511804c72f7f81bef2e833db0ee710db0f564a5d2: failed to unmarshal minter definition ext. data: unexpected EOF
Please submit a bug report here: https://github.com/threefoldtech/rivine/issues
panic: Critical error: Explorer.ProcessConsensusChange failed failed to apply block: failed to get extension data from txn 58de24b753c9b4d1ec9e7994858a8ca848252542f658f39ef44d097fbf2a14fb (#2) of block 78c160e7d0bec959d416657511804c72f7f81bef2e833db0ee710db0f564a5d2: failed to unmarshal minter definition ext. data: unexpected EOF
Please submit a bug report here: https://github.com/threefoldtech/rivine/issues
goroutine 60 [running]:
github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/build.Critical(0xc000805fa0, 0x2, 0x2)
	/home/lee/go/src/github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/build/critical.go:17 +0x136
github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/modules/explorergraphql.(*Explorer).InitialProcessConsensusChanges(0xc0000b2fc0, 0xc00007f380)
	/home/lee/go/src/github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/modules/explorergraphql/explorer.go:104 +0xc4
created by github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/modules/consensus.(*ConsensusSet).initializeSubscribe.func1
	/home/lee/go/src/github.com/threefoldfoundation/tfchain/vendor/github.com/threefoldtech/rivine/modules/consensus/subscribe.go:140 +0x606
```
Also, syncing seems to be pretty slow.
I found a couple of minutes to check it out and have already fixed it. Please pull when you have a few minutes yourself.
I forgot to mention that https://explorer.rnd.threefoldtoken.com/explorer/graphql is live to play with. It is, however, clear by now that StormDB is an awful choice. I think an SQL database (Postgres or SQLite) will be the way to go; it will also allow us to aggregate on the fly with complex queries. Until then, we'll see how BCDB does. If all goes well, it should at the very least do better than StormDB.
As a first phase of the big R&D story #605, a nice first step would be to provide a new module that works completely independently. It would provide a full query/subscription (so no mutation) schema for all our queries.
This first phase would allow us to develop a complete schema to query our data, play with these queries, experiment with the subscription model (useful for light clients) and test its limits. All of this would happen without breaking or changing any existing features.
Once completed, it would also allow light clients to already start supporting this new GraphQL endpoint (still served over HTTP(S)). The phases of #605 that follow will be changes to the internals of rivine and should not impact the API for users (such as light clients).