Closed geastwood closed 2 years ago
Here is my take:
It opens two other immediate questions:
How to deploy and communicate with Hydra
How to "hide" Hydra from public access.
Why Hydra?
I largely agree with the value that Hydra can bring to web3-indexer.
The downside is obvious: these Hydra instances have to be deployed somewhere and become part of the infra, and managing dependencies and upgrading Hydra could incur some work.
Stepping away from Hydra, I do think we can leverage other services in the Substrate community, similar to Etherscan on EVM-based networks. For instance, we are also using Subscan to index our parachain. If they are good, we could also rely on them as a data source.
Good call on Subscan.
Regardless of what we choose, I think we agree that our GraphQL layer should be able to talk to these services.
I updated the issue's title to name other services and use this as a place to document our findings.
I think we're going to run into the same issue we did with Subquery: if we care about off-chain data (e.g. NFT metadata) being in the database rather than being fetched from IPFS (via Fleek) at runtime, then we have a problem with Subscan.
There's also another issue: with all three of Hydra, Subquery and our indexer we can write handlers to turn event data into something useful. For example, on the event TokenMinted we look up the token data via the Polkadot API so we can find out if it is transferable or burnable, then we save this to the database. I suspect Subscan doesn't allow us to do this. At scale this could become a problem. Let's say somebody mints 1000 NFT tokens: if we need to get all their burnable tokens, we'd have to get them from the database, then make 1000 Polkadot API lookups to identify which are burnable.
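To make the handler point concrete, here is a minimal sketch of enriching a TokenMinted event once at index time. Everything in it is an assumption for illustration: `TokenStore`, `ChainLookup` and `handleTokenMinted` are hypothetical names, and the lookup is a stand-in for whatever call we would actually make through the Polkadot API.

```typescript
// Hypothetical shapes; none of these names come from a real library.
interface TokenInfo {
  transferable: boolean;
  burnable: boolean;
}

interface TokenRow extends TokenInfo {
  tokenId: number;
  owner: string;
}

// Stand-in for a chain lookup (in practice this would go through the Polkadot API).
type ChainLookup = (tokenId: number) => Promise<TokenInfo>;

// In-memory stand-in for the indexer's database table.
class TokenStore {
  private rows: TokenRow[] = [];

  save(row: TokenRow): void {
    this.rows.push(row);
  }

  // Answered from stored data only; no chain lookups at query time.
  burnableOwnedBy(owner: string): TokenRow[] {
    return this.rows.filter((r) => r.owner === owner && r.burnable);
  }
}

// Handler for a TokenMinted event: look the token up once, then persist the result.
async function handleTokenMinted(
  event: { tokenId: number; owner: string },
  lookup: ChainLookup,
  store: TokenStore,
): Promise<void> {
  const info = await lookup(event.tokenId);
  store.save({ tokenId: event.tokenId, owner: event.owner, ...info });
}
```

With this shape, minting 1000 tokens costs 1000 lookups once, while indexing, and a later "all burnable tokens for this owner" query is a single database filter. Without a handler hook, those lookups would have to happen on every query instead.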
Etherscan is a little different. As our chain is on Polkadot, we may find that all our data needs use our Substrate indexer as the initial source of truth, and that we just need to call Etherscan to add complementary data, which could be done at runtime in the GraphQL resolvers.
Regarding the overall architecture, I think we're all on the same page here. We'd have one graph for the clients to interface with, then multiple things feeding into that, e.g. a Substrate indexer and Etherscan.
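As a rough sketch of that "one graph, several sources" shape, the resolver map below reads primary data from our own indexer and adds complementary data from an external service at request time. The `SubstrateIndexer` and `ExplorerClient` interfaces, their method names, and the `Account` fields are all hypothetical; this is not the API of Etherscan or of any GraphQL server library, only the usual `(parent, args, context)` resolver convention.

```typescript
// Hypothetical data sources; the method names are illustrative only.
interface SubstrateIndexer {
  tokensByOwner(owner: string): Promise<{ tokenId: number; owner: string }[]>;
}

interface ExplorerClient {
  balanceOf(address: string): Promise<string>;
}

// Both sources are injected through the resolver context.
interface Context {
  indexer: SubstrateIndexer;
  explorer: ExplorerClient;
}

const resolvers = {
  Query: {
    // Root field: only carries the address down to the Account resolvers.
    account: (_parent: unknown, args: { address: string }) => ({
      address: args.address,
    }),
  },
  Account: {
    // Primary data: served from our own indexer's database.
    tokens: (parent: { address: string }, _args: unknown, ctx: Context) =>
      ctx.indexer.tokensByOwner(parent.address),
    // Complementary data: fetched from the external service at request time.
    evmBalance: (parent: { address: string }, _args: unknown, ctx: Context) =>
      ctx.explorer.balanceOf(parent.address),
  },
};
```

Clients only ever see the `Account` type. Which backend answers which field stays an implementation detail of the resolvers, so a source can be swapped without changing the schema.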
The thing that most concerns me is the off-the-shelf solutions. When I work with the web3 library and the @polkadot/api library I feel like I'm working with mature open source software. The (admittedly limited) time I've spent with Subquery and Hydra does not give me the same feeling. They both have nice websites with a bunch of logos to give the impression they're established in the community, but working with them gives me the very strong impression that neither is mature or backed by a sizeable community.
Just because something is open source and open to scrutiny doesn't mean it has been scrutinised.
I'd also like to reiterate, building a chain listener is not difficult at all. It's my opinion that your future is much less risky if you have control of your own indexing infrastructure, rather than rely on a project that could be dead in a couple of years.
@wangminqi @chenzongxiong keep an eye on this discussion and speak out if some points don't align with your understanding. Or point out anything we might be missing.
> I think we're going to run into the same issue we did with subquery, if we care about off-chain data (e.g. NFT metadata) being in the database rather than being fetched from IPFS (via Fleek) at runtime then we have a problem with Subscan.
>
> There's also another issue, with all 3 of hydra/subquery/our indexer we can write handlers to turn event data into something useful. For example on the event TokenMinted we look up the token data via the polkadot API so we can find out if it is transferable or burnable, then we save this to the database. I suspect Subscan doesn't allow us to do this. At scale this could become a problem. Lets say somebody mints 1000 NFT tokens, if we need to get all their burnable tokens we'd have to get them from the database, then make 1000 polkadot API lookups to identify which are burnable.
>
> Etherscan is a little different. As our chain is on Polkadot we may find all our data needs use our substrate indexer as the initial source of truth, and that we just need to call Etherscan to add complimentary data - which could be done at runtime in the GraphQL resolvers.
We want to offload the work for standard chains (such as Polkadot and Kusama), or chains with minimal customization needs, to Subscan. I still think it makes sense for us to handle our own chain's indexing ourselves.
Expanding further on that, Fleek can also be seen as a service: not only the NFT pallet can use it, others can also use it via the GraphQL layer.
> I think we're going to run into the same issue we did with subquery, if we care about off-chain data (e.g. NFT metadata) being in the database rather than being fetched from IPFS (via Fleek) at runtime then we have a problem with Subscan.
>
> There's also another issue, with all 3 of hydra/subquery/our indexer we can write handlers to turn event data into something useful. (...) I suspect Subscan doesn't allow us to do this.
From my point of view, it is OK if those services don't fit all of our needs. We can still take advantage of what they do best (indexing) and, inspired by the Hydra architecture, have additional small, custom processors that read the indexed data from the DB and map it into business logic.
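A minimal sketch of such a processor, assuming the indexer has already written raw events to a table we can read. The `IndexedEvent` row shape, the `Ownership` view and the `TokenTransferred` event name are hypothetical; only TokenMinted is taken from this thread.

```typescript
// Hypothetical row shape for events already stored by an external indexer.
interface IndexedEvent {
  block: number;
  name: string; // "TokenMinted" is from this thread; "TokenTransferred" is assumed
  tokenId: number;
  account: string; // minter for TokenMinted, new owner for TokenTransferred
}

// Business-level view derived from the raw events.
interface Ownership {
  tokenId: number;
  owner: string;
  mintedAt: number;
}

// Applies every event after `cursor` to the view and returns the new cursor.
// Assumes each batch contains whole blocks, so a block is never half-processed.
function applyEvents(
  view: Record<number, Ownership>,
  events: IndexedEvent[],
  cursor: number,
): number {
  let last = cursor;
  const ordered = events
    .filter((e) => e.block > cursor)
    .sort((a, b) => a.block - b.block);

  for (const e of ordered) {
    if (e.name === "TokenMinted") {
      view[e.tokenId] = { tokenId: e.tokenId, owner: e.account, mintedAt: e.block };
    } else if (e.name === "TokenTransferred") {
      const current = view[e.tokenId];
      if (current) current.owner = e.account;
    }
    last = e.block;
  }
  return last;
}
```

Because the processor only depends on the stored rows and a cursor, it can be re-run safely and does not care whether the rows came from Hydra, Subscan or our own listener.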
We are not loading Subsquid (Hydra) as a remote subschema through our gateway, as the title suggested.