How Hydra/ Subscan/ Etherscan/ SubQuery fit in our setup

geastwood commented 3 years ago

As the title suggests.

jonalvarezz commented 3 years ago

Here is my take:

Our indexer is our user-facing layer
Behind it, other indexers run. I have noticed we usually refer to these as "blocks", as other services like IPFS and centralized services could be plugged in.
Hydra is another "block" we could use to index Polkadot or other Substrate-based networks

It opens two other immediate questions:

How to deploy and communicate with Hydra
- Hydra and any other block could have its own setup and independent deploy infra/ cycle.
- One of Hydra's advantages is that it is split into separate pieces, so we could:
- Communicate with Hydra either through its GraphQL API or by talking directly to its database.
How to "hide" Hydra from public access.
- We can have a private network where only our GraphQL layer's port is exposed, so no other block is reachable from the outside.
- If we ship it to a different cloud, we can configure the server to whitelist the domains/IPs it serves.
- Alternatively, we could leave it publicly reachable too. Why would that be a bad idea?
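
As a rough illustration of the first option (our GraphQL layer talking to Hydra over its GraphQL API inside a private network), the sketch below builds and sends a plain GraphQL-over-HTTP request to an upstream block. The internal address `http://hydra.internal:4000/graphql`, the `blocks` query, and the helper names `buildBlockRequest` and `queryBlock` are all hypothetical and not taken from Hydra's documentation.

```typescript
// Sketch only: the endpoint, the query, and all helper names are assumptions.

interface GraphQLRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// The HTTP function is injected so the sketch does not depend on one client.
type HttpPost = (url: string, init: GraphQLRequestInit) => Promise<HttpResponse>;

// Build a JSON POST carrying the GraphQL query and its variables.
function buildBlockRequest(
  query: string,
  variables: Record<string, unknown> = {},
): GraphQLRequestInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  };
}

// Send the request to an upstream block and unwrap `data`, surfacing errors.
async function queryBlock<T>(
  post: HttpPost,
  url: string,
  query: string,
  variables: Record<string, unknown> = {},
): Promise<T> {
  const response = await post(url, buildBlockRequest(query, variables));
  if (!response.ok) {
    throw new Error(`Upstream block responded with HTTP ${response.status}`);
  }
  const payload = (await response.json()) as { data?: T; errors?: unknown[] };
  if (payload.errors && payload.errors.length > 0) {
    throw new Error(`Upstream block returned errors: ${JSON.stringify(payload.errors)}`);
  }
  return payload.data as T;
}

// Hypothetical private address, resolvable only from inside our network.
const HYDRA_URL = "http://hydra.internal:4000/graphql";
const exampleInit = buildBlockRequest(
  "query ($limit: Int!) { blocks(limit: $limit) { height } }",
  { limit: 5 },
);
```

Because only the GraphQL layer knows `HYDRA_URL`, the upstream block never needs a public address. Talking directly to the database would skip this HTTP hop but couples us to Hydra's table layout.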

jonalvarezz commented 3 years ago

Why Hydra?

Indexing a blockchain is relatively simple
There are indexers built by the community that already do this job
What really adds value is the business logic we add via models and mappers.
If we lift the business logic to our GraphQL layer, then there is little value in us building custom indexers for each blockchain protocol
Consequently, we can rely on these existing libraries, which are pretty good at indexing blockchains, and focus on satisfying our use cases in the GraphQL layer.
If we think of these indexers as independent services, then later on we can update and replace them without impacting our apps, since the GraphQL layer will keep the user-facing API consistent.
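
To make the "replace them without impacting our apps" point concrete, here is a minimal sketch in which the resolver depends only on a small interface and each indexer gets its own adapter. Every name here (`Transfer`, `TransferSource`, `indexerASource`, `indexerBSource`) and both row shapes are hypothetical. Real adapters would be asynchronous; they are synchronous here to keep the sketch short.

```typescript
// Sketch only: all types, adapters, and row shapes are hypothetical.

// The shape our apps see; the GraphQL layer keeps this contract stable.
interface Transfer {
  from: string;
  to: string;
  amount: string;
}

// Any indexer "block" only has to satisfy this interface.
interface TransferSource {
  transfersFor(address: string): Transfer[];
}

// Adapter for an imagined indexer whose rows use sender/recipient/value.
function indexerASource(
  rows: { sender: string; recipient: string; value: number }[],
): TransferSource {
  return {
    transfersFor(address: string): Transfer[] {
      return rows
        .filter((r) => r.sender === address || r.recipient === address)
        .map((r) => ({ from: r.sender, to: r.recipient, amount: String(r.value) }));
    },
  };
}

// Adapter for a second imagined indexer with a different row shape.
function indexerBSource(
  rows: { from_id: string; to_id: string; amount: string }[],
): TransferSource {
  return {
    transfersFor(address: string): Transfer[] {
      return rows
        .filter((r) => r.from_id === address || r.to_id === address)
        .map((r) => ({ from: r.from_id, to: r.to_id, amount: r.amount }));
    },
  };
}

// The resolver depends only on the interface, so the source can be swapped.
function makeTransfersResolver(source: TransferSource) {
  return (_parent: unknown, args: { address: string }): Transfer[] =>
    source.transfersFor(args.address);
}

const viaA = makeTransfersResolver(
  indexerASource([{ sender: "alice", recipient: "bob", value: 10 }]),
);
const viaB = makeTransfersResolver(
  indexerBSource([{ from_id: "alice", to_id: "bob", amount: "10" }]),
);
```

Both resolvers return the same `Transfer` objects for the same address, so a client cannot tell which indexer sits behind the API.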

geastwood commented 3 years ago

I largely agree with the value that Hydra can bring to web3-indexer.

The downside is obvious: these Hydra instances have to be deployed somewhere and become part of the infra, and managing dependencies and upgrading Hydra could incur some work.

geastwood commented 3 years ago

Stepping back from Hydra, I do think we can leverage other services in the Substrate community that play a role similar to Etherscan on EVM-based networks. For instance, we are also using Subscan to index our parachain. If they are good, we could also rely on them as a data source.

jonalvarezz commented 3 years ago

Good call on Subscan.

Regardless of what we choose, I think we agree that our GraphQL layer should be able to talk to these services.

I updated the issue's title to name other services and use this as a place to document our findings.

bdmason commented 3 years ago

I think we're going to run into the same issue we did with SubQuery: if we care about off-chain data (e.g. NFT metadata) being in the database rather than being fetched from IPFS (via Fleek) at runtime, then we have a problem with Subscan.

There's another issue too. With all three of Hydra, SubQuery, and our own indexer, we can write handlers that turn event data into something useful. For example, on the TokenMinted event we look up the token data via the Polkadot API to find out whether it is transferable or burnable, then save that to the database. I suspect Subscan doesn't allow us to do this, and at scale that could become a problem. Let's say somebody mints 1000 NFT tokens: to get all their burnable tokens we'd have to fetch the tokens from the database, then make 1000 Polkadot API lookups to identify which are burnable.
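
The difference between enriching at index time and looking up at query time can be sketched as follows. Everything here is a stand-in: `lookupToken` replaces the real Polkadot API call, the array replaces the database, and the rule that even token ids are burnable exists only so the example has something to filter on.

```typescript
// Sketch only: `lookupToken`, the in-memory table, and the burnable rule are
// hypothetical stand-ins for the chain lookup and the indexer database.

interface TokenInfo {
  transferable: boolean;
  burnable: boolean;
}

interface TokenRow extends TokenInfo {
  tokenId: number;
  owner: string;
}

// Counts how many chain lookups have been made so far.
let chainLookups = 0;

// Stand-in for a chain lookup; here every even token id is burnable.
function lookupToken(tokenId: number): TokenInfo {
  chainLookups += 1;
  return { transferable: true, burnable: tokenId % 2 === 0 };
}

// Stand-in for the indexer's tokens table.
const tokenTable: TokenRow[] = [];

// TokenMinted handler: one lookup per token, done once at index time.
function onTokenMinted(event: { tokenId: number; owner: string }): void {
  const info = lookupToken(event.tokenId);
  tokenTable.push({
    tokenId: event.tokenId,
    owner: event.owner,
    transferable: info.transferable,
    burnable: info.burnable,
  });
}

// Query-time path: a plain filter over stored rows, with no chain lookups.
function burnableTokensOf(owner: string): TokenRow[] {
  return tokenTable.filter((row) => row.owner === owner && row.burnable);
}

// Somebody mints 1000 tokens; the handler enriches each one as it arrives.
for (let tokenId = 0; tokenId < 1000; tokenId++) {
  onTokenMinted({ tokenId, owner: "alice" });
}
const lookupsAfterIndexing = chainLookups;
const burnableForAlice = burnableTokensOf("alice");
const lookupsAfterQuery = chainLookups;
```

With a handler, the 1000 lookups happen once while indexing and every later query is a database filter. Without one, each request for burnable tokens would pay for those lookups again.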

Etherscan is a little different. As our chain is on Polkadot, we may find that all our data needs use our Substrate indexer as the initial source of truth, and that we just need to call Etherscan to add complementary data, which could be done at runtime in the GraphQL resolvers.
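
A resolver that treats the Substrate indexer as the source of truth and adds Etherscan data at request time might look roughly like this. The field names (`nftCount`, `ethBalanceWei`) and the `EthLookup` function type are hypothetical, and no real Etherscan endpoint is called in the sketch.

```typescript
// Sketch only: field names and the lookup function type are hypothetical.

// Row from our Substrate indexer (the initial source of truth).
interface IndexedAccount {
  address: string;
  nftCount: number;
}

// Complementary data from an Etherscan-like service.
interface EthExtras {
  ethBalanceWei: string;
}

// What the GraphQL layer returns to clients.
interface Account extends IndexedAccount {
  ethBalanceWei: string | null;
}

// Injected lookup, so the resolver is not tied to one external service.
type EthLookup = (ethAddress: string) => Promise<EthExtras | null>;

// Merge step: the indexer row is authoritative; the extras only add fields.
function mergeAccount(base: IndexedAccount, extras: EthExtras | null): Account {
  return {
    address: base.address,
    nftCount: base.nftCount,
    ethBalanceWei: extras ? extras.ethBalanceWei : null,
  };
}

// Resolver: start from the indexed row, then enrich it at request time.
async function resolveAccount(
  base: IndexedAccount,
  linkedEthAddress: string | null,
  lookupEth: EthLookup,
): Promise<Account> {
  if (linkedEthAddress === null) {
    return mergeAccount(base, null);
  }
  try {
    return mergeAccount(base, await lookupEth(linkedEthAddress));
  } catch {
    // If the external service is unavailable, still serve the indexed data.
    return mergeAccount(base, null);
  }
}

const enriched = mergeAccount({ address: "5Abc", nftCount: 3 }, { ethBalanceWei: "42" });
const indexedOnly = mergeAccount({ address: "5Abc", nftCount: 3 }, null);
```

Falling back to `null` on failure is one possible design choice: it keeps the response usable when the external service is down, at the cost of clients having to handle a missing balance.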

bdmason commented 3 years ago

Regarding the overall architecture, I think we're all on the same page here. We'd have 1 Graph for the clients to interface with, then multiple things feeding into that, e.g. a substrate indexer & Etherscan.

bdmason commented 3 years ago

The thing that most concerns me is the off-the-shelf solutions. When I work with the web3 library and the @polkadot/api library, I feel like I'm working with mature open source software. The (admittedly limited) time I've spent with SubQuery and Hydra does not give me the same feeling. They both have nice websites with a bunch of logos to give the impression they're established in the community, but working with them gives me the very strong impression that neither is mature or backed by a sizeable community.

Just because something is open source and open to scrutiny doesn't mean it has been scrutinised.

I'd also like to reiterate, building a chain listener is not difficult at all. It's my opinion that your future is much less risky if you have control of your own indexing infrastructure, rather than rely on a project that could be dead in a couple of years.

geastwood commented 3 years ago

@wangminqi @chenzongxiong keep an eye on this discussion and speak out if some points don't align with your understanding. Or point out anything we might be missing.

geastwood commented 3 years ago

> I think we're going to run into the same issue we did with SubQuery: if we care about off-chain data (e.g. NFT metadata) being in the database rather than being fetched from IPFS (via Fleek) at runtime, then we have a problem with Subscan.

> There's another issue too. With all three of Hydra, SubQuery, and our own indexer, we can write handlers that turn event data into something useful. For example, on the TokenMinted event we look up the token data via the Polkadot API to find out whether it is transferable or burnable, then save that to the database. I suspect Subscan doesn't allow us to do this, and at scale that could become a problem. Let's say somebody mints 1000 NFT tokens: to get all their burnable tokens we'd have to fetch the tokens from the database, then make 1000 Polkadot API lookups to identify which are burnable.

> Etherscan is a little different. As our chain is on Polkadot, we may find that all our data needs use our Substrate indexer as the initial source of truth, and that we just need to call Etherscan to add complementary data, which could be done at runtime in the GraphQL resolvers.

We want to offload the work for standard chains (such as Polkadot and Kusama), or chains with minimal customization needs, to Subscan. I still think it makes sense for us to handle the indexing of our own chains ourselves.

Expanding further on that, Fleek can also be seen as a service: not only the NFT pallet can use it, others can also use it via the GraphQL layer.

jonalvarezz commented 3 years ago

> I think we're going to run into the same issue we did with subquery, if we care about off-chain data (e.g. NFT metadata) being in the database rather than being fetched from IPFS (via Fleek) at runtime then we have a problem with Subscan.

> There's also another issue, with all 3 of hydra/subquery/our indexer we can write handlers to turn event data into something useful. (...) I suspect Subscan doesn't allow us to do this.

From my point of view, it is OK if those services don't fit all of our needs. We can still take advantage of what they do best (indexing) and, inspired by Hydra's architecture, add small custom processors that read the indexed data from the DB and map it into business logic.
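
One possible shape for such a processor is sketched below: it reads already-indexed events past a cursor and maps the ones it cares about into domain rows. The event shape, the `NftMinted` event name, and the in-memory state are hypothetical. The sketch also assumes events arrive ordered by ascending `id`.

```typescript
// Sketch only: the event shape, the "NftMinted" name, and the in-memory
// state are hypothetical; events are assumed to be ordered by ascending id.

// A raw event as some upstream indexer might have stored it.
interface IndexedEvent {
  id: number;
  name: string;
  data: Record<string, string>;
}

// A business-level row derived from raw events.
interface OwnershipRow {
  tokenId: string;
  owner: string;
}

interface ProcessorState {
  cursor: number; // id of the last event already processed
  ownership: OwnershipRow[];
}

// One processing pass: consume events past the cursor and map them to rows.
function runProcessor(events: IndexedEvent[], state: ProcessorState): ProcessorState {
  const ownership = state.ownership.slice();
  let cursor = state.cursor;
  for (const event of events) {
    if (event.id <= cursor) {
      continue; // already handled in an earlier pass
    }
    if (event.name === "NftMinted") {
      const tokenId = event.data["tokenId"];
      const owner = event.data["owner"];
      if (tokenId !== undefined && owner !== undefined) {
        ownership.push({ tokenId, owner });
      }
    }
    cursor = event.id;
  }
  return { cursor, ownership };
}

const indexedEvents: IndexedEvent[] = [
  { id: 1, name: "NftMinted", data: { tokenId: "7", owner: "alice" } },
  { id: 2, name: "Transfer", data: {} },
  { id: 3, name: "NftMinted", data: { tokenId: "8", owner: "bob" } },
];

const firstPass = runProcessor(indexedEvents, { cursor: 0, ownership: [] });
// Re-running over the same events adds nothing, thanks to the cursor.
const secondPass = runProcessor(indexedEvents, firstPass);
```

Keeping the cursor next to the derived rows lets the processor restart safely and stay independent of whichever indexer filled the events table.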

bdmason commented 2 years ago

We are not loading Subsquid (Hydra) as a remote subschema through our gateway

litentry / litentry-graph

How Hydra/ Subscan/ Etherscan/ SubQuery fit in our setup #2