azf20 opened this issue 1 year ago
An additional use case which the above design doesn't allow for is a change in on-chain state that needs to be reflected in a subgraph but isn't accompanied by an emitted event.
A specific example is where the tokenURI(tokenId) is updated on an ERC721 contract, perhaps to point to a new piece of token metadata. In this case, the subgraph needs to re-fetch the tokenURI and reprocess the metadata it points to.
This process would be triggered by an off-chain event (the equivalent of "refresh metadata" on sites such as OpenSea).
Note that you could argue that Substreams could track changes in storage slots, but storage layouts don't necessarily adhere to a standard across contracts, so that doesn't help with use cases like the NFT example above.
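The "refresh metadata" flow described above could be sketched as follows. This is purely illustrative: `TokenMetadata`, `refreshMetadata`, and the fetcher shape are hypothetical names, not graph-node or OpenSea APIs.

```typescript
// Hypothetical sketch of an off-chain-triggered metadata refresh.
// All names here are illustrative assumptions, not existing APIs.

interface TokenMetadata {
  tokenId: string;
  tokenURI: string;
  image?: string;
}

type Fetcher = (url: string) => Promise<string>;

// Re-fetch a token's metadata from its (possibly updated) tokenURI and
// return the entity that would overwrite the previously indexed one.
async function refreshMetadata(
  tokenId: string,
  tokenURI: string,
  fetch: Fetcher
): Promise<TokenMetadata> {
  const body = await fetch(tokenURI);
  const json = JSON.parse(body);
  return { tokenId, tokenURI, image: json.image };
}

// Example with a stubbed fetcher standing in for the HTTP endpoint:
const stub: Fetcher = async () =>
  JSON.stringify({ image: "https://example.com/1.png" });

refreshMetadata("1", "https://example.com/metadata/1", stub).then((m) =>
  console.log(m.image) // prints "https://example.com/1.png"
);
```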
From a security perspective, a couple of concerns come to mind: this could be "easily" leveraged with malicious intent. For instance, an unprotected node endpoint could be used to make a large number of calls to a single endpoint, or to access data that was accidentally made available over HTTP. We can mitigate this by setting an allow list, but then the feature becomes much less useful, since indexers would need to know ahead of time which URLs can be accessed. (We could consider adding base_urls to the manifest, which would be enforced, but that does not prevent the above.)
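To make the allow-list idea concrete, a minimal sketch of the hypothetical `base_urls` check might look like this (the function name and semantics are assumptions, not an existing graph-node feature):

```typescript
// Sketch of manifest-level allow-list enforcement for HTTP file data
// sources. Purely illustrative; `base_urls` is a proposal, not a feature.

function isAllowed(url: string, baseUrls: string[]): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // malformed URLs are rejected outright
  }
  // Only plain HTTP(S) endpoints are considered; this blocks e.g. file://
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    return false;
  }
  // The URL must sit under one of the declared base URLs. Base URLs
  // should end with "/" so that "api.example.com.evil.com" does not
  // pass a prefix check against "api.example.com".
  return baseUrls.some((base) => url.startsWith(base));
}

const allowed = ["https://api.example.com/"];
console.log(isAllowed("https://api.example.com/meta/1", allowed)); // true
console.log(isAllowed("http://169.254.169.254/latest/", allowed)); // false
```

Note this still doesn't address the volume-of-calls concern; it only narrows which hosts can be reached.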
From a determinism point of view this is really hard to manage, as there are too many variables, from unavailability to data changing over time. The implementation effort, considering the security measures, determinism, usability, etc., is quite high for something that could easily be solved by the end user, who could either publish the relevant data to IPFS/Arweave or do this fetching on the app's side. Specifically, the subgraph can still store the URL, and when consuming the information, the subgraph consumer fetches the data in parallel.
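The app-side alternative mentioned above could be sketched like this: the subgraph only stores the URL, and the consumer resolves metadata in parallel at query time. The entity shape and fetcher are illustrative assumptions.

```typescript
// Sketch of consumer-side metadata resolution. The subgraph entity only
// carries the URL; the app fetches the payloads concurrently itself.
// `Token` and `FetchFn` are hypothetical shapes for illustration.

interface Token {
  id: string;
  tokenURI: string;
}

type FetchFn = (url: string) => Promise<string>;

async function resolveMetadata(
  tokens: Token[],
  fetch: FetchFn
): Promise<Array<Token & { metadata: unknown }>> {
  // One request per token, issued concurrently.
  return Promise.all(
    tokens.map(async (t) => ({
      ...t,
      metadata: JSON.parse(await fetch(t.tokenURI)),
    }))
  );
}

// Usage with a stubbed fetcher in place of real HTTP:
const fetchStub: FetchFn = async (url) => JSON.stringify({ src: url });
resolveMetadata([{ id: "1", tokenURI: "u1" }], fetchStub).then((r) =>
  console.log(r.length) // prints 1
);
```

This keeps indexing deterministic at the cost of pushing availability concerns onto the consumer.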
There is also an operational concern around data sizes. This is obviously not new relative to Arweave and IPFS, but those storage layers have an incentive (cost) to keep files small, while other endpoints like S3 are very cost-efficient for quite large files. Storing large blobs may also be an overhead in terms of database efficiency. graph-node can of course limit this size, which once again may impact usability. (This may be something already discussed and settled, since it's similar to IPFS and Arweave in extreme cases.)
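A size limit of the kind suggested above is simple to express; the limit value and error shape here are illustrative assumptions, not existing graph-node configuration:

```typescript
// Sketch of a node-side size cap on fetched file blobs. The 1 MiB limit
// is an arbitrary example, not an actual graph-node setting.

const MAX_FILE_BYTES = 1 << 20; // 1 MiB, illustrative

function checkSize(
  body: Uint8Array,
  maxBytes: number = MAX_FILE_BYTES
): Uint8Array {
  if (body.byteLength > maxBytes) {
    throw new Error(
      `file of ${body.byteLength} bytes exceeds limit of ${maxBytes}`
    );
  }
  return body;
}
```

In practice the node would want to enforce this while streaming the response, so an oversized body is abandoned early rather than buffered in full.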
In terms of providing an alternative, an interesting discussion could be the introduction of some query-time WASM. This could provide some of the flexibility by supporting queries to existing subgraphs (potentially more than one) and the ability to fetch files from decentralised, or maybe even arbitrary HTTP, data sources as well. The difference is that it could be metered, stored temporarily (or permanently), and would be more flexible overall, since it builds on the existing solid foundation of subgraphs while still allowing very flexible behaviour.
I really think that this is one of the features that would unlock a completely new class of use cases. That being said, I also understand that this is tricky to get right. So maybe it helps if we specify a real-world use case and then design the system backwards from there?
I suggest we take the POAP subgraph as an example.
The tracked POAP contract is basically an ERC721 contract with tokenURIs hosted on their own servers, which in turn reference an image_url also hosted on their own servers (examples linked).
In my opinion, a consumer of such a subgraph might have requirements such as indexing individual metadata attributes, e.g. {"trait_type":"city","value":"Accra"}.
Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.
Ford, Simon, and I recently met with Andreas, a dev from the LUKSO project. Our conversation was mainly so Andreas could share feedback on LUKSO's use of The Graph, but we briefly discussed how LUKSO has tackled arbitrary HTTP file data sources.
Andreas mentioned that LUKSO stores Keccak-256 hashes of HTTP file data source metadata on chain. This enables them to retrieve the data via Arweave or IPFS and still validate the HTTP file data source.
I think their erc725.js tool, the concept of a VerifiableURI, and the encodeDataSourceWithHash method might be relevant here.
Extend File Data Sources to support fetching arbitrary off-chain files based on an HTTP URL.
The fetching process should be aware of HTTP return codes, which provide useful information on the "liveness" of a given endpoint. This might require more robust retry and back-off rules (for example, if an endpoint is no longer active).
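As a sketch of what status-aware retry with back-off could look like (the retryable-status set, attempt count, and delays are all assumptions, not a specified design):

```typescript
// Illustrative status-aware retry with exponential back-off.
// Thresholds and the retryable-status set are assumptions.

type HttpGet = (url: string) => Promise<{ status: number; body?: string }>;

const RETRYABLE = new Set([429, 500, 502, 503, 504]);

async function fetchWithRetry(
  url: string,
  get: HttpGet,
  maxAttempts = 5,
  baseDelayMs = 250
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await get(url);
    if (res.status === 200) return res.body ?? "";
    // A non-retryable status (e.g. 404) suggests the endpoint will never
    // succeed: give up immediately rather than hammering a dead URL.
    if (!RETRYABLE.has(res.status)) {
      throw new Error(`permanent failure: HTTP ${res.status}`);
    }
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
  }
  throw new Error(`giving up on ${url} after ${maxAttempts} attempts`);
}
```

Distinguishing permanent failures (4xx) from transient ones (5xx, 429) is the key point: it keeps a dead endpoint from blocking indexing indefinitely.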
Unlike IPFS and Arweave File Data Sources, files from HTTP endpoints may change over time, which implies a need to refetch and reprocess. In that case, the new data would need to overwrite the previous entities. This refetching could be triggered by a new on-chain entity creating a file data source with the same URL, or by a manual update via the indexing status API.
Note: there is the case of NFTs where the actual tokenURI changes over time. This might require a further pattern where the tokenURI is refetched as of the latest block, but this requires further definition.