graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL
https://thegraph.com
Apache License 2.0

Query time computation (custom GraphQL resolvers) #880

Open davekaj opened 5 years ago

davekaj commented 5 years ago

Description

Solidity contracts can have view/constant functions that do calculations for free. A somewhat common pattern is for Solidity developers to expose these, and then hit an Ethereum node for free from the front end to show the most up-to-date data.

Example of a protocol doing this

Specifically, Compound does this, and for their case it makes a lot of sense. Each block, their users accumulate more interest as the block number increases. The storage of the smart contract doesn't update, but if you were to cash out your earned interest, the contract would accrue the amount right before cashing out.

So we have this idea of freshEarnedInterest and staleEarnedInterest, where the fresh one is calculated up to the most recent block, and the stale one is calculated as of the most recent block in which the user interacted with the protocol, and is the actual value stored on the blockchain.
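To illustrate the distinction, here is a minimal sketch. The linear per-block interest math is illustrative only, not Compound's actual accrual formula, and all names are hypothetical:

```typescript
// Illustrative only: linear per-block interest, not Compound's actual math.

// Value as stored on-chain: interest accrued up to the user's last interaction.
function staleEarnedInterest(
  principal: number,
  ratePerBlock: number,
  startBlock: number,
  lastInteractionBlock: number
): number {
  return principal * ratePerBlock * (lastInteractionBlock - startBlock);
}

// Value a query-time resolver could compute: accrued up to the latest block.
function freshEarnedInterest(
  principal: number,
  ratePerBlock: number,
  startBlock: number,
  currentBlock: number
): number {
  return principal * ratePerBlock * (currentBlock - startBlock);
}
```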

Proposed Ideas on handling this

Next Steps

So I suggest query-time computation. I imagine this is an epic and might take a while, so I'm looking for input on the following:

davekaj commented 5 years ago

Ran into this with Uniswap as well (first time I ran into it was Compound)

For reference, here is a practical explanation of what is needed for Uniswap:

We have all the values to calculate their return:

originalInvestmentInEth = 1.5 ETH + 250 DAI = 3 ETH
returnInEth = (userLiquidityTokenBalance / totalLiquidityTokens) * (ethBalanceExchange + tokenBalanceExchange / tokenPrice) - originalInvestmentInEth
returnInEth = (2/100) * (77 + 13000/168.83) - 3
returnInEth = 3.08 - 3 = 0.08

The pattern to pay attention to is that the exchange is always up to date with live numbers, while the user info is out of date but can still be combined with the live exchange info to calculate their return.

And there is currently no good way to show this in subgraphs, which are only triggered when users interact with the protocol. Calculating this every block isn't a good idea because you would need to do it for thousands of users, all the time. A query-time calculation works really well. The pattern is very similar in Compound.
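As a sketch of what such a query-time calculation could look like, the formula above translates directly into a function over a stale user position and a live exchange state. The entity shapes here are hypothetical, purely for illustration:

```typescript
// Sketch only: hypothetical entity shapes for the Uniswap example above.
interface UserLiquidity {
  liquidityTokenBalance: number;   // stale: from the user's last interaction
  originalInvestmentInEth: number; // e.g. 1.5 ETH + 250 DAI = 3 ETH
}

interface Exchange {
  totalLiquidityTokens: number; // live
  ethBalance: number;           // live
  tokenBalance: number;         // live
  tokenPrice: number;           // live, tokens per ETH
}

// Live exchange numbers combined with the stale user position.
function returnInEth(user: UserLiquidity, exchange: Exchange): number {
  const share = user.liquidityTokenBalance / exchange.totalLiquidityTokens;
  const poolValueInEth =
    exchange.ethBalance + exchange.tokenBalance / exchange.tokenPrice;
  return share * poolValueInEth - user.originalInvestmentInEth;
}

// With the numbers above: (2/100) * (77 + 13000/168.83) - 3 ≈ 0.08
```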

leoyvens commented 5 years ago

Rationale / Use Cases

See comments above for use cases.

Requirements

The user must be able to define fields whose value is determined at query time, by executing user-defined logic that has access to APIs such as store.get and contract calls. This logic will be defined in a WASM function, so that it synergizes with the existing support for WASM mappings. The input to this function will be the entity to which the field being resolved belongs, with all of the stored fields set. In the MVP, only scalar fields can be custom-resolved, not entity fields.

Proposed User Experience

A resolverModule field is added as a subfield of schema in the manifest; it specifies a WASM file, just as we currently do for mappings. In the GraphQL schema, an entity with a custom-resolved field looks like:

type Entity {
    attr1: OtherEntity,
    attr2: BigInt!,
    field: Int! @resolvedBy(func: "resolverName", args: ["attr1", "attr2"]) 
}

where resolverName is a function exported by resolverModule, and the list provided in args contains the fields of Entity that will be provided as input to resolverName. That function would have the signature:

export function resolverName(entity: Entity): Value

It takes the generated entity type as an argument, the same one we generate for the schema (though not all fields will be present, only those specified in args), and returns a store value. That function has access to the same imports from the host as mappings, except those that mutate the store. At least for the MVP, ipfs.cat is also forbidden: none of the current use cases require it, and the semantics of an IPFS timeout during a query are not obvious.
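For concreteness, a resolver for the schema above might look like the following. This is a sketch only: the field logic is invented, the import path follows the usual codegen convention, and it assumes the generated Entity class plus the graph-ts Value type:

```typescript
import { Value } from "@graphprotocol/graph-ts";
import { Entity } from "./generated/schema"; // generated entity class (hypothetical path)

// Sketch of a custom resolver for the schema above. Only attr1 and attr2
// are set on the entity, per the args list in the @resolvedBy directive.
export function resolverName(entity: Entity): Value {
  // Invented logic: derive an Int from a stored BigInt field. A real
  // resolver could also make contract calls, but may not mutate the store.
  let base = entity.attr2.toI32();
  return Value.fromI32(base * 2);
}
```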

Limitations:

Proposed Implementation

graph-node: Needs to accept the new manifest field and the schema directive. The manifest field is added to the subgraph of subgraphs. The important changes are in the GraphQL resolver logic. When an object is being resolved, we make sure to fetch the args fields in addition to the fields requested in the query, so that custom-resolved fields are executed with all necessary data available.

The resolver checks for the directive, and if present it will instantiate a WasmiModule and execute the custom resolver by passing in an entity with the args fields set. The returned store value is then converted to a query value and coerced. Parallel resolving is something the design allows, but will not be done for the MVP.

To instantiate the WASM module the first time, the resolver must query the IPFS hash of the resolver module from the subgraph of subgraphs, fetch it from IPFS and create a ValidModule. The resolver will keep an LRU cache of the last 100 ValidModules to speed up instantiation.

graph-cli: The new manifest field will be accepted and the WASM file uploaded to IPFS, as we do for mappings (a sketch of the manifest addition appears below). No special validation will be done for the MVP.

graph-ts: No changes required.
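For reference, a sketch of how the proposed manifest addition might look. The resolverModule subfield is the proposal from this issue, not an existing manifest field, and the file path is illustrative:

```yaml
# Sketch only: resolverModule is proposed here, not yet implemented.
schema:
  file: ./schema.graphql
  resolverModule: ./src/resolvers.wasm # compiled and uploaded like a mappings module
```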

Proposed Documentation Updates

The manifest and schema changes will be documented in the appropriate places. A new section is added to "Define a subgraph" called "Custom resolvers" which documents example usage of the feature.

Proposed Tests / Acceptance Criteria

It should be possible to write a comprehensive test for this feature as a graph-node unit test. We should also prototype one of the real use cases before merging.

Tasks

davekaj commented 5 years ago

Looks good on my end for what is required/desirable for the functionality!

lutter commented 5 years ago

@leodasvacas looks good to me. Custom resolvers require that we do a post-processing step after we query the database, i.e., we transform the user's query into one that only mentions data we actually store, get that data from the database, and then execute a resolver over that data that runs user-supplied functions.

One other thought: this functionality doesn't really change what a subgraph is, but with the current design, changing a resolver function would result in an entirely new subgraph. What about defining these functions in the query itself? What I am not clear on for that is how the user would get WASM code into their queries.

lutter commented 5 years ago

One other thing: we could also go with something like this:

type Entity {
    attr1, .., attrN
    field: Int! @resolvedBy(func: "resolverName") 
}

and then in queries have users write

entities {
  attr1
  field { attr2 attr3 }
}

which would cause the function for field to be called with a JSON value representing { attr2 attr3 }.

The query we run to fetch data from the store would simply be

entities {
  attr1 attr2 attr3
}

The thing that's a little mindbending is that the parent object for resolving { attr2 attr3 } would be the entity, i.e. descending into field does not change the parent object as it does for other relations. It would make it possible to pass the result of more complex subqueries into the resolver function though.

In general, it would be good to disallow calling store.get in the resolver function since that can easily lead to N+1 query problems; I think allowing subqueries as arguments to user-supplied functions would greatly reduce the need for store.get.
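Under this alternative, the resolver would receive the subquery result as a JSON-like value rather than a typed entity. A rough sketch, assuming graph-ts's JSONValue and Value types and invented field logic:

```typescript
import { JSONValue, Value } from "@graphprotocol/graph-ts";

// Sketch: the resolver for `field` receives { attr2 attr3 } as a JSON value.
export function resolverName(args: JSONValue): Value {
  let obj = args.toObject();
  // Invented logic: combine the two subquery fields into one result.
  let attr2 = obj.get("attr2")!.toBigInt();
  let attr3 = obj.get("attr3")!.toBigInt();
  return Value.fromBigInt(attr2.plus(attr3));
}
```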

leoyvens commented 5 years ago

@lutter Thanks for the review!

Custom resolvers require that we do a post-processing step after we query the database

Yes, just like GraphQL coercions; how big a change this will be depends on whether we'll have coercions in the first implementation of SQL resolvers or not. I left a comment on the SQL resolvers issue about this.

Also, I realized a flaw in my implementation plan (I'll update it): we can't just re-use the result map, because the inputs to the resolver may well not be present in the query, so we'll need to modify the query to include those inputs. This complication makes me wonder if we should block this on SQL query combination.

User-supplied functions can be slow

Good point, we should differentiate the time taken by custom resolvers when logging the query time.

It would be better to not make the arguments to user-supplied functions the entire entity.

I totally agree; I'll update the design to include the args parameter. Do you think it can be optional, so that by default we pass the entire entity? In that case it could be added after the initial implementation. This does relate to the sub-entity question; see the end of this comment.

This functionality doesn't really change what a subgraph is

I agree that adding a custom-resolved field is simply a convenience and is 100% backwards compatible for a subgraph. However, this does change the capabilities of the subgraph and should change the ID. It's a waste to re-index the subgraph because of this, but there is a more general solution, which is data source de-duplication.

What about defining these functions in the query itself?

Consumers can already do whatever calculation they want on top of the data; if this were moved into queries, the re-use and abstraction value would be lost.

and then in queries have users write

I don't see the value in having inputs written in queries when the query doesn't have any control over them.

I think allowing subqueries as arguments to user-supplied functions would greatly reduce the need for store.get.

That would be a nice optimization, but I think we can do it backwards-compatibly by having store.get return a pre-fetched object rather than hitting the store.

leoyvens commented 5 years ago

Conclusions from our meeting on this (@Jannis @lutter do say if I forgot an important point):

I've updated the plan and tried to break up the estimates more, and give myself more time. The detail level in the tasks is not great but I'm at the limit of my capacity to plan and estimate ahead.

leoyvens commented 5 years ago

A design issue that wasn't covered in the plan is what ABIs the resolver module will have access to for the contract calls. I see two options:

  1. The resolver specifies its own ABIs. I'm not a fan of this because it puts Ethereum-specific information in the schema section, and that ABI is probably duplicated from a data source anyway. The manifest would look like:

    schema:
      file: ./schema.graphql
      ethereumAbis:
        - name: Foo
          file: ./abis/Foo.json
  2. The resolver can implicitly access any ABI defined in the manifest. This is my preference: it doesn't require changes to the manifest, and the resolver can import any generated contract class and it will just work. The concern is the case where two ABIs with the same name point to different files. We should detect and warn on this case; it doesn't seem worth supporting, and it's plausible that in the future we'll change the manifest structure so that all ABIs are in fact in one place.

davekaj commented 5 years ago

I believe we are prioritizing this for the compound subgraph. @Jannis @leoyvens

I imagine this will have to get queued up for engineering priorities?

davekaj commented 5 years ago

@Zerim Did you have thoughts on how to implement this in the past?

Zerim commented 5 years ago

As I mentioned in Slack my main interest here was that the interface for custom resolvers be implemented in the client-side query engine rather than adding additional complexity to the graph node.

I think the resolvers could be written in WASM, but they could also simply be written in JavaScript (ES5), since they will be running on the end user's machine and that is what the query engine is transpiled to.

This would be related to the work of decoupling the query engine from the graph node and making the main query interface for the graph node use the read interface that @lutter has been designing for the hybrid network.

I only scanned the conversation above, but one piece of feedback is that I would like to keep the data model (GraphQL schema) free from concerns around how those fields are computed, indexed, etc. That information should go elsewhere in the manifest or mappings.

un7c0rn commented 3 years ago

Following up from a Discord thread, this seems very useful. The case we have is as follows:

Royalties are deposited into a contract address over a time period. There are multiple rightsholders entitled to these royalties, each in proportion to the number of ERC-20 tokens they hold.

When a rightsholder attempts to withdraw royalties, the computation is essentially:

withdrawal_amount = sum_over_time(tokens_received - tokens_sent) * sum(royalties) / total_number_of_tokens

This computation is infeasible on-chain, so instead we use a graph node to index the transfer events. The problem is that there's no native support in GraphQL to perform the filtered sum operation. Implementing this in the form of a client-side query engine means we need a third component in the architecture, e.g. an off-chain service just to execute the query, when really what we want is to execute the query against the node and have the query-time computation done there.
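A sketch of what that query-time computation could look like over the indexed transfer events. The entity shape and parameter names are hypothetical:

```typescript
// Hypothetical shape of an indexed ERC-20 transfer event.
interface Transfer {
  from: string;
  to: string;
  amount: number;
}

// withdrawal_amount = sum_over_time(received - sent) * sum(royalties) / total tokens
function withdrawalAmount(
  transfers: Transfer[],  // all indexed transfers of the token
  holder: string,         // the rightsholder's address
  totalRoyalties: number, // royalties deposited over the period
  totalTokens: number     // total number of tokens
): number {
  let net = 0;
  for (const t of transfers) {
    if (t.to === holder) net += t.amount;   // tokens received
    if (t.from === holder) net -= t.amount; // tokens sent
  }
  return (net * totalRoyalties) / totalTokens;
}
```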

reubenr0d commented 1 year ago

+1, this feature would be really useful, but this issue seems stale. Are there any plans to implement this?