moonstream-to / api

Building blocks for your blockchain economy
https://moonstream.to
Apache License 2.0

States crawler #655

Open Andrei-Dolgolev opened 2 years ago

Andrei-Dolgolev commented 2 years ago

Existing problems

1: biologist crawler

image

Currently the biologist crawler has 2 main parts besides calculation: it interacts with the QueryAPI and with the blockchain.

If we have the state of view methods in the database, we can remove the requirement to interact with the blockchain and Multicall contracts.

2: OpenSea view of locked assets

Currently, to understand whether an asset is locked, users need to call the method that checks if the unicorn is locked.

If we have the state of the asset view methods, we can tell whether an asset is locked or not; this could be a good API for users.

3: Get stats of NFTs

Usually you need to write a crawler for the metadata URL and refresh the current metadata state.

image

Suggestion for state crawler version 1 (crawling repeats on an interval, 1 blockchain, brownie)

Crawling tasks:

Simple task

Get total supply:

{
    "type": "function",
    "name": "totalSupply",
    "outputs": [
        {
            "internalType": "uint256",
            "name": "",
            "type": "uint256",
        }
    ],
    "address": "0xdC0479CC5BbA033B3e7De9F178607150B3AbCe1f",
    "inputs": [], # it describe inputs for interface(regular abi generation and encoding and decoding call data)
     "value":[] # it describe inputs for crawler
}

The task has the address and all the data required to generate a contract interface, make the call through Multicall2, and decode the output, as in the sketch below.
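
As a rough illustration only (not the actual crawler code), a resolved task like this could be encoded for Multicall2's tryAggregate roughly as follows, assuming a recent web3.py and eth-abi and that all input values are already literals:

# Illustrative sketch, not the actual crawler code. Assumes web3.py and
# eth-abi are installed and that every input "value" is already a literal.
from eth_abi import encode
from web3 import Web3

def encode_call(task: dict) -> tuple:
    """Build a (target, callData) pair suitable for Multicall2.tryAggregate."""
    input_types = [arg["type"] for arg in task["inputs"]]
    # Function selector: first 4 bytes of keccak256 of the signature,
    # e.g. "totalSupply()" for the task above.
    signature = f'{task["name"]}({",".join(input_types)})'
    selector = Web3.keccak(text=signature)[:4]
    args = [arg["value"] for arg in task["inputs"]]
    call_data = selector + encode(input_types, args)
    return (Web3.to_checksum_address(task["address"]), call_data)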

"value" can be either a literal list of inputs or a nested call, because there are subdependencies between the inputs and outputs of different view methods (see case 1):

Complex task

Nested structure:

To reproduce the logic of the task from case 1:

Required chain of calls:

totalSupply -> getDNA -> getUnicornBodyParts

This is because getDNA needs the range of current tokens and getUnicornBodyParts requires DNA as input.

{
        "name": "getUnicornBodyParts",
        "address": "0xdC0479CC5BbA033B3e7De9F178607150B3AbCe1f",
        "type": "function",
        "inputs": [
            {
                "name": "_dna",
                "type": "uint256",
                "internalType": "uint256",
                "value": {
                    "type": "function",
                    "inputs": [
                        {
                            "internalType": "uint256",
                            "name": "_tokenId",
                            "type": "uint256",
                            "value": {
                                "type": "function",
                                "name": "totalSupply",
                                "outputs": [
                                    {
                                        "internalType": "uint256",
                                        "name": "",
                                        "type": "uint256",
                                    }
                                ],
                                "address": "0xdC0479CC5BbA033B3e7De9F178607150B3AbCe1f",
                                "inputs": [],
                            },
                        }
                    ],
                    "name": "getDNA",
                    "outputs": [
                        {"internalType": "uint256", "name": "", "type": "uint256"}
                    ],
                    "address": "0xdC0479CC5BbA033B3e7De9F178607150B3AbCe1f",
                },
            }
        ],
        "outputs": [
            {"internalType": "uint256", "name": "bodyPartId", "type": "uint256"},
            {"internalType": "uint256", "name": "facePartId", "type": "uint256"},
            {"internalType": "uint256", "name": "hornPartId", "type": "uint256"},
            {"internalType": "uint256", "name": "hoovesPartId", "type": "uint256"},
            {"internalType": "uint256", "name": "manePartId", "type": "uint256"},
            {"internalType": "uint256", "name": "tailPartId", "type": "uint256"},
            {"internalType": "uint8", "name": "mythicCount", "type": "uint8"},
        ],
    }

To resolve that nesting we can use the following algorithm:

We parse the tasks as shown in the picture: image
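
One possible way to do the resolution (illustrative only, since the actual algorithm is what the picture shows) is a depth-first walk that emits the deepest dependencies first:

# Illustrative only: walk the nested task depth-first and emit calls with the
# deepest dependencies first, so every call's inputs are produced before it runs.
# Deduplication of repeated subcalls (discussed later in this thread) is omitted.
def flatten_task(task, ordered=None):
    if ordered is None:
        ordered = []
    for arg in task.get("inputs", []):
        value = arg.get("value")
        # A dict value means "take this input from another view method call".
        if isinstance(value, dict) and value.get("type") == "function":
            flatten_task(value, ordered)
    ordered.append(task)
    return ordered

# For the example above this yields: totalSupply, getDNA, getUnicornBodyParts.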

How to store

As @zomglings suggested, just put it in the same labels table.

label_name = "view-state"

label = label_model(
        label=label_name,
        label_data={
            "type": "view",
            "name": call["name"],
            "result": call["result"],
            "inputs": call["inputs"],
            "status": call["status"],
        },
        address=call["contract_address"],
        block_number=call["block_number"],
        transaction_hash=None,
        block_timestamp=call["block_timestamp"],
    )

Inputs and outputs decoding

In the moonworm crawler we have a transaction decoder which parses output into a dictionary.

We need to do the same for input arguments and output parameters using the provided ABI.

Currently the Multicall contract returns us a tuple.

Sometimes the ABI does not have names for outputs and inputs:

    "inputs": [
        {
            "internalType": "uint256",
            "name": "_tokenId",
            "type": "uint256",
            "value": "af4ac3c0fa304befcc08b293f35d8d95"
        }
    ],
    "level": 1,
    "name": "getDNA",
    "outputs": [
        {
            "internalType": "uint256",
            "name": "",
            "type": "uint256"
        }
    ],
    "stateMutability": "view",
    "type": "function"

But sometimes names are required for complex outputs, as in the example from the task above:

....
            {"internalType": "uint256", "name": "bodyPartId", "type": "uint256"},
            {"internalType": "uint256", "name": "facePartId", "type": "uint256"},
            {"internalType": "uint256", "name": "hornPartId", "type": "uint256"},
            {"internalType": "uint256", "name": "hoovesPartId", "type": "uint256"},
            {"internalType": "uint256", "name": "manePartId", "type": "uint256"},
            {"internalType": "uint256", "name": "tailPartId", "type": "uint256"},
            {"internalType": "uint8", "name": "mythicCount", "type": "uint8"},
....
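
A minimal sketch of the idea (not the moonworm decoder itself): zip the output schema with the tuple returned by the Multicall contract and fall back to the positional index when the ABI has no name, as with getDNA above:

# Sketch only: turn the tuple returned by the Multicall contract into a dict
# keyed by the output names from the ABI, using the position as a fallback key.
def tuple_to_dict(outputs_abi: list, values: tuple) -> dict:
    decoded = {}
    for index, (schema, value) in enumerate(zip(outputs_abi, values)):
        key = schema.get("name") or str(index)
        decoded[key] = value
    return decoded

# getUnicornBodyParts -> {"bodyPartId": ..., "facePartId": ..., ...}
# getDNA              -> {"0": ...}  (no output name in the ABI)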
zomglings commented 2 years ago

About "inputs" and "value":

  1. "inputs" -> "input_schema"
  2. "value" -> "inputs"
zomglings commented 2 years ago

The algorithm to calculate the order of tasks is basically a topological sort of the vertices of a directed acyclic graph.

We have an implementation in our codebase here: https://github.com/bugout-dev/shnorky/blob/a1948fa8299677105cb9e80140ad5c44c2131bfc/flows/specification.go#L125

You should not make a distinction between WithSubcalls and WithoutSubcalls. Just resolve everything into levels the same way. The tasks that don't have subcalls automatically go to level 0.
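
For example, a sketch of level resolution under the assumption that tasks carry an "id" and list their dependencies by id (hypothetical field names):

# Sketch of resolving tasks into execution levels. "id" and "depends_on" are
# hypothetical field names. Tasks without dependencies land on level 0; every
# other task sits one level above its deepest dependency, and each level can
# be executed as a single Multicall batch.
def resolve_levels(tasks: list) -> dict:
    by_id = {task["id"]: task for task in tasks}
    levels = {}

    def level_of(task_id, seen=()):
        if task_id in seen:
            raise ValueError(f"cycle detected at {task_id}")
        if task_id in levels:
            return levels[task_id]
        deps = by_id[task_id].get("depends_on", [])
        level = 0 if not deps else 1 + max(
            level_of(dep, seen + (task_id,)) for dep in deps
        )
        levels[task_id] = level
        return level

    for task in tasks:
        level_of(task["id"])
    return levels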

zomglings commented 2 years ago

About "inputs" and "value":

  1. "inputs" -> "input_schema"
  2. "value" -> "inputs"

"outputs" should also be called "output_schema"

zomglings commented 2 years ago

I don't like how complex tasks are nested like this - what if we want to use output of totalSupply in multiple steps? I think it's best to associate an ID with each task and specify that output of a task can be used as input by any other task (as long as it doesn't cause cycles in the execution graph).
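
For illustration, a non-nested, ID-based configuration could look roughly like this (field names such as "id" and "from_task" are hypothetical):

# Purely illustrative: an ID-based (non-nested) configuration where any task
# can reference the output of any other task by id. "id" and "from_task" are
# hypothetical field names.
tasks = [
    {
        "id": "total_supply",
        "name": "totalSupply",
        "address": "0xdC0479CC5BbA033B3e7De9F178607150B3AbCe1f",
        "input_schema": [],
        "inputs": [],
    },
    {
        "id": "dna",
        "name": "getDNA",
        "address": "0xdC0479CC5BbA033B3e7De9F178607150B3AbCe1f",
        "input_schema": [{"name": "_tokenId", "type": "uint256"}],
        # totalSupply's output can now be reused by any number of tasks.
        "inputs": [{"name": "_tokenId", "from_task": "total_supply"}],
    },
    {
        "id": "body_parts",
        "name": "getUnicornBodyParts",
        "address": "0xdC0479CC5BbA033B3e7De9F178607150B3AbCe1f",
        "input_schema": [{"name": "_dna", "type": "uint256"}],
        "inputs": [{"name": "_dna", "from_task": "dna"}],
    },
]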

zomglings commented 2 years ago

We need to also specify how output of one step can be used as input of next step. totalSupply output is a single number whose output is used as upper bound on loop (in biologist).

But there could also be cases where a step produces a list of items as an output and we want to map some call over the entire list (e.g. in the future we will do token URIs -> curl).
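
One hypothetical way to express both patterns in the input spec (the "mode" field and its values are made up, just to show the distinction):

# Hypothetical only: distinguishing "use the scalar output as a loop bound"
# from "map this call over every element of a list output".
inputs = [
    # totalSupply returns a single number used as the upper bound of a loop
    # over token ids (the biologist case).
    {"name": "_tokenId", "from_task": "total_supply", "mode": "range"},
    # A step that returns a list (e.g. token URIs); the next step is mapped
    # over every element of that list (e.g. a curl per URI).
    {"name": "uri", "from_task": "token_uris", "mode": "map"},
]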

zomglings commented 2 years ago

@Andrei-Dolgolev : I am okay even if we don't have a full dependency semantics to connect inputs to outputs in version 1. Only thing we need to think about carefully is whether our configuration schema can be extended easily to support those kinds of relationships in the future.

zomglings commented 2 years ago

Also, another thought: we are really talking about programming here -- map operation over output of previous step, loop over output of previous step, conditional execution based on output of previous step.

Maybe instead of using JSON we should define these pipelines as Python or JS scripts?

This is how Apache Airflow works: https://airflow.apache.org/docs/apache-airflow/stable/tutorial.html
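
As a rough sketch of the idea (not Airflow's actual API), a pipeline defined as a Python script could just be ordinary control flow over contract calls:

# Rough sketch only, not Airflow's API: the pipeline as a plain Python script,
# where loops and conditionals over previous outputs are ordinary control flow.
# "contract" and "store_label" are hypothetical helpers.
def pipeline(contract):
    total_supply = contract.totalSupply()          # scalar output
    for token_id in range(total_supply):           # loop bounded by previous output
        dna = contract.getDNA(token_id)            # output feeds the next call
        parts = contract.getUnicornBodyParts(dna)
        store_label("getUnicornBodyParts", parts)  # hypothetical storage call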

Andrei-Dolgolev commented 2 years ago

One point about the renames:

inputs: "inputs" -> "input_schema", "value" -> "inputs"
outputs: "outputs" -> "output_schema", "value" -> "outputs"

I agree with renaming "value", but "output" is reserved. In general, right now we generate the full set of required contract interfaces from the whole set of tasks; that is why the task schema is mainly an extended ABI: it made it simple to deduplicate call results and the required contract calls.

Andrei-Dolgolev commented 2 years ago

I don't like how complex tasks are nested like this - what if we want to use output of totalSupply in multiple steps? I think it's best to associate an ID with each task and specify that output of a task can be used as input by any other task (as long as it doesn't cause cycles in the execution graph).

That is exactly how it works: for each task we create a hash, the dict of responses is the checked object for now, and whenever any element of the tree points to that hash we get the already existing response.