streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0
159 stars 45 forks source link

Developer experience feedback (writing and testing modules) #123

Open fubhy opened 1 year ago

fubhy commented 1 year ago

Oy! Anologous to the other issue (https://github.com/streamingfast/substreams/issues/122), I'm posting an unfiltered list of general feedback for all sorts of substream module development experience related things here.

Work in progress ...

Testing

Module handlers

Store abstractions

Well-known-types

message Foo {
  string token_balance = 1;
  string token_decimals = 2;
}

Eth calls

let fn = erc20::functions::Name {};
let name: String = fn.call(address).unwrap();
let calls = RpcCalls::from(vec![
  erc20::functions::Name { ... }.encode_call(address),
  erc20::functions::Symbol { ... }.encode_call(address),
]).call();
let name: String = erc20::functions::Name::call(address).unwrap();
fubhy commented 1 year ago

You might think that I'm unreasonably and prematurely concerned about runtime performance here but with "dynamic data sources" being such a common pattern, I'd think we should provide a highly efficient solution for that.

Due to the lack of a dedicated pattern for that currently, users might be compelled to create their own abstraction...

They might just use the Set store and call the hammer the WASM binding (store.get_at()) on every iteration of blk.logs().

Or they might do a proto store for that with a repeated bytes field and then simply use .contains(address) on every iteration of blk.logs() for filtering logs. Not great (O(n^2)). Imagine that list grows to 10.000 addresses.

It'd already be significantly better if they'd store it in a map<string, bool> type message. That way, they'd at least be able to simply check the underlying HashMap's key efficiently.

I'd like to see some numbers for different approaches for differently sized lists here ...

So, let's provide dedicated store abstractions for such use cases. Once we have a dedicated, bespoke "AddressList" store abstraction, we can still always optimize and change the underlying implementation as long as the public interface remains the same. Hence, we could indeed implement it around a simple map<string, bool> proto field internally for now.


Whether we think that this constitutes premature optimization or not, dedicated abstractions with bespoke public interfaces make sense either way imho. If there's a need to optimize it further in the future, we can do that in the underlying implementaton. If it turns out that there's no need for that then we've at least gained a better DX from the bespoke abstraction: Less footguns for users by accidentally using low level abstractions incorrectly or inefficiently, and more compatible interfaces for use in the substreams modules (e.g. compatible ::events() fn parameters that support our store struct / HashMap in this case).

I think it's worthwhile to research and enumerate all those common patterns and provide "Standards Library" (substreams-std)-style abstractions for all of them.

maoueh commented 1 year ago

About Eth calls, today the current code for a single call is:

    let supports_interface = abi::erc165::functions::SupportsInterface {interface_id: [0x00, 0x00, 0x00, 0x00]};
    let supported = supports_interface.call(address).unwrap();

Which is your suggestion, this was available when you posted the issue ... so it's solved for this part correct? (I'm not talking about the quick helper which I'll track as well as probably an improved RpcCalls ergonomic).

I think it notes you have much before writing the issue.

fubhy commented 1 year ago

About Eth calls, today the current code for a single call is:

    let supports_interface = abi::erc165::functions::SupportsInterface {interface_id: [0x00, 0x00, 0x00, 0x00]};
    let supported = supports_interface.call(address).unwrap();

Which is your suggestion, this was available when you posted the issue ... so it's solved for this part correct? (I'm not talking about the quick helper which I'll track as well as probably an improved RpcCalls ergonomic).

Yes, badly worded. That was just a primer for the subsequent point about batch calls not being very ergonomic "this is how single calls look, maybe we can do this for batch calls".

fubhy commented 1 year ago

What do you mean with "I think it notes you have much before writing the issue."?

maoueh commented 1 year ago

What do you mean with "I think it notes you have much before writing the issue."?

I was thinking maybe that it was notes you had gathered before writing the issue.

About the RpcCalls then. For the input part, it think today way of doing things is quite similar to what you propose:

let calls = RpcBatch::new()
            .add(erc20::functions::Fun1 { .. }, address)
            .add(erc20::functions::Fun2 { .. }, address)
            .execute()

We could easily add indeed a slightly easier form either what you propose with encode_call addition or a variation that would not require extra generated code:

let calls = RpcBatch::from(vec![
            (erc20::functions::Fun1 { .. }.encode(), address),
            (erc20::functions::Fun2 { .. }.encode(), address),
 ]).execute()

But I'm not sure there is a user experience gain here.

One of the difficulty (impossibility) of making the API nicer here is that calls and the decoded results are struct of different types, so you get to all sort of Rust problem around typings when you try to decode them all together into a single "calls" entity. I don't think we can a list of mixed results types.

Only thing I could see would be some kind of macro that would expand/decode the results for you into a tuple of N elements, one per call. But I think for that, a full specialized RpcBatch would need to be generated.

Hence why we couldn't offer directly what you are proposing for resulting types.

maoueh commented 1 year ago

We should support error propagation (via ? chaining). The expected type of substreams::error::Error prevents that atm.

If we had conversion for most standard error, String and support for anyhow::Error, I imagine it will enable error propagation in almost very cases, and cases failing to compile could workaround by wrapping their error with anyhow.

Does that sound good to you?

maoueh commented 1 year ago

We could maybe also change the return type directly to anyhow::Error, would probably make the life of everyone easier, we use the error only for user display anyway as a string, so we could probably change it to this.

fubhy commented 1 year ago

About the RpcCalls then. For the input part, it think today way of doing things is quite similar to what you propose:

let calls = RpcBatch::new()
            .add(erc20::functions::Fun1 { .. }, address)
            .add(erc20::functions::Fun2 { .. }, address)
            .execute()

Oh, that's good. I wasn't aware of that option. That looks absolutely fine in terms of ergonomics.

One of the difficulty (impossibility) of making the API nicer here is that calls and the decoded results are struct of different types, so you get to all sort of Rust problem around typings when you try to decode them all together into a single "calls" entity. I don't think we can a list of mixed results types.

Gotcha. That makes sense.

If we had conversion for most standard error, String and support for anyhow::Error, I imagine it will enable error propagation in almost very cases, and cases failing to compile could workaround by wrapping their error with anyhow.

Yes. In a recent discussion with Robin (Messari), we also wondered why we don't use panic in place in case of any error? Why do we even need to bubble errors up to the store/map macro in the first place? If there's no reason for that, we could also just make the return type of map either T or Option<T> only?

maoueh commented 1 year ago

Yes. In a recent discussion with Robin (Messari), we also wondered why we don't use panic in place in case of any error? Why do we even need to bubble errors up to the store/map macro in the first place? If there's no reason for that, we could also just make the return type of map either T or Option only?

Yeah I'm planning to do that, allow non-Result return.

Using Result would help with error propagation and chaining, it also feels nice in lot of cases to simply use ? when needed instead of except. It's also expected code for people used to Rust. Definitely agree that error propagation should work properly.

maoueh commented 1 year ago

For the RpcBatch, we could probably at least improve the decoding experience a bit by offering decode_at on the RpcResponses directly:

let responses = RpcBatch::new().add(...).execute()
let name = responses.decode_at::<Name>(0).unwrap();
let decimals = responses.decode_at::<Decimals>(1).unwrap();
...
fubhy commented 1 year ago

Using Result would help with error propagation and chaining, it also feels nice in lot of cases to simply use ? when needed instead of except. It's also expected code for people used to Rust. Definitely agree that error propagation should work properly.

Wonderful. That's definitely my preferred option too for those reasons.

I don't know nearly enough about the module execution dynamics in the host environment but I imagine that with Option<T>, it would be easier for the host to properly infer intent of a module when it explicitly returns nothing from a map. That, in turn, could then mean that we don't have to invoke downstream modules (store and maps) whenever there's actually no input. Currently, all downstream maps and stores are called for every single clock tick even if none of the inputs holds any data.

fubhy commented 1 year ago

Currently, all downstream maps and stores are called for every single clock tick even if none of the inputs holds any data.

And that's actually also a DX problem, because it means that downstream modules often have to check foo.bar.size() and exit early. Unnecessary boilerplate imho.

pyk commented 8 months ago

hey guys, is there any update on how to test substreams?

I want to test some small lib to fetch ERC20 metadata but it throws an error:

use crate::abi::{ERC20Bytes, ERC20};
use crate::pb::uniswap::v2::Token;
use substreams::scalar::BigInt;
use substreams::{log, Hex};
use substreams_ethereum::rpc::RpcBatch;

pub fn get_token(address: &String) -> Option<Token> {
    let batch = RpcBatch::new();
    let token_address = Hex::decode(address).unwrap();
    let responses = batch
        .add(ERC20::functions::Decimals {}, token_address.clone())
        .add(ERC20::functions::Name {}, token_address.clone())
        .add(ERC20::functions::Symbol {}, token_address.clone())
        .execute()
        .unwrap()
        .responses;

    let decimals_decoder = RpcBatch::decode::<BigInt, ERC20::functions::Decimals>(&responses[0]);
    let decimals = match decimals_decoder {
        Some(value) => value.to_u64(),
        None => {
            log::debug!("get_token {} have invalid decimals", address);
            return None;
        }
    };

    let name_decoder = RpcBatch::decode::<String, ERC20::functions::Name>(&responses[1]);
    let name = match name_decoder {
        Some(value) => value,
        None => {
            let name_bytes_caller = ERC20Bytes::functions::Name {};
            match name_bytes_caller.call(token_address.clone()) {
                Some(name_bytes) => Hex::encode(name_bytes),
                None => {
                    log::debug!("get_token {} have invalid name", address);
                    String::new()
                }
            }
        }
    };

    let symbol_decoder = RpcBatch::decode::<String, ERC20::functions::Symbol>(&responses[2]);
    let symbol = match symbol_decoder {
        Some(value) => value,
        None => {
            let symbol_bytes_caller = ERC20Bytes::functions::Symbol {};
            match symbol_bytes_caller.call(token_address.clone()) {
                Some(symbol_bytes) => Hex::encode(symbol_bytes),
                None => {
                    log::debug!("get_token {} have invalid symbol", address);
                    String::new()
                }
            }
        }
    };

    Some(Token {
        address: Hex::encode(address),
        name,
        symbol,
        decimals,
    })
}

#[cfg(test)]
mod tests {
    use crate::{pb::uniswap::v2::Token, rpc::get_token};

    #[test]
    fn it_works() {
        let result = get_token(&"0x00".to_string()).unwrap();
        assert_eq!(
            result,
            Token {
                address: "".to_string(),
                name: "".to_string(),
                symbol: "".to_string(),
                decimals: 0
            }
        );
    }
}

error:

running 1 test
thread 'rpc::tests::it_works' panicked at /Users/pyk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/substreams-ethereum-core-0.9.9/src/rpc.rs:71:5:
not implemented: this method is not implemented outside of 'wasm32' target compilation