Developer experience feedback (writing and testing modules)

fubhy commented 1 year ago

Oy! Anologous to the other issue (https://github.com/streamingfast/substreams/issues/122), I'm posting an unfiltered list of general feedback for all sorts of substream module development experience related things here.

Work in progress ...

Testing

This is a no-brainer and likely already on your radar but I'll put it here anyways. We should provide a test harness for easy, native integration testing within the cargo test runner.
This would also mean we'd be able to step debug our modules properly. An external test runner would likely make this very difficult. WASM is seemingly pretty much a wasteland otherwise when it comes to remote debugging and the likes thereof.
We should be able to provide a set of test macros and utilities.
Utilities for testing should also include tools to generate test fixtures from real data. E.g. a (or a series of) actual firehose block(s). Store state, map output, etc. to be used in tests for modules using those as inputs.
For instance, the substreams run command, in aforementioned interactive mode, could also provide an option to "download module input fixtures & read store keys" for the block at which it failed. This would then scaffold a test scenario and immediately let you step debug the failing scenario. That'd be dope.

Module handlers

We should make it easier (and more explicit) to “return nothing” from a mapper. Therefore, we should imho, in addition to Result<T>, allow Option<T> and Result<Option<T>, ...> or straight up T (where T is a protobuf message) to be returned from a mapper.

Tracked by https://github.com/streamingfast/substreams-rs/issues/11
We should support error propagation (via ? chaining). The expected type of substreas::error::Error prevents that atm.

Trakced by https://github.com/streamingfast/substreams-rs/issues/14

Store abstractions

The current Appender implementation doesn't seem to be very robust. For which use-case would I want to maintain a ; delimineted string? Collections of arbitrary (but fixed) length byte sequences seem to be much more useful.
Implementing the equivalent of "dynamic data sources" in substreams is rather unwieldy atm. despite that likely being one of the most common use-cases for stateful streams.
There's no bespoke abstraction for this usecase (e.g. an AddressSetStore [set as in "unique list"]).
I'd expect such a store's consumer facing data type to be immediately compatible with things like ::events::<T>(addresses) from Abigen with implicit type conversion, or not type conversion at all.
I'd also expect the consumer facing side of the store support a wider range of key types (very often I want to set an address or tx hash (bytes) or a timestamp as a key).

Well-known-types

It might make sense to provide a collection of WKTs(well-known-types) for some commonly used data types: BigDecimal, BigInt, Address, Hash (of some kind) etc.
My eyes bleed every time I see something like this:

message Foo {
  string token_balance = 1;
  string token_decimals = 2;
}

Eth calls

The eth call abstractions could be much more ergonomic both for single calls and batched calls.
Manually encoding & wrapping my calldata in an RpcCall struct and then wrapping it in another RpcCalls struct to then send it off is not very ergnomic.
For a single call, this should do the trick:

let fn = erc20::functions::Name {};
let name: String = fn.call(address).unwrap();

For a batched call, maybe something like this:

let calls = RpcCalls::from(vec![
  erc20::functions::Name { ... }.encode_call(address),
  erc20::functions::Symbol { ... }.encode_call(address),
]).call();

The generated function structs (from Abigen) should provide the necessary traits & implementations to conveniently decode the call results
For syntactic sugar, especially for functions that don't take arguments, we could provide static methods:

let name: String = erc20::functions::Name::call(address).unwrap();

Call results should come as Result<T> instead of with a bool field for failed (or is there a reason for that that I'm missing?).

fubhy commented 1 year ago

You might think that I'm unreasonably and prematurely concerned about runtime performance here but with "dynamic data sources" being such a common pattern, I'd think we should provide a highly efficient solution for that.

Due to the lack of a dedicated pattern for that currently, users might be compelled to create their own abstraction...

They might just use the Set store and call the hammer the WASM binding (store.get_at()) on every iteration of blk.logs().

Or they might do a proto store for that with a repeated bytes field and then simply use .contains(address) on every iteration of blk.logs() for filtering logs. Not great (O(n^2)). Imagine that list grows to 10.000 addresses.

It'd already be significantly better if they'd store it in a map<string, bool> type message. That way, they'd at least be able to simply check the underlying HashMap's key efficiently.

I'd like to see some numbers for different approaches for differently sized lists here ...

So, let's provide dedicated store abstractions for such use cases. Once we have a dedicated, bespoke "AddressList" store abstraction, we can still always optimize and change the underlying implementation as long as the public interface remains the same. Hence, we could indeed implement it around a simple map<string, bool> proto field internally for now.

Whether we think that this constitutes premature optimization or not, dedicated abstractions with bespoke public interfaces make sense either way imho. If there's a need to optimize it further in the future, we can do that in the underlying implementaton. If it turns out that there's no need for that then we've at least gained a better DX from the bespoke abstraction: Less footguns for users by accidentally using low level abstractions incorrectly or inefficiently, and more compatible interfaces for use in the substreams modules (e.g. compatible ::events() fn parameters that support our store struct / HashMap in this case).

I think it's worthwhile to research and enumerate all those common patterns and provide "Standards Library" (substreams-std)-style abstractions for all of them.

maoueh commented 1 year ago

About Eth calls, today the current code for a single call is:

    let supports_interface = abi::erc165::functions::SupportsInterface {interface_id: [0x00, 0x00, 0x00, 0x00]};
    let supported = supports_interface.call(address).unwrap();

Which is your suggestion, this was available when you posted the issue ... so it's solved for this part correct? (I'm not talking about the quick helper which I'll track as well as probably an improved RpcCalls ergonomic).

I think it notes you have much before writing the issue.

fubhy commented 1 year ago

About Eth calls, today the current code for a single call is:
    let supports_interface = abi::erc165::functions::SupportsInterface {interface_id: [0x00, 0x00, 0x00, 0x00]};
    let supported = supports_interface.call(address).unwrap();
Which is your suggestion, this was available when you posted the issue ... so it's solved for this part correct? (I'm not talking about the quick helper which I'll track as well as probably an improved RpcCalls ergonomic).

Yes, badly worded. That was just a primer for the subsequent point about batch calls not being very ergonomic "this is how single calls look, maybe we can do this for batch calls".

fubhy commented 1 year ago

What do you mean with "I think it notes you have much before writing the issue."?

maoueh commented 1 year ago

What do you mean with "I think it notes you have much before writing the issue."?

I was thinking maybe that it was notes you had gathered before writing the issue.

About the RpcCalls then. For the input part, it think today way of doing things is quite similar to what you propose:

let calls = RpcBatch::new()
            .add(erc20::functions::Fun1 { .. }, address)
            .add(erc20::functions::Fun2 { .. }, address)
            .execute()

We could easily add indeed a slightly easier form either what you propose with encode_call addition or a variation that would not require extra generated code:

let calls = RpcBatch::from(vec![
            (erc20::functions::Fun1 { .. }.encode(), address),
            (erc20::functions::Fun2 { .. }.encode(), address),
 ]).execute()

But I'm not sure there is a user experience gain here.

One of the difficulty (impossibility) of making the API nicer here is that calls and the decoded results are struct of different types, so you get to all sort of Rust problem around typings when you try to decode them all together into a single "calls" entity. I don't think we can a list of mixed results types.

Only thing I could see would be some kind of macro that would expand/decode the results for you into a tuple of N elements, one per call. But I think for that, a full specialized RpcBatch would need to be generated.

Hence why we couldn't offer directly what you are proposing for resulting types.

maoueh commented 1 year ago

We should support error propagation (via ? chaining). The expected type of substreams::error::Error prevents that atm.

If we had conversion for most standard error, String and support for anyhow::Error, I imagine it will enable error propagation in almost very cases, and cases failing to compile could workaround by wrapping their error with anyhow.

Does that sound good to you?

maoueh commented 1 year ago

We could maybe also change the return type directly to anyhow::Error, would probably make the life of everyone easier, we use the error only for user display anyway as a string, so we could probably change it to this.

fubhy commented 1 year ago

About the RpcCalls then. For the input part, it think today way of doing things is quite similar to what you propose:
let calls = RpcBatch::new()
            .add(erc20::functions::Fun1 { .. }, address)
            .add(erc20::functions::Fun2 { .. }, address)
            .execute()

Oh, that's good. I wasn't aware of that option. That looks absolutely fine in terms of ergonomics.

One of the difficulty (impossibility) of making the API nicer here is that calls and the decoded results are struct of different types, so you get to all sort of Rust problem around typings when you try to decode them all together into a single "calls" entity. I don't think we can a list of mixed results types.

Gotcha. That makes sense.

If we had conversion for most standard error, String and support for anyhow::Error, I imagine it will enable error propagation in almost very cases, and cases failing to compile could workaround by wrapping their error with anyhow.

Yes. In a recent discussion with Robin (Messari), we also wondered why we don't use panic in place in case of any error? Why do we even need to bubble errors up to the store/map macro in the first place? If there's no reason for that, we could also just make the return type of map either T or Option<T> only?

maoueh commented 1 year ago

Yes. In a recent discussion with Robin (Messari), we also wondered why we don't use panic in place in case of any error? Why do we even need to bubble errors up to the store/map macro in the first place? If there's no reason for that, we could also just make the return type of map either T or Option only?

Yeah I'm planning to do that, allow non-Result return.

Using Result would help with error propagation and chaining, it also feels nice in lot of cases to simply use ? when needed instead of except. It's also expected code for people used to Rust. Definitely agree that error propagation should work properly.

maoueh commented 1 year ago

For the RpcBatch, we could probably at least improve the decoding experience a bit by offering decode_at on the RpcResponses directly:

let responses = RpcBatch::new().add(...).execute()
let name = responses.decode_at::<Name>(0).unwrap();
let decimals = responses.decode_at::<Decimals>(1).unwrap();
...

fubhy commented 1 year ago

Using Result would help with error propagation and chaining, it also feels nice in lot of cases to simply use ? when needed instead of except. It's also expected code for people used to Rust. Definitely agree that error propagation should work properly.

Wonderful. That's definitely my preferred option too for those reasons.

I don't know nearly enough about the module execution dynamics in the host environment but I imagine that with Option<T>, it would be easier for the host to properly infer intent of a module when it explicitly returns nothing from a map. That, in turn, could then mean that we don't have to invoke downstream modules (store and maps) whenever there's actually no input. Currently, all downstream maps and stores are called for every single clock tick even if none of the inputs holds any data.

fubhy commented 1 year ago

Currently, all downstream maps and stores are called for every single clock tick even if none of the inputs holds any data.

And that's actually also a DX problem, because it means that downstream modules often have to check foo.bar.size() and exit early. Unnecessary boilerplate imho.

pyk commented 8 months ago

hey guys, is there any update on how to test substreams?

I want to test some small lib to fetch ERC20 metadata but it throws an error:

use crate::abi::{ERC20Bytes, ERC20};
use crate::pb::uniswap::v2::Token;
use substreams::scalar::BigInt;
use substreams::{log, Hex};
use substreams_ethereum::rpc::RpcBatch;

pub fn get_token(address: &String) -> Option<Token> {
    let batch = RpcBatch::new();
    let token_address = Hex::decode(address).unwrap();
    let responses = batch
        .add(ERC20::functions::Decimals {}, token_address.clone())
        .add(ERC20::functions::Name {}, token_address.clone())
        .add(ERC20::functions::Symbol {}, token_address.clone())
        .execute()
        .unwrap()
        .responses;

    let decimals_decoder = RpcBatch::decode::<BigInt, ERC20::functions::Decimals>(&responses[0]);
    let decimals = match decimals_decoder {
        Some(value) => value.to_u64(),
        None => {
            log::debug!("get_token {} have invalid decimals", address);
            return None;
        }
    };

    let name_decoder = RpcBatch::decode::<String, ERC20::functions::Name>(&responses[1]);
    let name = match name_decoder {
        Some(value) => value,
        None => {
            let name_bytes_caller = ERC20Bytes::functions::Name {};
            match name_bytes_caller.call(token_address.clone()) {
                Some(name_bytes) => Hex::encode(name_bytes),
                None => {
                    log::debug!("get_token {} have invalid name", address);
                    String::new()
                }
            }
        }
    };

    let symbol_decoder = RpcBatch::decode::<String, ERC20::functions::Symbol>(&responses[2]);
    let symbol = match symbol_decoder {
        Some(value) => value,
        None => {
            let symbol_bytes_caller = ERC20Bytes::functions::Symbol {};
            match symbol_bytes_caller.call(token_address.clone()) {
                Some(symbol_bytes) => Hex::encode(symbol_bytes),
                None => {
                    log::debug!("get_token {} have invalid symbol", address);
                    String::new()
                }
            }
        }
    };

    Some(Token {
        address: Hex::encode(address),
        name,
        symbol,
        decimals,
    })
}

#[cfg(test)]
mod tests {
    use crate::{pb::uniswap::v2::Token, rpc::get_token};

    #[test]
    fn it_works() {
        let result = get_token(&"0x00".to_string()).unwrap();
        assert_eq!(
            result,
            Token {
                address: "".to_string(),
                name: "".to_string(),
                symbol: "".to_string(),
                decimals: 0
            }
        );
    }
}

error:

running 1 test
thread 'rpc::tests::it_works' panicked at /Users/pyk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/substreams-ethereum-core-0.9.9/src/rpc.rs:71:5:
not implemented: this method is not implemented outside of 'wasm32' target compilation

streamingfast / substreams