vechain / thor

A general purpose blockchain highly compatible with Ethereum's ecosystem
GNU Lesser General Public License v3.0

⚙️ [CONFIG] - Increase default value for `api-logs-limit` #806

Open ifavo opened 1 month ago

ifavo commented 1 month ago

Description

In the next release, there is a new option that limits the number of returned event logs: https://github.com/vechain/thor/pull/777

The configuration sets the default limit to 1,000 entries, which is very low when using public nodes to index data.

I propose increasing the default limit to allow clients to access enough events so that at least 100 (if not 1000) blocks can be scanned in a single request. This should enable most developers and users to access data sensibly without overloading server resources.

In my opinion, a block with 200 clauses (regardless of the transaction count) emitting 2 events each should be retrievable without the request being rejected. At 400 events per block, that works out to up to 40,000 events across 100 blocks.

The nodes I run respond quickly even when returning more events than that on some requests. If someone is running a low-resource node, the limit can still be lowered.
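For a rough sense of scale, here is a small Go sketch of that estimate; the per-block figures are the assumptions stated above, not measured values:

```go
package main

import "fmt"

func main() {
	// Assumptions from the proposal above, not measured values.
	const (
		clausesPerBlock = 200 // clauses per block, regardless of transaction count
		eventsPerClause = 2   // events emitted per clause
		blocksPerQuery  = 100 // blocks scanned in a single request
	)
	events := clausesPerBlock * eventsPerClause * blocksPerQuery
	fmt.Printf("up to %d events per %d-block query\n", events, blocksPerQuery) // 40000 events
}
```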

otherview commented 1 month ago

I think having a limitation based on block ranges is an interesting flag to have. And it does seem possible to me to have a joint default limitation, perhaps something like: "events are defaulted to 1k entries with no block limit or 10k/100k entries if it's inside an interval of 100 blocks" ?

But I'm not sure about "using public nodes to index data" - typically one should run their own nodes to have this type of service.

ifavo commented 1 month ago

> I think having a limitation based on block ranges is an interesting flag to have.

A limitation based on block ranges is what I have encountered on several RPC providers, for example Ankr. I would prefer it because it is easier to understand and manage from the client side: all limits are under the client's control, whereas an output limit can only be guessed and is hard to predict from the client side.

I personally prefer the ability to paginate by result count, because it allows broader access to the dataset. eth_getLogs however does not support it, so compatibility with existing EVM tooling is the important factor here.

> And it does seem possible to me to have a joint default limitation, perhaps something like: "events are defaulted to 1k entries with no block limit or 10k/100k entries if it's inside an interval of 100 blocks" ?

That sounds like a good idea. What about having two limits, or using block ranges instead (a rough sketch of combining them follows the list):

  1. logs limit (defaults to something that supports ~100 blocks)
  2. block range limit (defaults to 100 blocks)
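As an illustration of how the two limits could work together, here is a minimal Go sketch; the default values and the `validateLogsQuery` helper are hypothetical, not thor's actual implementation, and only `api-logs-limit` exists today:

```go
package api

import "fmt"

// Hypothetical defaults for the two proposed limits (illustrative values only).
const (
	defaultLogsLimit       = 20000 // max log entries returned per query
	defaultBlockRangeLimit = 100   // max blocks a single query may span
)

// validateLogsQuery sketches a joint check: reject queries spanning too many
// blocks up front, and cap the number of returned entries otherwise.
func validateLogsQuery(fromBlock, toBlock uint32, requestedLimit uint64) (uint64, error) {
	if toBlock < fromBlock {
		return 0, fmt.Errorf("invalid range: to (%d) < from (%d)", toBlock, fromBlock)
	}
	if span := toBlock - fromBlock + 1; span > defaultBlockRangeLimit {
		return 0, fmt.Errorf("block range of %d exceeds the limit of %d blocks", span, defaultBlockRangeLimit)
	}
	if requestedLimit == 0 || requestedLimit > defaultLogsLimit {
		return defaultLogsLimit, nil // fall back to the default cap
	}
	return requestedLimit, nil
}
```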

> But I'm not sure about "using public nodes to index data" - typically one should run their own nodes to have this type of service.

I agree that indexing via public nodes is not the best example use case, but public infrastructure should still allow most applications and developers to launch without friction. Negative impact on the infrastructure must be avoided with limits, and once an application gains traction, a custom node is always an option.


Regarding limits in general, Alchemy has three that we might use as inspiration:

  1. eth_getLogs requests with up to a 2K block range and no limit on the response size (most providers allow only 5-10 blocks)
  2. a block range with a cap of 10K logs in the response (not seen elsewhere yet)
  3. a maximum response size of 150MB (usually 25MB elsewhere)
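To make the block-range style of limit concrete from the client side, here is a minimal Go sketch of an eth_getLogs call bounded to an explicit fromBlock/toBlock window, sent to an Ethereum-style JSON-RPC endpoint; the URL, block numbers, and address are placeholders, and how large a window a provider accepts depends on its configured limit:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// eth_getLogs request limited to an explicit block window.
	// URL, block numbers, and address are placeholders.
	reqBody, _ := json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "eth_getLogs",
		"params": []interface{}{map[string]string{
			"fromBlock": "0x112a880", // start of the scanned window
			"toBlock":   "0x112a8e4", // 100 blocks later
			"address":   "0x0000000000000000000000000000456e65726779", // VTHO contract, as an example
		}},
	})

	resp, err := http.Post("http://localhost:8545", "application/json", bytes.NewReader(reqBody))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var rpcResp map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&rpcResp); err != nil {
		panic(err)
	}
	// Either a list of logs, or an error when the provider's range/size limit is exceeded.
	fmt.Println(rpcResp["result"], rpcResp["error"])
}
```
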
otherview commented 1 month ago

Hey @ifavo

I was doing some back-of-the-envelope math and these numbers are starting to look pretty big. Following these assumptions:

Event gas cost: 1,000 gas
Block gas limit: 40M
Number of blocks in range: 200

Max total events = block gas limit / event gas cost × number of blocks
8,000,000 = 40,000,000 / 1,000 × 200

Would you consider 8M entries to be too much for a public and free RPC provider? Even in a 50-block range that's 2M entries.
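Written out as a small Go sketch, using the assumptions above and also printing the 50-block case:

```go
package main

import "fmt"

func main() {
	// Assumptions from the back-of-the-envelope math above.
	const (
		eventGasCost  = 1_000      // gas per event
		blockGasLimit = 40_000_000 // 40M block gas limit
	)
	for _, blocks := range []int{200, 50} {
		maxEvents := blockGasLimit / eventGasCost * blocks
		fmt.Printf("worst case over %d blocks: %d events\n", blocks, maxEvents) // 8,000,000 and 2,000,000
	}
}
```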

ifavo commented 1 month ago

I agree, I would not set it that high.

From my perspective the goal is not to fetch all possible events, but rather all plausible ones for a given application's usage, for example VeBetterDAO votes or VTHO transfers. I believe there won't be 8M transfers within a 200-block range; the highest is likely around 20,000.

I believe the MainNet distribution of B3TR is the biggest chunk of data from a single application that I have encountered so far: about 21,000 clauses within 70 blocks.

libotony commented 3 weeks ago

@ifavo IMO, the limit is applied per query, so it shouldn't be the end of the world for indexers. They are free to use the iteration method in the API; by iteration I mean the offset and limit options.

And I agree that indexers should run their own node; alternatively, I would suggest indexers use the expanded block API rather than the filter API to index their data.
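For indexers that do hit the per-query cap, here is a minimal Go sketch of that iteration, assuming the request shape of thor's POST /logs/event endpoint (a range plus options with offset/limit); the node URL, block numbers, and page size are placeholders:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// fetchAllEvents pages through event logs via offset/limit until a short page
// signals that no more results are available.
func fetchAllEvents(nodeURL string, from, to uint64) ([]json.RawMessage, error) {
	const pageSize = 1000 // stay within the node's configured api-logs-limit
	var all []json.RawMessage

	for offset := uint64(0); ; offset += pageSize {
		body, _ := json.Marshal(map[string]interface{}{
			"range":   map[string]interface{}{"unit": "block", "from": from, "to": to},
			"options": map[string]uint64{"offset": offset, "limit": pageSize},
		})
		resp, err := http.Post(nodeURL+"/logs/event", "application/json", bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		var page []json.RawMessage
		if err := json.NewDecoder(resp.Body).Decode(&page); err != nil {
			resp.Body.Close()
			return nil, err
		}
		resp.Body.Close()

		all = append(all, page...)
		if uint64(len(page)) < pageSize { // last page reached
			break
		}
	}
	return all, nil
}

func main() {
	events, err := fetchAllEvents("http://localhost:8669", 18000000, 18000100)
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d events\n", len(events))
}
```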

ifavo commented 3 weeks ago

> @ifavo IMO, the limit is applied per query, so it shouldn't be the end of the world for indexers. They are free to use the iteration method in the API; by iteration I mean the offset and limit options.
>
> And I agree that indexers should run their own node; alternatively, I would suggest indexers use the expanded block API rather than the filter API to index their data.

@libotony the main issue is not the ability to handle the situation, it's compatibility with existing and established EVM tooling that leverages RPC connectivity on top. I believe it's important to enable the use of that tooling by not setting the limits too small, when slightly bigger ones could help developers get started on vechain more easily.

That's why I believe a higher limit, for example at least 25k (50k preferred), has no negative impact but a huge positive impact for potential developers. If a public node has an issue with 50k logs, either its hardware is insufficient or its operator can still set a smaller limit. My point is not that everybody should run a big limit; it's that a default that is too small will effectively apply to all public nodes in the ecosystem, because most likely nobody will set a custom one.