starkware-libs / starknet-specs


Add `log_index` field to events emitted by transactions #157

Open zomglings opened 1 year ago

zomglings commented 1 year ago

Currently, there is no canonical way to order events in Starknet logs.

This makes analyzing Starknet data challenging: any total ordering you impose as an analyst relies on the assumption that every node you might connect to emits events in the same order, and that this order is preserved by all RPC middleware.

It would be better if a `log_index` were canonically reported by all nodes as part of the `starknet_getEvents` response.

Mirko-von-Leipzig commented 1 year ago

My only argument against this is that I think it will be quite costly to implement on the node end. Event filtering and storage is already fairly complex.

zomglings commented 1 year ago

There was some discussion in Telegram about why this is desirable, so I wanted to summarize the discussion and add some clarification here.

The fact that `starknet_getEvents` supports pagination (unlike Ethereum's `eth_getLogs`) means that each node is implicitly making a guarantee about the ordering of events. So one might think:

> Is it not enough to just specify that events are always in order? That would have to be the case for pagination in any case.

The problem right now is that the guarantee is made on a per-client basis: the way one Starknet client implementation orders events is not necessarily the way another does. A `log_index` would be a guarantee of ordering at the protocol level, not at the level of each node implementation.

It is also important to understand why this matters. A single transaction may fire the same event (with the exact same parameters) multiple times, which means that the event type and parameters do not uniquely identify an event. Any event crawler must have a strategy to recover from failures mid-crawl. One scenario crawlers must handle is connecting to a different RPC node post-crash than the one they were connected to pre-crash -- an RPC node implementing a different Starknet client. Unless there exists a protocol-level guarantee (like `log_index`) about the ordering of events, the only safe recovery strategy is to delete the last block's worth of events from crawl storage and re-crawl from there, duplicating some work.
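To make the duplicate-event problem concrete, here is a minimal sketch (the event shape is illustrative, not the actual `FILTERED_EVENT` schema) of how content-based deduplication silently loses one of two identical events, while a hypothetical `log_index` keeps them distinct:

```python
# The same event fired twice by one transaction, with identical parameters.
events = [
    {"tx_hash": "0xabc", "keys": ["Transfer"], "data": ["0x1", "0x2", "100"]},
    {"tx_hash": "0xabc", "keys": ["Transfer"], "data": ["0x1", "0x2", "100"]},
]

# A crawler recovering from a crash by refetching and deduplicating on
# content collapses the two occurrences into one, losing a real event.
by_content = {(e["tx_hash"], tuple(e["keys"]), tuple(e["data"])) for e in events}
assert len(by_content) == 1

# A protocol-level log_index (hypothetical field) makes each occurrence
# unique, so refetched events can be matched exactly against storage.
indexed = [dict(e, log_index=i) for i, e in enumerate(events)]
by_index = {(e["tx_hash"], e["log_index"]) for e in indexed}
assert len(by_index) == 2
```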

This certainly isn't the highest-priority issue to solve, and most event crawlers build custom workarounds (e.g. Starkscan crawlers suffix _<event_index> to the transaction hash when displaying events), but I do believe it is important to address at scale.
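The workaround mentioned above can be sketched as follows (the function name is mine, not Starkscan's actual code):

```python
def synthetic_event_id(tx_hash: str, event_index: int) -> str:
    # event_index is whatever order this crawler's own RPC node returned
    # the events in -- a per-client observation, not a protocol guarantee.
    return f"{tx_hash}_{event_index}"

assert synthetic_event_id("0xabc", 2) == "0xabc_2"
```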

On @Mirko-von-Leipzig's point about implementation cost: even if this is very costly to implement, that doesn't mean Starknet should not specify such information at the protocol level; it just means it is a lower priority relative to other development.

zomglings commented 1 year ago

The real tradeoff I see here is that implementing log_index or something like it will increase one or both of the following:

  1. the amount of storage per event required to operate a node
  2. the `starknet_getEvents` response time

It might be that such an increase is not acceptable.

omerfirmak commented 1 year ago

Adding a `log_index` to `FILTERED_EVENT` that corresponds to the index of the event within a block is essentially free in Juno.

But the situation might be different for other clients depending on their DB schemas.
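For illustration, a filtered event carrying the proposed field might look like this (field names other than `log_index` loosely follow the existing spec; treat this as a hypothetical sketch, not the actual schema):

```json
{
  "block_number": 99504,
  "block_hash": "0x1234",
  "transaction_hash": "0xabcd",
  "from_address": "0x5678",
  "keys": ["0x9abc"],
  "data": ["0x1", "0x2"],
  "log_index": 17
}
```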

Mirko-von-Leipzig commented 1 year ago

> Adding a `log_index` to `FILTERED_EVENT` that corresponds to the index of the event within a block is essentially free in Juno.

I don't think it's enough to know the index within a block -- it would need to be the global index? Otherwise the same problem of ordering remains.

I feel like it should be enough to specify in the method description that the events will be returned in chronological order? i.e. block, transaction, event-in-transaction order?
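That chronological ordering can be sketched as a simple sort key (field names are illustrative, not the actual `FILTERED_EVENT` schema):

```python
# Chronological order: by block, then transaction within the block,
# then event within the transaction.
events = [
    {"block_number": 2, "tx_index": 0, "event_index": 1, "tag": "d"},
    {"block_number": 1, "tx_index": 1, "event_index": 0, "tag": "b"},
    {"block_number": 1, "tx_index": 0, "event_index": 0, "tag": "a"},
    {"block_number": 2, "tx_index": 0, "event_index": 0, "tag": "c"},
]
events.sort(key=lambda e: (e["block_number"], e["tx_index"], e["event_index"]))
assert [e["tag"] for e in events] == ["a", "b", "c", "d"]
```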

omerfirmak commented 1 year ago

> I don't think it's enough to know the index within a block -- it would need to be the global index? Otherwise the same problem of ordering remains.

`FILTERED_EVENT` already has the block number IIRC, so the block number and the index within that block, used together, should allow ordering events globally. Am I missing something?
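As a quick sketch of that point (field names are illustrative): a per-block index combined with the block number already yields a total order across blocks.

```python
# Events from different blocks, each carrying a per-block log_index.
events = [
    {"block_number": 5, "log_index": 1},
    {"block_number": 4, "log_index": 7},
    {"block_number": 5, "log_index": 0},
]
# (block_number, log_index) is a global sort key.
ordered = sorted(events, key=lambda e: (e["block_number"], e["log_index"]))
assert ordered == [
    {"block_number": 4, "log_index": 7},
    {"block_number": 5, "log_index": 0},
    {"block_number": 5, "log_index": 1},
]
```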

Mirko-von-Leipzig commented 1 year ago

> > I don't think it's enough to know the index within a block -- it would need to be the global index? Otherwise the same problem of ordering remains.
>
> FILTERED_EVENT already has the block number IIRC, so block number and index within that block, when used together, should allow ordering events globally. Am I missing something?

Ah no my bad; forgot it already has the block number.

A further consideration though: it's pretty much impossible for events to be returned out of order, since that would break `latest` and `pending` pagination entirely when updates to those arrive mid-pagination.