ponder-sh / ponder

A backend framework for crypto apps
https://ponder.sh
MIT License

Inconsistent Scheduling Using Block Intervals #1122

Open medariox opened 1 month ago

medariox commented 1 month ago

The current approach of using block intervals to simulate cron-like scheduling assumes that blocks are produced at regular intervals (quoting the docs: "run at a consistent frequency"), e.g., every 12 seconds on Ethereum. However, as Etherscan's average block time chart shows, the average block production time is not exactly 12 s, because missed slots stretch some block gaps to 24 s or more. According to the Ethereum documentation, block times are only approximately 12 s; in fact, the current average block time is 12.05 s, as reported by YCharts.

Furthermore, this issue is not exclusive to Ethereum. Other blockchains show similar variability in block production times, as shown by data from this Dune dashboard. This variability makes any schedule expressed in blocks drift relative to wall-clock time.

The effect is most noticeable when indexing long periods, because the drift accumulates with every interval while tasks are expected to run at consistent timestamps. The result is an unreliable, drifting schedule, which makes the current implementation unsuitable for use cases requiring strict time-based triggers.
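To put a number on the drift, here is a small illustrative sketch (not Ponder code) that assumes blocks arrive at a constant average rate. With the (60 * 60) / 12 = 300-block "hourly" interval from the docs and the 12.05 s average quoted above, each trigger lands about 15 s later than intended, which adds up to 6 minutes per day and 3 hours over 30 days. The helper name driftAfter is hypothetical.

```typescript
// Illustrative sketch: drift of a block-interval schedule, assuming blocks
// arrive at a constant average rate. driftAfter is a hypothetical helper.
function driftAfter(
  triggers: number, // how many intervals have elapsed
  intervalBlocks: number, // configured interval in blocks
  assumedBlockTime: number, // seconds per block used to choose the interval
  actualBlockTime: number, // observed average seconds per block
): number {
  const intendedSeconds = triggers * intervalBlocks * assumedBlockTime;
  const actualSeconds = triggers * intervalBlocks * actualBlockTime;
  return actualSeconds - intendedSeconds; // positive = trigger runs late
}

const hourlyInterval = (60 * 60) / 12; // 300 blocks
console.log(driftAfter(1, hourlyInterval, 12, 12.05)); // 15 (after 1 hour)
console.log(driftAfter(24, hourlyInterval, 12, 12.05)); // 360 (after 1 day)
console.log(driftAfter(24 * 30, hourlyInterval, 12, 12.05)); // 10800 (after 30 days)
```

Real block times are not constant, so actual drift fluctuates around this linear estimate, but the direction is always the same: on Ethereum a block can be late but never early.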

These are the main problems with the current implementation:

Proposed Solution

Consider an alternative mechanism to track real-world time or provide a way to trigger tasks based on wall-clock time, independent of block production intervals. One possible solution is to use the block timestamp instead of relying solely on block intervals. By referencing the block's timestamp, tasks could be scheduled based on real-world time, reducing the drift and inconsistency introduced by irregular block production.

This would allow for more consistent and reliable cron-like scheduling, especially for applications that depend on precise timing.
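A minimal sketch of a timestamp-based trigger, assuming blocks are processed in ascending order and each carries a Unix timestamp in seconds. It fires on the first block whose timestamp reaches each boundary startTimestamp + k * intervalSeconds, so a missed slot shifts one trigger to the next available block instead of accumulating drift. Block and selectScheduledBlocks are hypothetical names, not part of Ponder's API; the sample data are the block numbers and timestamps quoted later in this thread.

```typescript
// Hypothetical sketch (not Ponder's API): fire on the first block whose
// timestamp reaches each boundary startTimestamp + k * intervalSeconds.
interface Block {
  number: number;
  timestamp: number; // Unix time in seconds
}

function selectScheduledBlocks(
  blocks: readonly Block[], // assumed to be sorted by block number
  startTimestamp: number,
  intervalSeconds: number,
): Block[] {
  if (intervalSeconds <= 0) throw new Error("intervalSeconds must be positive");
  const selected: Block[] = [];
  let nextBoundary = startTimestamp;
  for (const block of blocks) {
    if (block.timestamp < nextBoundary) continue; // boundary not reached yet
    selected.push(block);
    // Move to the first boundary after this block. If a long gap skipped
    // several boundaries, they collapse into this single trigger.
    while (nextBoundary <= block.timestamp) nextBoundary += intervalSeconds;
  }
  return selected;
}

// Block numbers and timestamps taken from the examples in this thread.
const sampleBlocks: Block[] = [
  { number: 20803585, timestamp: 1726977599 },
  { number: 20803883, timestamp: 1726981199 },
  { number: 20803885, timestamp: 1726981223 },
  { number: 20804182, timestamp: 1726984799 },
  { number: 20804185, timestamp: 1726984835 },
];

const hourly = selectScheduledBlocks(sampleBlocks, 1726977599, 3600);
console.log(hourly.map((b) => b.number)); // [ 20803585, 20803883, 20804182 ]
```

Picking the first block at or after each boundary (rather than the last block before it) is a design choice: a trigger never runs before its scheduled time, at the cost of running a few seconds late when the exact slot was missed.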

Thank you to the Ponder maintainers for all your hard work and for taking the time to review this issue. 🙏

0xOlias commented 1 month ago

Thanks for the thoughtful writeup!

We definitely acknowledge this limitation, and it would be great to find a suitable fix. For context, we considered a more traditional cron feature based on timestamps, but chose the current block-based approach because it was much simpler internally and seemed to cover 90%+ of the use cases we heard from users. Another consideration is that OP Stack chains (which I'd estimate account for ~50% of Ponder usage right now) have a fixed rather than variable block time, so they don't suffer from the problems above.
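The point about fixed block times comes down to simple arithmetic: if every block is exactly T seconds apart, a period of P seconds is exactly P / T blocks, and a block interval never drifts. A sketch, where blocksForPeriod is a hypothetical helper and the 2-second block time is an assumed example value (check the chain's own configuration):

```typescript
// Sketch: with a fixed block time, a wall-clock period is an exact number of
// blocks. blocksForPeriod is a hypothetical helper, not a Ponder API.
function blocksForPeriod(periodSeconds: number, blockTimeSeconds: number): number {
  const blocks = periodSeconds / blockTimeSeconds;
  if (!Number.isInteger(blocks)) {
    throw new Error(`${periodSeconds} s is not a whole number of ${blockTimeSeconds} s blocks`);
  }
  return blocks;
}

// Assumed example: a chain with a fixed 2-second block time.
console.log(blocksForPeriod(3600, 2)); // 1800, i.e. exactly one hour, every hour
```

The same call with a 12-second block time returns 300 for Ethereum, but there the equality only holds when no slot is missed: 300 blocks never take less than 3600 s and take longer whenever a slot is skipped.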

If you can share a few specific use cases where block intervals do not work but exact timestamp/cron scheduling would, that would help us make progress on this.

medariox commented 1 month ago

Thank you for the quick response!

My primary use case is to chart the last hourly price of a token. Based on the documentation, I believe my current option is to configure the scheduler like this:

```typescript
blocks: {
  HourlyScheduler: {
    network: "mainnet",
    startBlock: 20421017, // Timestamp: 1722362399 (Jul 30 2024 17:59:59 GMT+0000)
    interval: (60 * 60) / 12, // 300 blocks: every 60 minutes only if blocks are exactly 12 s apart
  },
},
```

However, due to the issues mentioned earlier, this approach isn't ideal. For reference, here’s an example of block timestamps:

```
.....
Block: 20803585 | Timestamp: 1726977599 // Sep 22 2024 03:59:59 GMT+0000
Block: 20803885 | Timestamp: 1726981223 // Sep 22 2024 05:00:23 GMT+0000
Block: 20804185 | Timestamp: 1726984835 // Sep 22 2024 06:00:35 GMT+0000
.....
```

As you can see, the blocks drift away from the hourly schedule: the second block is 24 s late and the third is 36 s late. Yet there are blocks produced at exactly these hourly timestamps:

```
.....
Block: 20803585 | Timestamp: 1726977599 // Sep 22 2024 03:59:59 GMT+0000
Block: 20803883 | Timestamp: 1726981199 // Sep 22 2024 04:59:59 GMT+0000
Block: 20804182 | Timestamp: 1726984799 // Sep 22 2024 05:59:59 GMT+0000
.....
```

This could be achieved if we could configure it like this:

```typescript
blocks: {
  HourlyScheduler: {
    network: "mainnet",
    startBlock: 20421017, // Timestamp: 1722362399 (Jul 30 2024 17:59:59 GMT+0000)
    timestampInterval: 3600, // Every 60 minutes
  },
},
```

I’m not entirely sure which use cases the current block interval option is meant to solve, but it seems clear that it’s not well suited to scheduling tasks at consistent real-time frequencies, even though the documentation suggests this usage, e.g., with the example (60 * 60) / 12 // Every 60 minutes.

I hope this clarifies my use case and why I think it’s important. Please let me know if there’s anything I can do to help speed this along. Thanks again!

ind-igo commented 1 month ago

Running into a similar issue.

My use case is batching swaps into candlestick data. The issue is most apparent when batching 1-minute swap data: a quick analysis shows many gaps between consecutive batches.
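For candlestick data specifically, one alternative that sidesteps the scheduler is to bucket each swap by its block timestamp as it is indexed, so candle boundaries come from timestamps rather than block counts. This is a different technique from the cron-style trigger discussed above, shown as a hedged sketch: the Swap and Candle shapes and the toCandles function are hypothetical, and in a real indexer the price and timestamp would come from the decoded event and its block.

```typescript
// Hypothetical sketch: build OHLC candles by bucketing swaps on their block
// timestamp instead of triggering on a block interval.
interface Swap {
  timestamp: number; // block timestamp, Unix seconds
  price: number;
}

interface Candle {
  start: number; // bucket start, Unix seconds
  open: number;
  high: number;
  low: number;
  close: number;
}

function toCandles(swaps: readonly Swap[], bucketSeconds = 60): Candle[] {
  // Map preserves insertion order, so chronological swaps yield ordered candles.
  const candles = new Map<number, Candle>();
  for (const swap of swaps) { // assumes swaps are in chronological order
    const start = Math.floor(swap.timestamp / bucketSeconds) * bucketSeconds;
    const candle = candles.get(start);
    if (candle === undefined) {
      candles.set(start, { start, open: swap.price, high: swap.price, low: swap.price, close: swap.price });
    } else {
      candle.high = Math.max(candle.high, swap.price);
      candle.low = Math.min(candle.low, swap.price);
      candle.close = swap.price; // latest swap in the bucket
    }
  }
  return Array.from(candles.values());
}
```

With this approach a missing minute means no swaps happened in that minute, not that a trigger fired late; a consumer can forward-fill the previous close if continuous candles are needed.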

I want to voice support for adding a time-based cron system.

I also believe that offchain table support (https://github.com/ponder-sh/ponder/pull/1120) would alleviate this: anyone could bring their own scheduled-job system and still have access to Ponder's db client.