hashgraph / hedera-mirror-node

Hedera Mirror Node archives data from consensus nodes and serves it via an API
Apache License 2.0

REST API doesn't always return all logs for the latest timestamp #6037

Open iron4548 opened 1 year ago

iron4548 commented 1 year ago

Description

I poll the mirror node's REST API for the latest smart contract event logs using this URL:

/api/v1/contracts/results/logs?timestamp=gt:${unixTimestampFrom}&topic0=${topic_id1}&topic0=${topic_id2}

Here topic_id1 and topic_id2 are the smart contract log topics that I'm interested in listening for.

Whenever I get new logs, I store the latest timestamp (as a string, so that I don't lose precision) and use it as unixTimestampFrom in the next call, to ensure that I don't get duplicate logs.
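A minimal sketch of this cursor handling in TypeScript, not my exact client code. It assumes each returned log carries a `timestamp` string in "seconds.nanoseconds" form; `baseUrl`, `buildLogsUrl`, `sortableKey` and `advanceCursor` are hypothetical names used only for illustration.

```typescript
// Assumed shape of one returned log; only the field used here.
interface ContractLog {
  timestamp: string; // "seconds.nanoseconds", e.g. "1700000000.000000001"
}

// Build the polling URL shown above. `baseUrl` is a placeholder for the mirror node host.
function buildLogsUrl(baseUrl: string, from: string, topics: string[]): string {
  const params = ["timestamp=gt:" + from].concat(topics.map((t) => "topic0=" + t));
  return baseUrl + "/api/v1/contracts/results/logs?" + params.join("&");
}

// Turn "seconds.nanoseconds" into a fixed-width digit string, so that plain
// string comparison orders timestamps correctly without converting to a float.
function sortableKey(ts: string): string {
  const parts = ts.split(".");
  let seconds = parts[0] || "0";
  let nanos = parts[1] || "";
  while (seconds.length < 12) seconds = "0" + seconds;
  while (nanos.length < 9) nanos = nanos + "0";
  return seconds + nanos;
}

// Return the newest timestamp among `logs`, or the old cursor if nothing is newer.
function advanceCursor(cursor: string, logs: ContractLog[]): string {
  let latest = cursor;
  for (const log of logs) {
    if (sortableKey(log.timestamp) > sortableKey(latest)) latest = log.timestamp;
  }
  return latest;
}
```

The value returned by `advanceCursor` is what gets passed as `from` on the next poll. Keeping it as a string matters because a JavaScript number cannot hold all 19 digits of a nanosecond timestamp exactly.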

During testing I found that the API sometimes doesn't return ALL logs for the latest Unix timestamp at the time of the call. For example, I had multiple servers polling the logs at the same time and storing them locally. When I compared the stored logs between the servers, there were gaps.

For example, let's assume that each smart contract call should ALWAYS produce 3 event logs, and that all three should have exactly the same Unix timestamp.

What I found was that it sometimes returns 2 or 1 instead of 3.

This is easier to reproduce when calling /api/v1/contracts/results/logs at a higher rate (stress testing).

Is it a race condition, where the mirror node hasn't finished storing all the latest event logs in the database? I have logic in place to deal with this issue and it somewhat works (treat the latest logs as 'incomplete' until the next call or two), but that isn't ideal.
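One possible shape for that workaround, as a rough TypeScript sketch rather than my actual logic: the logs at the newest timestamp in a batch are held back and the cursor is not moved past them, so the next `gt:` query returns them again. They are accepted only once two consecutive polls return the same count for that timestamp. `PollState` and `settle` are hypothetical names, and the sketch assumes timestamps are fixed-width strings so that plain string comparison matches time order.

```typescript
// Assumed log shape; only `timestamp` is used by this sketch.
interface ContractLog {
  timestamp: string;
}

interface PollState {
  cursor: string;           // newest timestamp already treated as complete
  pendingTimestamp: string; // newest timestamp seen in the previous poll ("" if none)
  pendingCount: number;     // how many logs that timestamp had in the previous poll
}

// Split one polled batch into logs that are safe to process now and a
// held-back newest group that will be fetched again on the next poll.
function settle(
  state: PollState,
  logs: ContractLog[]
): { settled: ContractLog[]; state: PollState } {
  if (logs.length === 0) return { settled: [], state: state };

  // Find the newest timestamp in the batch (fixed-width strings assumed).
  let newest = "";
  for (const log of logs) {
    if (log.timestamp > newest) newest = log.timestamp;
  }
  const newestGroup = logs.filter((l) => l.timestamp === newest);
  const older = logs.filter((l) => l.timestamp !== newest);

  // Same timestamp and same count as the previous poll: treat it as complete.
  const stable =
    state.pendingTimestamp === newest && state.pendingCount === newestGroup.length;
  if (stable) {
    return {
      settled: logs,
      state: { cursor: newest, pendingTimestamp: "", pendingCount: 0 },
    };
  }

  // Otherwise release only the older logs and keep the cursor behind `newest`.
  let cursor = state.cursor;
  for (const log of older) {
    if (log.timestamp > cursor) cursor = log.timestamp;
  }
  return {
    settled: older,
    state: { cursor: cursor, pendingTimestamp: newest, pendingCount: newestGroup.length },
  };
}
```

This only narrows the window. If the mirror node is still writing logs for a timestamp across two polls that happen to return the same count, an incomplete group would still be accepted.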

Steps to reproduce

(Theory)

Create a loop that calls a somewhat complex smart contract that produces multiple event logs. The smart contracts I've been getting logs from typically cost 200_000 to 360_000 in gas, to give you an idea of how long the processing takes. Let's assume that each call produces 5 event logs.

  1. Poll the mirror node via the REST API to get the latest logs: /api/v1/contracts/results/logs?timestamp=gt:${unixTimestampFrom}&topic0=${topicId}
  2. Store the latest timestamp as unixTimestampFrom for the next call
  3. Repeat at a high call rate
  4. After getting logs, group them by timestamp and check that you always have 5 event logs for each timestamp. If not, then you have reproduced the issue.
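The check in step 4 could look roughly like this in TypeScript. It is only a sketch: `findIncompleteTimestamps` is a hypothetical helper, and the log shape is reduced to the one `timestamp` field the check needs.

```typescript
// Assumed log shape; only `timestamp` is needed for the grouping check.
interface ContractLog {
  timestamp: string;
}

// Count logs per timestamp and return every timestamp whose count differs
// from the expected number of event logs per contract call.
function findIncompleteTimestamps(
  logs: ContractLog[],
  expectedPerTimestamp: number
): string[] {
  const counts: Record<string, number> = {};
  for (const log of logs) {
    counts[log.timestamp] = (counts[log.timestamp] || 0) + 1;
  }
  return Object.keys(counts)
    .filter((ts) => counts[ts] !== expectedPerTimestamp)
    .sort();
}
```

Running this over everything a client has collected, with `expectedPerTimestamp` set to 5, returns an empty array when no gaps occurred. Any timestamp it returns is a reproduction of the issue.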

Additional context

No response

Hedera network

mainnet

Version

N/A

Operating system

Linux

iron4548 commented 1 year ago

Here's an example.

I had two instances of the client polling for the latest smart contract events and logging them.

Then I compared the logs, and one had some missing events (5 missing out of 10 expected events for the specific timestamp), i.e. the right client had 10 events and the left client only had 5.

[Image: side-by-side comparison of the logs from the left and right clients]
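The comparison itself can be sketched in TypeScript as below. This assumes that a log is uniquely identified by its `timestamp` plus an `index` field, which is an assumption about the response shape; `logKey` and `missingFrom` are hypothetical helper names.

```typescript
// Assumed log shape: `timestamp` plus a per-transaction `index`.
interface ContractLog {
  timestamp: string;
  index: number;
}

// Key that is assumed to identify one log uniquely across clients.
function logKey(log: ContractLog): string {
  return log.timestamp + "#" + log.index;
}

// Return every log that `reference` has but `candidate` lacks.
function missingFrom(reference: ContractLog[], candidate: ContractLog[]): ContractLog[] {
  const seen: Record<string, boolean> = {};
  for (const log of candidate) {
    seen[logKey(log)] = true;
  }
  return reference.filter((log) => !seen[logKey(log)]);
}
```

Calling `missingFrom(rightClientLogs, leftClientLogs)` lists what the left client never received. Swapping the arguments checks the other direction.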