masa-finance / masa-oracle

Masa Oracle: Decentralized Data Protocol 🌐
https://developers.masa.ai/docs/masa-protocol/welcome
MIT License

bug: Explore 504 Error - Blocking Response Channel #611

Closed: theMultitude closed this 2 weeks ago

theMultitude commented 3 weeks ago

Problem Statement

When making requests to the protocol, a common and time-consuming issue is a 504 response accompanied by log output like the following:

ERRO[0873] response channel is blocking for request ID: f3a7f608-dcc5-4f9b-930f-615d7f0ec054
[GIN] 2024/10/25 - 12:06:54 | 504 |         2m26s |             ::1 | POST     "/api/v1/data/twitter/tweets/recent"

This causes the SDK to hang while waiting for the interaction to complete.

Acceptance Criteria:

mudler commented 3 weeks ago

This seems to be hitting quite frequently, but it is not entirely blocking functionality.

mudler commented 2 weeks ago

@restevens402 will provide updates on its state on the card.

TL;DR: the system is working as designed, but we are hitting real limits. We have a follow-up to improve the protocol in #507 to ensure that the network is more resilient and optimized while dispatching jobs.

restevens402 commented 2 weeks ago

I have spent some time looking into how the 504 error occurs and what we can do about it. It turns out that there isn't much we can do with the current functionality: it comes down to the only nodes available to do work being poor performers. The system is working as designed. The logged message "response channel is blocking for request ID: ..." appears because the DistributeWork method is hitting the context timeout. Here are the details:

Workflow Leading to a 504 Error

  1. Request Handling:
    • A client sends a request to the API, which is then processed and delegated to a worker for execution.
  2. Worker Task Execution:
    • The API sends a work request to a worker and waits for a response on a channel.
  3. Timeout Detection:
    • The API uses a select statement with a time.After case to detect if the worker takes too long to respond.
    • If the timeout period elapses without receiving a response, the time.After case is triggered.
  4. Invoking handleTimeout:
    • When the timeout is detected, the handleTimeout function is called.
    • This function sends a 504 Gateway Timeout response to the client, indicating that the request could not be completed in time.

In summary, there isn't anything wrong with the code; it comes down to the performance of the network. We could change the 504 response, but the failed work would still occur in the same way.