Investigate and Resolve 429 Errors on [POST] /api/chat/mistral

RostyslavManko commented 5 months ago

Description

We have been seeing regular 429 (Too Many Requests) errors on [POST] /api/chat/mistral, which suggests that too many requests are being sent to one or more of the services behind it. A possible cause is that each chat message triggers several requests: generating a standalone question, querying Pinecone, and answering the question. We want to find the root cause of these errors and implement a fix that prevents them from recurring.

Objective

Our goal is to identify the reasons behind the 429 errors on [POST] /api/chat/mistral and implement a solution to optimize the request handling and prevent these errors from happening regularly.

Actions and Considerations (ACC)

Investigate Root Cause:
- [ ] Analyze server logs and request patterns to identify the primary reasons behind the 429 errors.
- [ ] Assess the current request handling mechanism, including the number of requests sent to different services and their frequency.
Optimize Request Handling:
- [ ] Implement a solution that eliminates the 429 errors or at least minimizes them.
Testing and Quality Assurance:
- [ ] Conduct thorough testing to ensure that the implemented solution effectively prevents 429 errors on [POST] /api/chat/mistral.
- [ ] Test various scenarios, including potential edge cases, to guarantee a robust and reliable solution.

Expected Outcomes

A clear understanding of the root cause behind the 429 errors on [POST] /api/chat/mistral.
An optimized request handling mechanism that prevents these errors from occurring regularly.
Improved user experience and reliability of the application, as users no longer encounter 429 errors during regular usage.
Enhanced overall performance and efficiency of the application by optimizing the number of requests sent to different services.

RostyslavManko commented 5 months ago

@fkesheh I found this warning (error) in Vercel logs: https://vercel.com/hackerai/hackergpt-2v/logs?timeline=pastHour&levels=error&page=1&startDate=1712697527866&endDate=1712701127866. If you check the warnings from the past hour, you will see around ten 429 warnings. I'm not sure if these warnings affect the output results.

fkesheh commented 5 months ago

The 429 can have two origins:

1) From OpenRouter while generating the answer (errors on the standalone question throw a 500 instead):

const res = await fetch(openRouterUrl, {
  method: "POST",
  headers: openRouterHeaders,
  body: JSON.stringify(requestBody)
})

if (!res.ok) {
  const result = await res.json()
  let errorMessage = result.error?.message || "An unknown error occurred"

  switch (res.status) {
    case 400:
      throw new APIError(`Bad Request: ${errorMessage}`, 400)
    case 401:
      throw new APIError(`Invalid Credentials: ${errorMessage}`, 401)
    case 402:
      throw new APIError(`Out of Credits: ${errorMessage}`, 402)
    case 403:
      throw new APIError(`Moderation Required: ${errorMessage}`, 403)
    case 408:
      throw new APIError(`Request Timeout: ${errorMessage}`, 408)
    case 429:
      throw new APIError(`Rate Limited: ${errorMessage}`, 429)
    case 502:
      throw new APIError(`Service Unavailable: ${errorMessage}`, 502)
    default:
      throw new APIError(`HTTP Error: ${errorMessage}`, res.status)
  }
}

2) From our own rate limiter, when the user hits the rate limit:

export async function checkRatelimitOnApi(
  userId: string,
  model: string
): Promise<{ response: Response; result: RateLimitResult } | null> {
  const result = await ratelimit(userId, model)
  if (result.allowed) {
    return null
  }
  const premium = await isPremiumUser(userId)
  const message = getRateLimitErrorMessage(
    result.timeRemaining!,
    premium,
    model
  )
  const response = new Response(
    JSON.stringify({
      message: message,
      remaining: result.remaining,
      timeRemaining: result.timeRemaining
    }),
    {
      status: 429
    }
  )
  return { response, result }
}

I believe what we are seeing in the logs is due to our own rate limit. I suggest the mistral route stop passing OpenRouter's 429 through to the user and throw a 503 instead:

case 429:
  throw new APIError(`Rate Limited: ${errorMessage}`, 503)

That way we can tell the two sources apart: https://github.com/Hacker-GPT/HackerGPT-2.0/pull/201
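
To illustrate the idea, here is a minimal sketch of remapping the upstream status so that 429 stays reserved for our own rate limiter. The `APIError` class below is only a stand-in for the one used in the route, and `toClientStatus` and `upstreamError` are hypothetical helper names, not code from the linked PR.

```typescript
// Stand-in for the APIError class used in the route (its real shape is assumed).
class APIError extends Error {
  code: number
  constructor(message: string, code: number) {
    super(message)
    this.name = "APIError"
    this.code = code
  }
}

// Hypothetical helper: translate an upstream (OpenRouter) status into the
// status we expose. An upstream 429 becomes 503; everything else passes through.
function toClientStatus(upstreamStatus: number): number {
  return upstreamStatus === 429 ? 503 : upstreamStatus
}

// Hypothetical helper: build the error thrown for a failed upstream call.
function upstreamError(upstreamStatus: number, errorMessage: string): APIError {
  const label = upstreamStatus === 429 ? "Rate Limited" : "HTTP Error"
  return new APIError(`${label}: ${errorMessage}`, toClientStatus(upstreamStatus))
}
```

With a mapping like this, any 429 that reaches the client or the Vercel logs can only have come from `checkRatelimitOnApi`.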

RostyslavManko commented 5 months ago

After our investigation, we found that the 429 errors did not come from OpenRouter; they came from our own rate limit on the number of messages a user can send, which returns a 429 from the /api/chat/mistral endpoint.
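
One way to confirm which source produced a given 429 is to look at the response body: the response built in `checkRatelimitOnApi` carries `message`, `remaining`, and `timeRemaining`. The sketch below assumes `remaining` and `timeRemaining` are numbers, which the thread does not confirm; `isOwnRateLimitBody` and `classify429` are hypothetical names.

```typescript
// Assumed shape of the body returned by checkRatelimitOnApi.
// Field names come from the snippet in this thread; the numeric types are an assumption.
interface RateLimitBody {
  message: string
  remaining: number
  timeRemaining: number
}

// Type guard: true only when the parsed JSON body has all three fields.
function isOwnRateLimitBody(body: unknown): body is RateLimitBody {
  if (typeof body !== "object" || body === null) return false
  const b = body as Record<string, unknown>
  return (
    typeof b.message === "string" &&
    typeof b.remaining === "number" &&
    typeof b.timeRemaining === "number"
  )
}

// Label a response as coming from our own rate limiter or from somewhere else.
function classify429(status: number, body: unknown): "own-rate-limit" | "other" {
  return status === 429 && isOwnRateLimitBody(body) ? "own-rate-limit" : "other"
}
```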
