tagomoris / presto-client-node

Distributed query engine Presto client library for node.js
MIT License
126 stars 57 forks source link

Retry for 502-504 codes, fail on >= 400 codes #74

Closed MasterOdin closed 12 months ago

MasterOdin commented 1 year ago

Closes #72

PR modifies the behavior of the client such that:

and that this happens consistently whether the codes happen on the first request or a subsequent request. Previously, all of these codes would throw an error only on the first request, while all subsequent requests would trigger an uncaught exception.

I included tests that validate the proper behavior for triggering these codes on the first POST call to /v1/statement as well as the subsequent GET call to /v1/statement/....

tagomoris commented 1 year ago

I have some requests for the change:

MasterOdin commented 1 year ago

To address the above comments, namely (1), I ended up having to rewrite some amount of statementResource method so that we could have a single function that wraps making requests so that I could put the retry logic in one place as any request could fail. I think I've retained backwards compatibility in terms of error message format on first request vs 2+ request, as well as only allowing cancelling on the 2+ request.

a new parameter timeout to stop retries and call error callback, with a default value (1 minutes or so), to not make queries which never stops

In terms of backwards compatibility, I think this shouldn't have a default?

update test case names to describe what is tested (what should be verified), instead of what the test code does

I renamed the test, but if you'd like to see a different name, I'd prefer you give me more exact naming instead of guessing. Here's the current test output:

  ✓ cannot use basic and custom auth (12 ms)
  presto
    ✓ simple query (3219 ms)
    ✓ query with error (1675 ms)
  trino
    ✓ simple query (3293 ms)
    ✓ query with error (1645 ms)
  when server returns non-200 response
    the client should retry for 50x code
      the client retries for 502 code
        ✓ the client returns success (992 ms)
      the client retries for 503 code
        ✓ the client returns success (962 ms)
      the client retries for 504 code
        ✓ the client returns success (1001 ms)
    the client fails for 404 code
      the client fails after 0 requests
        ✓ the client returns error (4 ms)
      the client fails after 1 requests
        ✓ the client returns error (817 ms)
    the client fails for 500 code
      the client fails after 0 requests
        ✓ the client returns error (12 ms)
      the client fails after 1 requests
        ✓ the client returns error (824 ms)
  when timeout is set
    ✓ the query times out (1009 ms)
tagomoris commented 1 year ago

In terms of backwards compatibility, I think this shouldn't have a default?

From a compatibility viewpoint, this change is incompatible because it retries requests for 50x errors. If we want to keep the behavior about errors, we should introduce a parameter like retry_temporary_errors with default false to not retry. (I don't want to do so.)

This change introduces a kind of incompatibility, but we should keep the behavior consistent. In previous versions, users' queries will stop when they meet errors (after a fixed amount of time). With error retries without default valued timeout, queries will never stop forever if a query causes 50x error always. That's an inconsistent behavior users don't expect.

So, when we introduce retries, we should ensure that queries will finish after a period of time anyway. Does it explain the reason enough?

MasterOdin commented 1 year ago

Fair enough @tagomoris. Per your feedback, I've made the timeout default to 60 seconds, with the ability to disable it by setting the field to null or 0. The timeout can be set both at the client level as well as the query level.

Nurymka commented 1 year ago

any updates on this PR? would be really handy if we can get this merged

tagomoris commented 1 year ago

Ah, I missed the updates in Aug. I'll check it in this week.

tagomoris commented 1 year ago

The change looks good to me except for query_id.

tagomoris commented 12 months ago

I want this to be rebased on the change of #78...

tagomoris commented 12 months ago

@MasterOdin Could you rebase on the current master?

MasterOdin commented 12 months ago

@tagomoris rebased on the latest master. I also condensed the commits down to once since I'm already doing history rewriting on the branch.

tagomoris commented 12 months ago

Merged. Thank you!