knowm / XChange

XChange is a Java library providing a streamlined API for interacting with 60+ Bitcoin and Altcoin exchanges, offering a consistent interface for trading and accessing market data.
http://knowm.org/open-source/xchange/
MIT License
3.87k stars 1.94k forks

New unified approach to retrys and rate limiting #3211

Open walec51 opened 5 years ago

walec51 commented 5 years ago

There are two major features that have been a pain point for many users of our library and that have been handled inconsistently across our modules: API call retries and rate limiting.

Most modules do nothing in this matter; a few have implemented their own custom solutions, which leads to very inconsistent behavior across the library. First things first, a decision needs to be made:

Should those features be in the scope of this library or not?

One could argue that you can implement them in your application code and limit the scope of this library to just unifying exchange APIs. However, there is a major problem with this approach:

To solve those two problems optimally, you have to do it at the level of the direct API calls made by the *Raw classes.

This is because our generic API methods, for example AccountService#getAccountInfo(), can make more than one call to the exchange's API to gather all the data needed to construct the desired DTO. Doing retries around AccountService#getAccountInfo() is suboptimal because if the last underlying call to the exchange fails, you have to repeat all of the preceding calls. Doing rate limiting in a generic way is impossible because you do not know how many underlying calls to the exchange are made and how they are counted by a given exchange's call rate limiting policy.

I therefore propose the following changes to our code base to add support for these features in a unified way:

Use a well established and documented library: resilience4j

All the existing solutions for rate limiting and retries in our modules are hacky or oversimplified. Let's not reinvent the wheel here - there are good, small libraries for this in the resilience4j project, namely resilience4j-retry and resilience4j-ratelimiter. The project also offers metrics and circuit breakers, which might be of interest to us in the future.
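As a rough illustration - not part of this proposal's API - this is how a retry and a rate limiter from resilience4j compose around a single call; `fetchTicker` is a hypothetical stand-in for a direct (*Raw level) exchange call:

```java
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

import java.time.Duration;
import java.util.function.Supplier;

public class ResilienceSketch {

  // Hypothetical stand-in for a direct exchange call.
  static String fetchTicker() {
    return "{\"last\": \"42000.0\"}";
  }

  public static void main(String[] args) {
    Retry retry = Retry.of("ticker", RetryConfig.custom()
        .maxAttempts(3)
        .waitDuration(Duration.ofMillis(100))
        .build());

    RateLimiter limiter = RateLimiter.of("ticker", RateLimiterConfig.custom()
        .limitForPeriod(10)                        // 10 calls ...
        .limitRefreshPeriod(Duration.ofSeconds(1)) // ... per second
        .timeoutDuration(Duration.ofSeconds(5))    // max wait for a permit
        .build());

    // Compose: every attempt made by the retry passes through the rate limiter.
    Supplier<String> decorated = Retry.decorateSupplier(
        retry, RateLimiter.decorateSupplier(limiter, ResilienceSketch::fetchTicker));

    System.out.println(decorated.get());
  }
}
```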

Add options to exchange specifications

Important: the proposed API has changed, see: https://github.com/knowm/XChange/issues/3211#issuecomment-553944293

This should be done in a way that preserves backward compatibility for most modules, as most of us who use this library already have some suboptimal solutions for rate limiting and retries in our application code. Maybe someday, in XChange 5.0.0, we'll enable these features by default.

ExchangeSpecification spec = new BinanceExchange().getDefaultExchangeSpecification();

if (spec.isRetrysForReadsSupported()) {
  // Enables the default retry policy supplied by the module's implementer for read-only API
  // calls. Throws UnsupportedOperationException if a default policy is not provided.
  spec.enableRetrysForReads();
}
if (spec.isRetrysForWritesSupported()) {
  // This will usually be unsupported because writing a good retry
  // handler for write operations is hard and most exchange APIs are badly designed,
  // with no idempotence in mind. This means that trivial strategies would often
  // result in e.g. placing duplicate orders instead of one.
  spec.enableRetrysForWrites();
}

We can also allow custom retry policies:

spec.setRetrysForReads(
  RetryConfig.custom()
    .retryExceptions(InternalServerException.class, SocketTimeoutException.class)
    .maxAttempts(3)
    .waitDuration(Duration.ofMillis(100))
    .intervalFunction(IntervalFunction.ofExponentialBackoff())
    .build()
);

As for rate limiting, we just add:

if (spec.isRateLimitingSupported()) {
  // Enables default rate limiting supplied by the module implementer.
  // Throws UnsupportedOperationException if it's not provided.
  spec.enableRateLimiting();
}

Provide best practices for implementers

To be done in the first PR if we reach a consensus on how this feature should look.

makarid commented 5 years ago

I favor the idea of implementing a unified approach if we can turn it ON/OFF, because it is not mandatory for every user. We should also consider the rate limiter / retry strategy; some users will want to be much more aggressive than others, so we need at least the option to plug in a custom strategy. I favor using resilience4j if it doesn't have a huge learning curve.

timmolter commented 5 years ago

Sounds good to me. Maybe instead of "writes" use "executions"?

walec51 commented 5 years ago

@makarid yes, having an ON/OFF switch is a must

As for customizing the strategy: for read-only operations I think the RetryConfig builder gives you all the customization options one can think of, so just accepting your own config in ExchangeSpecification should suffice.

Customizing retries for operations that change things (place order, withdraw) is harder because if, for example, the call times out, you have to somehow determine whether the operation succeeded on the exchange before you execute a retry. In the case of placing orders you can check whether an order with a unique client id was created, if the exchange allows such an id to be passed.
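The check-before-retry idea could be sketched as follows; `OrderClient`, `placeOrder`, and `orderExists` are hypothetical stand-ins for raw exchange calls, not XChange APIs:

```java
import java.net.SocketTimeoutException;
import java.util.UUID;

public class SafeOrderRetry {

  // Hypothetical client; both methods stand in for raw exchange calls.
  interface OrderClient {
    String placeOrder(String clientOrderId) throws SocketTimeoutException;
    boolean orderExists(String clientOrderId);
  }

  /**
   * Retries a write only after confirming the previous attempt did not
   * actually succeed on the exchange. The unique client order id is what
   * makes that check possible.
   */
  static String placeOnce(OrderClient client, int maxAttempts) throws SocketTimeoutException {
    String clientOrderId = UUID.randomUUID().toString();
    for (int attempt = 1; ; attempt++) {
      try {
        return client.placeOrder(clientOrderId);
      } catch (SocketTimeoutException e) {
        // The order may still have been placed; check before retrying,
        // otherwise a retry could create a duplicate order.
        if (client.orderExists(clientOrderId)) {
          return clientOrderId;
        }
        if (attempt >= maxAttempts) {
          throw e;
        }
      }
    }
  }
}
```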

For ultimate flexibility we could add client decorators:

Exchange exchange = ExchangeFactory.INSTANCE.createExchange(
  BinanceExchange.class, binance -> new CustomBinanceDecorator(binance)
);
public class CustomBinanceDecorator implements Binance {

  private final Binance target;

  public CustomBinanceDecorator(Binance target) {
    this.target = target;
  }

  // ...

  @Override
  public BinanceNewOrder newOrder(/* ... */) {
    // do whatever you want around this call,
    // you don't even have to use resilience4j here
    return target.newOrder(/* ... */);
  }

  // ...

}

However, anyone using this would have to be aware that any XChange version upgrade can break compatibility with their decorators.

makarid commented 5 years ago

I am sorry but I don't totally understand the thing with the decorator. Do you mean that the decorator will add new functionality to the existing implementation - for example, that placeLimitOrder will be overridden by the newOrder method of your example? About placing new orders, I have implemented a solution for timeouts and exceptions. Here is my solution:

public void newOrder(LimitOrder newLimitOrder) throws InterruptedException {
  int secDelay = 0;
  while (true) {
    try {
      secDelay++;
      openOrderId = exchange.getTradeService().placeLimitOrderRaw(newLimitOrder);
      break;
    } catch (Exception e) {
      LOGGER.error(e.getMessage(), e);
      // back off a bit longer after each failure; note this retries forever
      TimeUnit.SECONDS.sleep(secDelay);
    }
  }
}

Do you mean that the newOrder method (your method) would have an implementation like this, in order to handle errors from the exchange?

walec51 commented 5 years ago

Sounds good to me. Maybe instead of "writes" use "executions"?

@timmolter hmm, that doesn't sound self-explanatory to me - one can execute a read operation too. Maybe we should use the more verbose terms "read only API calls" / "modifying API calls"?

if (spec.isRetrysForReadOnlyCallsSupported()) {
  spec.enableRetrysForReadOnlyCalls(); 
}
if (spec.isRetrysForModifyingCallsSupported()) {
  spec.enableRetrysForModifyingCalls(); 
}

walec51 commented 5 years ago

@makarid

are you familiar with the decorator design pattern?

In my post I proposed to decorate the client interfaces that represent direct API calls to the underlying exchange. TradeService#placeLimitOrder is NOT that - it's a method from our generic interfaces.

makarid commented 5 years ago

Yes, my bad, I was talking about the Raw implementation. I have corrected it now. If we use the decorator approach, do we need to implement the new functionality on every raw call, or will we implement some standard decorators and then apply the one we want to a raw call?

walec51 commented 5 years ago

your example is still wrong, as newOrder and placeLimitOrderRaw are different methods - this is not the decorator pattern; you need a common interface to implement the decorator pattern, and a *Raw class is a class

makarid commented 5 years ago

OK, then forget the example. As I saw at the link about the decorator pattern, we need to create a new overriding method for every *Raw class method. Is that right?

walec51 commented 5 years ago

my decorator proposal has nothing to do with *Raw classes, I proposed to decorate proxy clients like

https://github.com/knowm/XChange/blob/develop/xchange-binance/src/main/java/org/knowm/xchange/binance/Binance.java

Raw classes invoke methods of those clients; decorating the proxy clients does not impose any changes on the Raw classes

makarid commented 5 years ago

I understand it now. Thanks a lot for clarifying. I find it a very good idea

walec51 commented 5 years ago

in my example, your decorator must implement at least a pass-through for each method in the Binance interface, like:

@Override
public SomeReturn someMethod(SomeParam param) {
  return target.someMethod(param);
}

or you can use Java dynamic proxies if you don't want to write the pass-throughs by hand

These are common implementation practices for the decorator pattern in Java.
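A minimal sketch of the dynamic proxy alternative, using only the JDK's `java.lang.reflect.Proxy`; the `Client` interface here is a hypothetical stand-in for a proxy client interface like `Binance`:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class PassThroughDecorator {

  // Hypothetical client interface standing in for e.g. the Binance proxy interface.
  interface Client {
    String ping();
  }

  /** Wraps a target so every interface method is forwarded, with one hook around all calls. */
  @SuppressWarnings("unchecked")
  static <T> T passThrough(Class<T> iface, T target) {
    InvocationHandler handler = (proxy, method, args) -> {
      // Retry / rate limiting / logging could be added here once,
      // instead of in hand-written per-method pass-throughs.
      return method.invoke(target, args);
    };
    return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
  }

  public static void main(String[] args) {
    Client real = () -> "pong";
    Client decorated = passThrough(Client.class, real);
    System.out.println(decorated.ping()); // forwards to the real client
  }
}
```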

declan94 commented 5 years ago

A unified approach for API rate limiting sounds very good. However, with the above approach we cannot use the information returned by the API for rate limiting, which is much more precise and reliable in some cases.

For example, Bitmex returns "x-ratelimit-limit", "x-ratelimit-remaining", and "x-ratelimit-reset" in the response headers. It is very clear and accurate to implement API rate limiting based on this information, while implementing the limits only locally may cause some problems.
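For illustration, parsing those headers could look like the sketch below; the exact header semantics (integer counts plus a Unix-seconds reset timestamp) are an assumption based on this description, not verified against the Bitmex docs:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

public class BitmexHeaders {

  record RateLimitState(int limit, int remaining, Instant reset) {}

  /** Parses the Bitmex-style rate limit headers described above. */
  static RateLimitState parse(Map<String, String> headers) {
    return new RateLimitState(
        Integer.parseInt(headers.get("x-ratelimit-limit")),
        Integer.parseInt(headers.get("x-ratelimit-remaining")),
        // assumption: x-ratelimit-reset is a Unix timestamp in seconds
        Instant.ofEpochSecond(Long.parseLong(headers.get("x-ratelimit-reset"))));
  }

  /** How long a client should pause when no calls remain in the window. */
  static Duration backoff(RateLimitState state, Instant now) {
    if (state.remaining() > 0) {
      return Duration.ZERO;
    }
    Duration wait = Duration.between(now, state.reset());
    return wait.isNegative() ? Duration.ZERO : wait;
  }
}
```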

walec51 commented 5 years ago

@declan94 thanks for your input, it's true that the direction I'm exploring in https://github.com/knowm/XChange/pull/3231 does not cover limiting based on information in response headers.

I'll take it into consideration. I think a case like Bitmex's would be better handled by a custom interceptor for all methods that updates the state of the RateLimiter after each request.

walec51 commented 5 years ago

@declan94 I've been thinking about those Bitmex headers and I'm starting to think that interpreting them is not trivial and might lead to improper rate limiting in multi-threaded applications. The current implementation is definitely not thread safe, nor does it take network latency into account.

Here is an example of a scenario where a response to an earlier request may override rate limit information supplied by a response to a later request:

[diagram omitted]

I think that implementing a rate limiter that just allows 60 requests per minute per account (the settings can be overridden if you have an individual limit) and ignoring those response headers is a much simpler and less error-prone approach. The only downside is at most a few percent of throughput.
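A sketch of that simpler approach with resilience4j's RateLimiter; the name and the one-minute permit timeout are illustrative choices, not part of any XChange API:

```java
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;

import java.time.Duration;

public class SimpleAccountLimiter {

  /** 60 requests per minute per account; callers block up to a minute for a permit. */
  static RateLimiter perAccountLimiter() {
    return RateLimiter.of("bitmex-account", RateLimiterConfig.custom()
        .limitForPeriod(60)                        // 60 permits ...
        .limitRefreshPeriod(Duration.ofMinutes(1)) // ... per minute
        .timeoutDuration(Duration.ofMinutes(1))    // max wait for a permit
        .build());
  }

  public static void main(String[] args) {
    RateLimiter limiter = perAccountLimiter();
    Runnable call = RateLimiter.decorateRunnable(limiter, () -> {
      // the actual HTTP call to the exchange goes here
    });
    call.run(); // blocks if the 60-per-minute budget is exhausted
  }
}
```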

walec51 commented 4 years ago

After doing some initial implementation I arrived at a different API than I initially proposed:

Add options to exchange specifications

This way you can make a module use retries or rate limiting if it has been enhanced with resilience4j functionality:

ExchangeSpecification spec = new BinanceExchange().getDefaultExchangeSpecification();
spec.setRetryEnabled(true);
spec.setRateLimiterEnabled(true);

In XChange 4.x.x, if a module already had some retry / rate limit strategy before it was rewritten to resilience4j, then this will be set to true. If not, it will be false.

In XChange 5.x.x I think this should be set to true by default in all modules.

Annotations on client interfaces

This is how we would apply the retry and rate limiting functions - by just annotating the client interfaces.

@Path("")
@Produces(MediaType.APPLICATION_JSON)
public interface BinanceAuthenticated extends Binance {

  public static final String SIGNATURE = "signature";
  static final String X_MBX_APIKEY = "X-MBX-APIKEY";

  @POST
  @Path("api/v3/order")
  @Decorator(
      retry =
          @Retry(
              name = "newOrder",
              baseConfig = ResilienceRegistries.NON_IDEMPOTENTE_CALLS_RETRY_CONFIG_NAME))
  @Decorator(rateLimiter = @RateLimiter(name = ORDERS_PER_SECOND_RATE_LIMITER))
  @Decorator(rateLimiter = @RateLimiter(name = ORDERS_PER_DAY_RATE_LIMITER))
  @Decorator(rateLimiter = @RateLimiter(name = REQUEST_WEIGHT_RATE_LIMITER))
  BinanceNewOrder newOrder(
      @FormParam("symbol") String symbol,
      @FormParam("side") OrderSide side,
      @FormParam("type") OrderType type,
      @FormParam("timeInForce") TimeInForce timeInForce,
      @FormParam("quantity") BigDecimal quantity,
      @FormParam("price") BigDecimal price,
      @FormParam("newClientOrderId") String newClientOrderId,
      @FormParam("stopPrice") BigDecimal stopPrice,
      @FormParam("icebergQty") BigDecimal icebergQty,
      @FormParam("recvWindow") Long recvWindow,
      @FormParam("timestamp") SynchronizedValueFactory<Long> timestamp,
      @HeaderParam(X_MBX_APIKEY) String apiKey,
      @QueryParam(SIGNATURE) ParamsDigest signature)
      throws IOException, BinanceException;

  @GET
  @Path("api/v3/openOrders")
  @Decorator(retry = @Retry(name = "openOrders"))
  @Decorator(rateLimiter = @RateLimiter(name = REQUEST_WEIGHT_RATE_LIMITER, weight = 5))
  List<BinanceOrder> openOrders(
      @QueryParam("symbol") String symbol,
      @QueryParam("recvWindow") Long recvWindow,
      @QueryParam("timestamp") SynchronizedValueFactory<Long> timestamp,
      @HeaderParam(X_MBX_APIKEY) String apiKey,
      @QueryParam(SIGNATURE) ParamsDigest signature)
      throws IOException, BinanceException;

  // ...
}

Customising retry behaviour

We can customize retry strategies by using the resilience4j registry mechanism. You can override the retry configuration for a given call - let's say for the above openOrders - this way:

exchange.getResilienceRegistries().retries().replace(
  "openOrders",
  Retry.of(
    "openOrders",
    RetryConfig.custom()
      .retryExceptions(InternalServerException.class, SocketTimeoutException.class)
      .maxAttempts(3)
      .waitDuration(Duration.ofMillis(100))
      .intervalFunction(IntervalFunction.ofExponentialBackoff())
      .build()));

walec51 commented 4 years ago

updated my PR

resiliency features are now configured in *Raw classes - previously I did it using annotations on proxy interfaces, but my new annotation scheme was not accepted by the resilience4j project, so I removed that way of doing things

documentation on how this would be implemented in modules in its current shape: https://github.com/knowm/XChange/wiki/Implementing-resiliency-features

nibarulabs commented 3 years ago

Hello, I'm just wondering if there is any path forward on this? My project needs a more robust rate limiting approach than what I have, and I'm looking for a proper solution.

Is there an ETA on when this would make it into a release? Thanks.

earce commented 3 years ago

@nibarulabs, @walec51 has added these rate limiting features to Binance. There isn't any immediate plan to add this feature set to all exchanges, so if you are interested in seeing it for a specific exchange, the best way is to submit a PR with those changes.

nibarulabs commented 3 years ago

@earce Thanks for the swift reply. My project is looking more and more like it's gonna need this on exchanges other than Binance (specifically Coinbase Pro), so I will play around with a fork today and if it goes well, I will submit a PR. Thanks again for all the work you guys have done. 👏

(@walec51 thank you too for the great work!)

earce commented 3 years ago

@nibarulabs CoinbasePro has some baked in rate limiting but it will probably be better to move over to this resilience framework.

@walec51 has been the one really pushing this forward :) but I have started the work on applying this to the streaming framework.

nibarulabs commented 3 years ago

@earce I just looked through the CB Pro code and nothing jumped out at me for the baked in rate limiting. Is there a specific class/spot that you know of that handles this? (If you don't have the time, np, I'll keep looking)

earce commented 3 years ago

https://github.com/knowm/XChange/blob/develop/xchange-coinbasepro/src/main/java/org/knowm/xchange/coinbasepro/service/CoinbaseProMarketDataServiceRaw.java#L99, I do not recommend replicating this. There are a few issues with this even though it mostly works.

We eventually want to port this to the new framework.

nibarulabs commented 3 years ago

Got it - I wasn't looking on the development branch. Thank you. 👍

walec51 commented 3 years ago

@nibarulabs this document describes the new direction we are taking for rate limiting and other resiliency features:

https://github.com/knowm/XChange/wiki/Implementing-resiliency-features

nibarulabs commented 3 years ago

@walec51 Yah, I saw that earlier and am following that, thx.

For CoinbasePro there is also an endpoint

accounts/<account_id>/transfers

that looks like it has been removed from their API, possibly a while ago - I did not see anything regarding it in their changelog. Is it OK to just remove it? (No other choice, really.)

earce commented 3 years ago

Yea, that particular endpoint doesn't seem to be active; I would try querying it, and if it doesn't work you can remove it.

Also feel free to join the discord, it's easier to chat on there.

earce commented 3 years ago

@walec51 question for you: for the Bitmex-style rate limiting, what do you mean by creating a custom interceptor?

walec51 commented 3 years ago

@earce for what do you need a custom interceptor?

what do you need that cannot be achieved with plain resilience4j decorators?

walec51 commented 3 years ago

@earce do you want to drain the rate limiter based on a certain response / exception?

if so then check these examples:

https://github.com/resilience4j/resilience4j/pull/1264/files#diff-966205bc5b6df4ec5ea80bac91d86da350415587ea5e6291b62030fa8a6fc9f3R13

earce commented 3 years ago

@walec51 in the Bitmex example here, the exchange sends back headers that tell us how long we need to wait before we retry. Two edge cases can present themselves:

Say the rate limiter you are using has an interval of 1 minute and the Bitmex API tells us to back off for 2; we will wait the one minute, start firing requests, and get banned even longer, potentially locking ourselves out.

The second scenario is an app restart, followed by the same situation where we are not blocking requests for as long as the API tells us but instead for whatever the duration of the rate limiter is.

walec51 commented 3 years ago

remember that XChange can be used in a multi-threaded app - because of this you should rate limit proactively, even before Bitmex says you made too many calls

so configure the rate limiters as Bitmex suggests:

https://www.bitmex.com/app/restAPI#Limits

60 requests per minute on all routes (reduced to 30 when unauthenticated)
10 requests per second on certain routes (see below)

if despite this you receive a 429 response, then just drain the rate limiter using this option:

RateLimiter limiter = RateLimiter.of("someLimiter", RateLimiterConfig.custom()
    // ...
    .drainPermissionsOnResult(
        callsResult -> isFailedAndThrown(callsResult, BitmexException.class, e -> e.getHttpStatusCode() == 429))
    .build());

earce commented 3 years ago

This doesn't really cover all scenarios; while proactive limiting is a must, reactive limiting is important to have as well.

Process restarts won't be protected by this rate limiter, and bans that last longer than whatever your cycle is programmed for won't be covered either. I want to see if there is a good way to dynamically adjust the drain period.

walec51 commented 3 years ago

If the process restarts and causes a 429 error, then it will drain the cycle. You will not get banned for this, and afterwards you should not see any 429 errors.

Interpreting Bitmex headers in a multi-threaded environment is not an easy task - HTTP requests may have variable durations, and a response to request A may come after a response to request B. Besides having complex code to maintain, you will not get much out of it, in my opinion.

earce commented 3 years ago

I don't entirely agree with this.

I would imagine that if the process restarts after a 429 is in place, you will (potentially) get further penalized, because you would:

1) have a ban in place (say 1m)
2) submit another request
3) potentially increase the ban time (now > 1m)
4) have your cycle end after 1m (the ban will still be in place)
5) submit another request, penalizing you even more

You may end up with a permanent or much longer ban this way.

I don't think interpreting the Bitmex headers would be any more work than inspecting the throwable you get when you call drainPermissionsOnResult - you could just inspect your object there and pull them out.

walec51 commented 3 years ago

The scenario you mentioned would only happen if both of the below conditions were met:

I don't think that's the case, as Bitmex has a 60 s cycle and a 30 s ban for exceeding it.

If Bitmex gives you a long ban - a couple of hours or more - then I don't think this is something a library like XChange should handle. That requires some human investigation.

As for a more persistent implementation - as I said:

HTTP requests may have variable durations and a response to request A may come after a response to request B.

So just applying the data from the last response (which may not be from the last request but from an earlier, lagging one) will not be adequate in a multi-threaded environment and is prone to race conditions.

There is an old Unix saying: worse is better. A solution that is theoretically worse is acceptable if it results in a much simpler system. I think we should try my simpler approach and go with a more complex implementation only if it does not perform well in production.

walec51 commented 3 years ago

But if you want to do a more complex solution then show some pseudo code that solves this problem:

https://github.com/knowm/XChange/issues/3211#issuecomment-538794173

earce commented 3 years ago

I only say this because it does happen - https://www.bitmex.com/app/restAPI#Request-Rate-Limits: "If a large number of 4xx or 5xx responses are delivered in a short period of time, your IP address may be banned for an hour. Multiple bans in a short time will result in a weeklong ban." This is effectively like an exponential backoff on their end.

I know from using this personally that other exchanges do this type of thing as well (an exponential-backoff ban). The reason they include such info in their headers is exactly this: to tell you, the user, how long to back off for.

I understand your point about the multi-threaded race conditions, but having nothing in place at all means there is no protection.

Personally, I think that if an API gives you details about a rate limit ban, then a library like XChange, which now handles rate limiting, should use them as well; it is an exchange-specific modification of behavior that is already part of XChange.

As an example, take two threads t1 and t2:

If t1 gets a ban for 2 minutes from now, I would drain permissions for now + 2m.

If, during the time from when t1 submitted a request, received a ban, and drained permissions, t2 is able to submit a subsequent request which receives a ban for 3m, then we would update the drain period to now + 3m.

If alternatively t2 was actually submitted before t1 and received an older ban for now + 1m, we would compare when the drain periods end before updating, always taking the further-out drain period. This way we would not be subject to network latencies that would risk not having the latest ban in place due to a slow request.

Before a thread would be allowed to submit any more requests, it would always do an additional check to make sure there are available permissions and that nothing was changed under its feet.

I understand this grows the app's complexity a bit (more so the resilience logic itself), but I think it offers more protection.
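The always-take-the-further-out-expiry rule described above can be sketched with an atomic compare-and-update; this is a simplification with hypothetical names, not code from the PR:

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

public class BanTracker {

  private final AtomicReference<Instant> banUntil = new AtomicReference<>(Instant.EPOCH);

  /**
   * Records a ban, always keeping the furthest-out expiry regardless of the
   * order in which responses arrive - a late response carrying an older,
   * shorter ban cannot shrink the drain period.
   */
  void recordBan(Instant until) {
    banUntil.updateAndGet(current -> until.isAfter(current) ? until : current);
  }

  /** Checked by every thread before it submits a request. */
  boolean isBanned(Instant now) {
    return now.isBefore(banUntil.get());
  }
}
```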

walec51 commented 3 years ago

@earce come to think of it, this type of functionality is more of a circuit breaker than a rate limiter

for such long time scales - a 30-second or 2-hour ban - there is no point in sleeping on a thread; we should throw an exception indicating that the circuit is broken, without calling the exchange's API

currently resilience4j supports breaking the circuit for a fixed amount of time if a certain error (or number of errors) happens within a set amount of time

we would have to extend resilience4j's functionality so that the circuit breaker could dynamically determine how long it should remain broken, based on the responses / exceptions it receives

I'll submit a feature proposal on the weekend
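For reference, what resilience4j's circuit breaker offers today is a fixed open-state duration; deriving that duration dynamically from the exchange's responses is the missing piece. A minimal sketch of the current capability:

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

import java.time.Duration;

public class BanCircuitBreaker {

  /** A breaker that stays open for a fixed 30 s once tripped. */
  static CircuitBreaker banBreaker() {
    return CircuitBreaker.of("bitmex", CircuitBreakerConfig.custom()
        // fixed open-state duration - NOT derived from the exchange's responses,
        // which is exactly the extension proposed above
        .waitDurationInOpenState(Duration.ofSeconds(30))
        .build());
  }

  public static void main(String[] args) {
    CircuitBreaker breaker = banBreaker();
    // While open, calls fail fast with CallNotPermittedException instead of sleeping.
    Runnable guarded = CircuitBreaker.decorateRunnable(breaker, () -> {
      // exchange call here
    });
    guarded.run();
  }
}
```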

earce commented 3 years ago

@walec51 this sounds great, I appreciate you taking the time to reconsider/revisit this; I think what you are describing does sound like the correct implementation of this protection

earce commented 3 years ago

@walec51 have you had any chance to look into this?

walec51 commented 3 years ago

@earce sorry for the delay, I've posted a feature request to resilience4j: https://github.com/resilience4j/resilience4j/issues/1387