knowm / XChange

XChange is a Java library providing a streamlined API for interacting with 60+ Bitcoin and Altcoin exchanges, with a consistent interface for trading and accessing market data.
http://knowm.org/open-source/xchange/
MIT License
3.86k stars, 1.94k forks

Kraken requires SNI #3733

Open · mumch opened this issue 4 years ago

mumch commented 4 years ago

The Kraken streaming connection returns HTTP error "429 Too Many Requests" within one hour, almost once per day, when connecting to a number of currency pairs (we currently connect to 60). We investigated this issue with Kraken support, and they eventually informed us that they suspect it's because SNI is required:

<< We suspect you are receiving the 429 Too Many Requests error because the WebSocket upgrade is failing and subsequent attempts are reaching the maximum request limit (which varies depending upon the WebSocket library being used).

The cause of the failing WebSocket upgrade is most likely Cloudflare's recent requirement of TLS Server Name Indication (SNI). Many HTTP and/or WebSocket libraries (such as the Twisted Python module) do not support SNI by default, hence the 429 error (or sometimes a 403 error) will be returned unless the required modifications are made to enable SNI.

As an example, our own Python WebSocket library started experiencing the same issue, but after our developers modified the following code to enable SNI, the issue was resolved:

```python
from twisted.internet import ssl
from autobahn.twisted.websocket import connectWS

factory = self.factories[id_]
options = ssl.optionsForClientTLS(hostname=hostname)  # Enable TLS SNI
self.conns[id_] = connectWS(factory, options)
```

I hope that this helps. Please let us know if this does not resolve your issue, and we would be happy to assist you further.

Have a great day!

Regards,
Chad G.
Kraken Client Engagement Team


earce commented 4 years ago

Some comments:

Based on this email, I would expect the connection to never succeed.

At a quick glance, the WebSocket connection should only be re-established once per disconnect. Subscriptions for all 60+ pairs are sent as individual subscription messages, so my initial suspicion is that this error happens not on connection but on the attempt to subscribe to so many pairs.

These individual messages exceed the rate limits Kraken has in place; I suspect that with fewer pairs this won't be an issue. I have a review out (https://github.com/knowm/XChange/pull/3721/files#diff-e8bac863c50b533110899e368cf7701a) which aims to throttle when the server returns a 429 over a WebSocket message.

You will probably also hit this semi-frequently, since every time your WebSocket disconnects you will try to resubscribe to 60 currency pairs without any throttling.
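Besides reacting to 429s after the fact, the subscription messages themselves can be paced so that a reconnect does not fire all 60 at once. Below is a minimal sketch of a token bucket, a standard rate-limiting technique; it is not XChange's implementation and not necessarily what the linked PR does. `TokenBucket` is a hypothetical helper, and the rate you configure would have to come from Kraken's actual limits.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical token bucket (not an XChange class): allows a burst of up to
 * `capacity` messages and refills at `ratePerSecond` tokens per second.
 */
final class TokenBucket {
    private final double capacity;
    private final double refillPerNano;
    private final LongSupplier clock; // nanosecond clock, injectable for tests
    private double tokens;
    private long lastRefill;

    TokenBucket(double capacity, double ratePerSecond, LongSupplier clock) {
        if (capacity < 1 || ratePerSecond <= 0) throw new IllegalArgumentException("bad limits");
        this.capacity = capacity;
        this.refillPerNano = ratePerSecond / 1_000_000_000.0;
        this.clock = clock;
        this.tokens = capacity; // start full
        this.lastRefill = clock.getAsLong();
    }

    TokenBucket(double capacity, double ratePerSecond) {
        this(capacity, ratePerSecond, System::nanoTime);
    }

    /** Consumes one token and returns true if a message may be sent now. */
    synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = clock.getAsLong();
        // Add tokens proportional to elapsed time, never beyond capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
    }
}
```

Before each subscribe call, the caller would check `tryAcquire()` and, if it returns false, retry a little later instead of sending. Sharing one bucket across all connections on a host would match the per-IP limiting discussed in this thread.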

mumch commented 4 years ago

Dear earce, thank you for your effort on this issue. This case has some additional history: first we used a single connection, which ended in an "Exceeded msg rate" error. Kraken recommended that we use more connections:

The solution to this error is to limit your new connection/subscription attempts to no more than 20 per second, and to spread your subscriptions across multiple separate connections (but without exceeding the maximum connection limit of 500 connections).

and: Spreading your subscriptions across ~60 connections should be fine. Just make sure to create the connections over a few seconds, so as to not exceed the 20 new connections per second limit.
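Kraken's advice (no more than 20 new connections per second, created over a few seconds) amounts to spacing connection attempts evenly. A minimal sketch, assuming each connection is opened by a hypothetical `Runnable` that wraps whatever connect call the application uses; `StaggeredConnector` is not part of XChange.

```java
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: opens connections one by one, spaced to stay under a per-second limit. */
final class StaggeredConnector {
    private StaggeredConnector() {}

    /** Delay of the index-th connection: index * ceil(1000 / maxNewPerSecond) milliseconds. */
    static long delayMillis(int index, int maxNewPerSecond) {
        if (index < 0 || maxNewPerSecond <= 0) throw new IllegalArgumentException("bad arguments");
        long spacing = (1000L + maxNewPerSecond - 1) / maxNewPerSecond; // ceiling division
        return index * spacing;
    }

    /** Schedules each connect task at its staggered delay instead of starting them all at once. */
    static void openAll(ScheduledExecutorService scheduler, List<Runnable> connectTasks, int maxNewPerSecond) {
        for (int i = 0; i < connectTasks.size(); i++) {
            scheduler.schedule(connectTasks.get(i), delayMillis(i, maxNewPerSecond), TimeUnit.MILLISECONDS);
        }
    }
}
```

Using a safety margin below Kraken's figure, e.g. 10 per second, the 60th connection (index 59) would start after 5.9 s.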

That's why we now make 60 connections (one connection per subscription). I cannot judge whether the recommended solution of using SNI will really solve the issue. But Kraken thinks so, and as far as I have looked into the source code, XChange currently doesn't use SNI, although Kraken says it's a requirement.

earce commented 4 years ago

So are you running all of your connections on a host with a single external IP? That is how these exchanges typically rate-limit connections.

If you have a single host and run 3 processes with 3 connections but 20 subscriptions each, the total across your host is still 60 subscription messages. It's easy to rate-limit on startup, but on reconnect the library will attempt to re-establish connections immediately. In the same way, if you have 60 connections and they all disconnect and try to reconnect simultaneously, you will have rate-limiting issues.

How is your current deployment setup?

mumch commented 4 years ago

Deployment setup: Yes, all in a single application on one host with one WAN IP.

What you suspect, that all reconnections occur at the same time, is exactly what I was expecting. But Kraken is pretty sure it's due to missing SNI, after investigating the issue on their side. To me, simultaneous reconnection still seems the most plausible cause. (That said, I still think SNI support should be added, since it's a requirement whose absence may lead to connection issues.)

Do you have an idea how to prevent all connections from reconnecting at the same time, e.g. by adding a delay between reconnections?
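One common answer, not something XChange is confirmed to do, is exponential backoff with random "full jitter": each connection waits a random time between 0 and an exponentially growing ceiling, so connections that dropped together spread their reconnects out. A minimal sketch; `ReconnectBackoff` and its parameters are hypothetical.

```java
import java.util.Random;

/** Hypothetical reconnect policy: exponential backoff with full jitter. */
final class ReconnectBackoff {
    private final long baseMillis;
    private final long capMillis;
    private final Random random;

    ReconnectBackoff(long baseMillis, long capMillis, Random random) {
        if (baseMillis <= 0 || capMillis < baseMillis) throw new IllegalArgumentException("need 0 < base <= cap");
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.random = random;
    }

    /** Upper bound for a given attempt: min(cap, base * 2^attempt). */
    long ceilingMillis(int attempt) {
        if (attempt < 0) throw new IllegalArgumentException("attempt must be >= 0");
        int shift = Math.min(attempt, 30); // keeps base * 2^shift within a long for realistic bases
        return Math.min(capMillis, baseMillis << shift);
    }

    /** Random delay in [0, ceiling], so simultaneous disconnects do not reconnect in lockstep. */
    long nextDelayMillis(int attempt) {
        long ceiling = ceilingMillis(attempt);
        long delay = (long) (random.nextDouble() * (ceiling + 1));
        return Math.min(delay, ceiling);
    }
}
```

Each connection would keep its own attempt counter, reset it after a successful reconnect, and wait `nextDelayMillis(attempt)` before the next try.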

earce commented 4 years ago

The only thing that still doesn't make sense is that if Cloudflare required SNI and it were missing, the connection would simply never work. It's one thing if you attempt to connect and sometimes succeed, but a server that requires SNI is not going to let you connect sometimes and fail other times. Does that make sense?

My understanding is https://github.com/knowm/XChange/blob/develop/xchange-stream-service-netty/src/main/java/info/bitrich/xchangestream/service/netty/NettyStreamingService.java#L206 gets you SNI but maybe @badgerwithagun or @mdvx can chime in?
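For comparison, this is how SNI can be requested explicitly with the standard JSSE API, which Netty's JDK-based SSL handler builds on. It is a minimal sketch, not XChange code: `SniEngineFactory` is a hypothetical helper, and `ws.kraken.com` is only an example host. In Netty, passing the host to `SslContext.newHandler(alloc, peerHost, peerPort)` yields a host-aware engine, which is likely what the linked line relies on, but that should be confirmed by reading the code.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

/** Hypothetical helper: builds a client SSLEngine that explicitly advertises the host via SNI. */
final class SniEngineFactory {
    private SniEngineFactory() {}

    static SSLEngine clientEngine(String host, int port) {
        SSLContext context;
        try {
            context = SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default TLS context", e);
        }
        // Passing host/port gives the engine a peer identity.
        SSLEngine engine = context.createSSLEngine(host, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        // Explicitly set the TLS server_name (SNI) extension to the host name.
        params.setServerNames(Collections.singletonList(new SNIHostName(host)));
        // Also verify the server certificate's host name during the handshake.
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }
}
```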

However, as a counterexample, this same implementation is used for Coinbase. Since February of this year, Coinbase has required SNI (https://status.pro.coinbase.com/incidents/vzyhh3cgs7pl), and there are no issues connecting to Coinbase.

As for your question: running so much out of a single instance is going to cause problems. If you have broad disconnects, your reconnects will probably happen simultaneously, and since the rate limit applies per IP, you will run into these problems. As I said, I have a review out for rate limiting when we detect 429s, but in the meantime I would recommend splitting this into groups of maybe 5 currency pairs and putting each process on a different host. Think small AWS Fargate instances.
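Splitting the pair list into groups of 5 can be done with a small generic helper; `PairGrouping.partition` is hypothetical, not an XChange method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a list of currency pairs into consecutive groups. */
final class PairGrouping {
    private PairGrouping() {}

    /** Returns groups of at most groupSize items, preserving the original order. */
    static <T> List<List<T>> partition(List<T> items, int groupSize) {
        if (groupSize <= 0) throw new IllegalArgumentException("groupSize must be > 0");
        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i < items.size(); i += groupSize) {
            // Copy each slice so groups do not depend on the source list.
            groups.add(new ArrayList<>(items.subList(i, Math.min(i + groupSize, items.size()))));
        }
        return groups;
    }
}
```

With 60 pairs and a group size of 5 this yields 12 groups; each inner list could then back one connection or one process.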

I still have to test what the message limit on an open WebSocket is, to get a better idea of the limitations. I run a single WebSocket connection on a single host with 6 currency pairs without issues.

mumch commented 4 years ago

Thank you for your support.

Yes, I think the same way: if SNI is required, it's always required, or the initial connection could not be established at all. Kraken support thinks differently, though (maybe we misunderstood them, maybe Kraken support lacks the knowledge, or maybe Kraken just answered something in order to close the support case... I don't know). I am not familiar enough with Netty/SNI, so let's see what badgerwithagun or mdvx say.

We'd rather not fragment the application into separate instances, to keep it simple and maintainable. But of course it's a possible workaround for now.

Kraken mentioned 20 messages per second as the rate limit on one connection, while the number of connections is limited to 500 and new connections are limited to 20 per second. We think a single subscription may already produce 20 messages per second, which is why we have not yet changed our code to make more than one subscription per connection. (According to Kraken, it doesn't matter whether messages are sent or received; both count toward the "Exceeded msg rate" limit.) Furthermore, I think the number of messages per unit of time depends heavily on which market data you subscribe to and how many messages it generates: e.g. BTC/USD will generate many more trade messages than LTC/GBP, and an order book subscription may generate more messages than a trades subscription. BTW, as a reference: we connect to Bitfinex and subscribe to 36 trade streams, to Coinbase Pro with 33, to Binance with 46, etc., each over a single connection per exchange. None of them shows problems like Kraken does.
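To check whether a single subscription really approaches the 20 msg/s Kraken quoted, one could count messages per connection over a sliding one-second window. A minimal sketch; `MessageRateMeter` is hypothetical, and the injectable clock exists only to make it testable.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/**
 * Hypothetical per-connection meter: counts messages seen within the last window
 * (e.g. one second). Call record() for every message sent or received.
 */
final class MessageRateMeter {
    private final long windowNanos;
    private final LongSupplier clock; // nanosecond clock, injectable for tests
    private final Deque<Long> timestamps = new ArrayDeque<>();

    MessageRateMeter(long windowNanos, LongSupplier clock) {
        if (windowNanos <= 0) throw new IllegalArgumentException("windowNanos must be > 0");
        this.windowNanos = windowNanos;
        this.clock = clock;
    }

    MessageRateMeter() {
        this(1_000_000_000L, System::nanoTime); // one-second window
    }

    synchronized void record() {
        long now = clock.getAsLong();
        timestamps.addLast(now);
        evictExpired(now);
    }

    /** Number of messages recorded within the last window. */
    synchronized int countInWindow() {
        evictExpired(clock.getAsLong());
        return timestamps.size();
    }

    private void evictExpired(long now) {
        // Drop timestamps that are a full window (or more) in the past.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowNanos) {
            timestamps.pollFirst();
        }
    }
}
```

Logging `countInWindow()` periodically for each connection would show whether, for example, a BTC/USD order book subscription alone exceeds 20 msg/s.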

earce commented 4 years ago

I am surprised that they cite a 20 msg/s limit on reads over an open connection; I currently subscribe to 6 currency pairs on a single connection and have not seen any issues. 20 msg/s is a really small limit, and I don't see how this can be the case. It would be different if they had said 20 subscriptions per second on a connection; that would make sense.

I just ran some tests on Kraken's order book subscription and frequently see 30+ msg/s, so I don't think the 20 msg/s figure is accurate. I would suggest splitting your application into 5 pairs per connection and making sure each has a separate external-facing IP until we have better throttling in the code.

mumch commented 4 years ago

Thanks, we will implement it, and I will let you know as soon as we have some interesting experiences/numbers on this topic.

earce commented 3 years ago

@mumch any updates with this?

mumch commented 3 years ago

Hi earce, we still see '429 Too Many Requests' in the log while having 10 connections open to Kraken. Hence it seems that even 10 connections per second is too many. (Or the connection limit is measured over sub-second intervals.)

earce commented 3 years ago

What if you try, say, 5 pairs? We are currently working on a rate-limiting implementation, so hopefully that fixes the issues in the coming weeks.

mumch commented 3 years ago

Yes, approximately: we used a separate connection for every 6 pairs.

But in the meantime we saw the same issue under the following conditions: 34 pairs in total, split across 7 connections (5 order book subscriptions per connection).