slazarov / python-bittrex-websocket

Python websocket for Bittrex (non async).
http://python-bittrex-websocket-docs.readthedocs.io/en/latest/index.html
MIT License
103 stars 38 forks source link

huge memory leak right after reconnect #61

Open gzivdo opened 6 years ago

gzivdo commented 6 years ago

The code just like in example only with subscribe_to_exchange_deltas

Output

2018-05-06 18:55:35 - bittrex_websocket.websocket_client - ERROR - Connection is already closed..
2018-05-06 18:55:35 - bittrex_websocket.websocket_client - ERROR - Initiating reconnection procedure
2018-05-06 18:55:35 - bittrex_websocket.websocket_client - INFO - Establishing connection to Bittrex through https://beta.bittrex.com/signalr.
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [QueryExchangeState] for [BTC-ZCL].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [SubscribeToExchangeDeltas] for [BTC-ZCL].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [QueryExchangeState] for [BTC-RDD].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [SubscribeToExchangeDeltas] for [BTC-RDD].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [QueryExchangeState] for [BTC-TRX].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [SubscribeToExchangeDeltas] for [BTC-TRX].
2018-05-06 18:56:20.775091 15593
2018-05-06 18:57:20.835102 15659
2018-05-06 18:58:21.108694 15691
2018-05-06 18:59:30.381624 15736
2018-05-06 19:00:06 - bittrex_websocket.websocket_client - ERROR - Undocumented error message received with code -1 and payload: 32. Report to https://github.com/slazarov/python-bittrex-websocket
2018-05-06 19:00:06 - bittrex_websocket.websocket_client - ERROR - Initiating reconnection procedure
2018-05-06 19:00:06 - bittrex_websocket.websocket_client - INFO - Establishing connection to Bittrex through https://beta.bittrex.com/signalr.

After first reconnect app starts to eat all available memory and swap.

slazarov commented 6 years ago

Error 32 should refer to socket.error: [Errno 32] Broken pipe and is due to connection drop. If your connection is working fine, then I suspect that the error is coming from Bittrex's side since recently they have been pushing constant updates in order to get the main endpoint stable. Note that we are using the beta endpoint currently in order to avoid bypassing cloudflare. You can try:

ws = MySocket(url='https://socket.bittrex.com/signalr')

Just make sure you have installed cfscrape and node.js.

The reconnection memory leak is due to the fact that the client tries to reconnect and since it's denied immediately it goes into an infinite loop which causes high CPU usage. This is on my side and I will implement a fix. I am thinking of maximum connection retries solution.

gzivdo commented 6 years ago

cpu usage not high at all, but memory usage rising quite fast.

slazarov commented 6 years ago

Doesn't change the fact.

Also when posting make sure you specify what Python version you are using and also provide the code you are executing. In this case you said you are using the example but you are also invoking QueryExchangeState and I have no idea whether you are executing it in a loop.

gzivdo commented 6 years ago

Python 2.7.x

from bittrex_websocket.websocket_client import BittrexSocket
import cPickle, time, datetime

tickers = {
    'ZCL':  (35, 5),
    'RDD': (20000, 2000),
    'TRX': (5000, 2000),
}

class BittrexSocket1(BittrexSocket):
        def __init__(self, tickers, url=None):
            super(BittrexSocket1, self).__init__(url)
            self.tickers = tickers
            self.cnt = 0
            for ticker in self.tickers:
                time.sleep(0.01)
                self.last[ticker] = time.time()
                self.query_exchange_state(['BTC-'+ticker])
                self.subscribe_to_exchange_deltas(['BTC-'+ticker])

        def on_public(self, msg):
            ticker = msg['M'].split('-')[-1]
            self.cnt += 1
            return

def main():
    ws = BittrexSocket1(tickers)
    ws.enable_log()
    try:
        while True:
            sleep(60)
            print datetime.datetime.now(), ws.cnt
    except KeyboardInterrupt:
        pass
    print 'exiting'
    ws.disconnect()
    sleep(10)

if __name__ == "__main__":
    main()
slazarov commented 6 years ago

I will test it now.

1.) How often does the issue happens?

2.) Is there any pattern in the average time between starting the code and the occurrence of the error? E.g every 10mins or so.

gzivdo commented 6 years ago

No, it happens just after network error. Sometimes its 2h, sometimes 6-8h

slazarov commented 6 years ago

I see, I will work on a fix and will push it to github in due time. Thank you for reporting the bug.

hamkline commented 6 years ago

capture I'm having issues with reconnection too. Mine lasts quite a bit, but at least once a day it breaks. I'm running on a server with 1GB RAM only though. I don't run connections on a loop, it happens once and then I call a function to take care of the data + rest requests from the same library you use. Also, I don't have the other errors, but I think I got a different one last time even though it was from reconnection also. I can take a print next time if it helps. (python 3)

slazarov commented 6 years ago

Its a temporary fix but do the following: update to the latest version from github and replace line 140-143:

            elif invoke == BittrexMethods.GET_AUTH_CONTENT:
                self.connection.corehub.server.invoke(invoke, payload[0])
                self.invokes.append({'invoke': invoke, 'ticker': payload[0]})
                logger.info('Retrieving authentication challenge.')

with

            elif invoke == BittrexMethods.GET_AUTH_CONTENT:
                self.connection.corehub.server.invoke(invoke, self.credentials['api_key'])
                self.invokes.append({'invoke': invoke, 'ticker': self.credentials['api_key']})
                logger.info('Retrieving authentication challenge.')
ayy1337 commented 6 years ago

screenshot_2018-05-13_12-17-40 I'm also having troubles on reconnection.. Seems like it blew through by VPS' available memory and was killed by linux oom-killer.

ayy1337 commented 6 years ago

Also, I notice that you have in the websocket_client.py put a variable 'self.threads = []', but when a thread dies or is finished it is not removed from this list, preventing it from being garbage collected (as there are still references to it) Honestly I can't even tell why the list is there at all, since it doesn't seem to be accessed anywhere..

ayy1337 commented 6 years ago

screenshot_2018-05-13_13-32-39 Managed to catch it in htop. What are all these threads doing?

slazarov commented 6 years ago

Hi, self.threads = [] is a leftover from v<1.0, so thanks for noticing it. I am aware of what might be causing the thread spam and will fix in due time. As always thanks for raising an issue but next time provide additional information such as the whole log, code snippet, python version and a few word explanation since its difficult to debug from a picture.

Is node.js installed together with cfscrape?

slazarov commented 6 years ago

Update the package to v1.0.4 through: pip install git+https://github.com/slazarov/python-bittrex-websocket.git@1.0.4

Theoretically this should fix it, waiting for your feedback after testing.

hamkline commented 6 years ago

Is v1.0.4 only difference in the code you mentioned for me above? Because I still got the error with that. If not let me know and I'll test with the new code and provide feedback.

Just in case I'll post it here. capture It's funny that it only shows those few lines, so if it was trying to reconnect a lot, in a loop or something, it should have more output, right?

slazarov commented 6 years ago

As I said I can’t comment on pictures without any code.

v1.0.4 is different than the code I posted, you can check the commits.

hamkline commented 6 years ago

As I said, I subscribe and then stick to a function with a while True that handles the data forever, just like in your example with the __init__ having this: self.ticker_updates_container = {} which haves msg value every time on_update.

I'll test v1.0.4 and provide feedback.

ayy1337 commented 6 years ago

As requested, here is a pastebin with full code, is pretty basic honestly.. https://pastebin.com/a5UvBZqU And here is a pastebin with the full output from running the above code: https://pastebin.com/LSN9QDMc

Python version was 3.5, don't believe node.js is installed no. Cfscrape is installed. (this is re: your first reply, I will update to 1.0.4 and do further testing)

slazarov commented 6 years ago

Hi, I will take a look at your logs. Have you updated to v1.0.4. If not, please do so and test.

Cfscrape requires nodejs to work. It’s not resulted in an error on your side so far because bittrex has disabled cloudflare currently. I recommend installing node.js.

On May 13, 2018 at 6:42 pm, <ayy1337 (mailto:notifications@github.com)> wrote:

As requested, here is a pastebin with full code, is pretty basic honestly.. https://pastebin.com/a5UvBZqU And here is a pastebin with the full output from running the above code: https://pastebin.com/LSN9QDMc

Python version was 3.5, don't believe node.js is installed no. Cfscrape is however.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub (https://github.com/slazarov/python-bittrex-websocket/issues/61#issuecomment-388636054), or mute the thread (https://github.com/notifications/unsubscribe-auth/AbVUGwXHjL00m2il0eHFh8M3Stq6KObjks5tyFRvgaJpZM4T0GNn).

ayy1337 commented 6 years ago

screenshot_2018-05-14_02-04-47 My bad, it appears to have had node.js I was just ignorant of the appropriate command (i was trying 'node' and 'npm')

Edit: I will update to 1.0.4 now and do further testing, haven't done so yet Edit2: Have updated to new version, will post again with logs and whatnot when test completes

ayy1337 commented 6 years ago

Okay so I updated and ran the same script again with the new version, have the output from my tests. This paste I disconnected the network connection manually for 60 seconds and reconnected it. https://pastebin.com/V0bZHpY9

For the second test I let it run until it ran into an issue on its own, https://pastebin.com/F9y0hLhm

In neither case did it successfully reconnect after the error

hamkline commented 6 years ago

I got that too, here's a print just in case it helps: capture It was in the same function I mentioned earlier, it's the one that handles self.ticker_updates_container and nothing more. Again, same place under a while True loop that other errors occurred. My program didn't really end with killed this time though, probably the socket disconnected and it kept handling the same data on self.ticker_updates_container till I woke up.

slazarov commented 6 years ago

Hi, the reason for the error is that exceptions in Python2 have a 'message' attribute whereas exceptions in Python3 have 'args' instead. This is fixed. Do:

pip install git+https://github.com/slazarov/python-bittrex-websocket.git@1.0.5

@ayy1337 I have added TimeoutError as a new exception in the reconnection logic. I reckon the exception is from urllib3.exceptions import TimeoutError is the correct one. Please do your tests and update me.

As a whole, I have disabled the auto-reconnection for unknown errors since I prefer to add them when they appear and after I've become acquainted with them.

ayy1337 commented 6 years ago

For printing exception info I usually use (in 3.5 etc.) "{} {}".format(type(e).__name__, e) You don't need .args or .message Edit: This works on python 2.7 as well.. screenshot_2018-05-15_11-31-14

P.s. the url git+https://github.com/slazarov/python-bittrex-websocket.git@1.0.5 doesn't actually work since the update wasn't tagged with a version on git. screenshot_2018-05-15_10-43-08

slazarov commented 6 years ago

Hi @ayy1337, thanks for the suggestion. Also my bad, I pushed directly to the master so you can update through: pip install git+https://github.com/slazarov/python-bittrex-websocket.git

ayy1337 commented 6 years ago

screenshot_2018-05-17_17-11-13 Okay so running the tests again on version 1.0.5, this first pic is from just letting it run until it ran into troubles; the reconnect seems to have worked this time without blowing up the memory which is good, though it failed to resubscribe to the account deltas

screenshot_2018-05-17_17-14-25 This second pic is by forcibly disconnecting the internet connection, again for around 60 seconds before reconnecting.

slazarov commented 6 years ago

Can you paste the contents of websocket_client.py/_handle_subscribe method?

slazarov commented 6 years ago

I wasn't sure which package was raising the TimeoutError. Initially I thought it was urllib3.exceptions but probably it's the one from the built-ins. I have pushed the commit to v.1.0.6.

ayy1337 commented 6 years ago

Can you paste the contents of websocket_client.py/_handle_subscribe method?

Sure (1.0.5) screenshot_2018-05-19_12-02-35

ayy1337 commented 6 years ago

Updated to 1.0.6, tried the forceful disconnection test again and this time it reconnected successfully after the connection was restored :) screenshot_2018-05-19_12-14-10

I will run the test where I let it naturally reconnect now, though this can take some time.

hamkline commented 6 years ago

capture I got this on v.1.0.5, just naturally letting it run. I don't think there was any dcs by my part. I had connections to subscribe_to_summary_deltas() and authenticate() Gonna update to v.1.0.6 and see if it happens still.

ayy1337 commented 6 years ago

screenshot_2018-05-20_12-29-31 Okay so the natural disconnect finally happened, this one is also on 1.0.6. Looks like it again had troubles subscribing to the accountdeltas..

slazarov commented 6 years ago

Were you getting any data between 00:20 and 12:20?

ayy1337 commented 6 years ago

Hmm I didn't notice that actually.. appears to be a gap, possibly it didn't succeed in subscribing to summarylitedeltas?

ayy1337 commented 6 years ago

screenshot_2018-05-20_16-32-37 So, this is basically just conjecture on my part, but is it possible that it was trying to join a running thread here, and because it had no timeout, it simply blocked for 12 hrs waiting for the thread to die? Think it might be a possibility because the last thing it printed before the gap in the log is "Initiating reconnection procedure"

slazarov commented 6 years ago

Yes, that's what I am investigating currently. It seems SOCKET_CONNECTION_THREAD failed to quit somewhere in _signalr.py/start(self).

slazarov commented 6 years ago

I've made some changes to ensure SOCKET_CONNECTION_THREAD quits. Theoretically, this should ensure c2.Authenticate doesn't fail. I have also put some error traps to see if the Authenticate gets corrupted somewhere.

pip install git+https://github.com/slazarov/python-bittrex-websocket.git@1.0.6.1

Changes: https://github.com/slazarov/python-bittrex-websocket/commit/699175bcabe38acfec871f0829e42b9d5a836f50 https://github.com/slazarov/python-bittrex-websocket/commit/715cdd803a4e45a6f2baab70fcc889dded2c30b2

ayy1337 commented 6 years ago

Okay, will rerun the test (natural reconnect) with the new version

ayy1337 commented 6 years ago

screenshot_2018-05-21_01-29-41 Captured two reconnection events, both appear to have worked as intended. I will keep running the script for a while though

ayy1337 commented 6 years ago

screenshot_2018-05-21_14-59-17 Two more, looks like reconnecting is working fine now

gzivdo commented 6 years ago

TimeoutError absent in python 2.x, its not built-in. Also i have this error:

2018-05-21 20:46:23 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [QueryExchangeState] for [BTC-ZCL].
2018-05-21 20:46:23 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [SubscribeToExchangeDeltas] for [BTC-ZCL].
No handlers could be found for logger "websocket"
2018-05-21 20:46:24 - bittrex_websocket.websocket_client - ERROR - 
Unhandled Exception = <type 'exceptions.AttributeError'>, Message = <'NoneType' object has no attribute 'split'>.
slazarov commented 6 years ago

Will check and push it.

On May 21, 2018 at 8:45 pm, <gzivdo (mailto:notifications@github.com)> wrote:

TimeoutError absent in python 2.x, its not built in

— You are receiving this because you commented. Reply to this email directly, view it on GitHub (https://github.com/slazarov/python-bittrex-websocket/issues/61#issuecomment-390729070), or mute the thread (https://github.com/notifications/unsubscribe-auth/AbVUG0AYHUe6-__dit1fW2ucj0tCEpsVks5t0v0ggaJpZM4T0GNn).

slazarov commented 6 years ago

Did you get any errors? If so, post the log so that I can check what exception is being raised instead.

On May 21, 2018 at 9:27 pm, <Stanislav Lazarov (mailto:s.a.lazarov@gmail.com)> wrote:

Will check and push it.

On May 21, 2018 at 8:45 pm, <gzivdo (mailto:notifications@github.com)> wrote:

TimeoutError absent in python 2.x, its not built in

— You are receiving this because you commented. Reply to this email directly, view it on GitHub (https://github.com/slazarov/python-bittrex-websocket/issues/61#issuecomment-390729070), or mute the thread (https://github.com/notifications/unsubscribe-auth/AbVUG0AYHUe6-__dit1fW2ucj0tCEpsVks5t0v0ggaJpZM4T0GNn).

gzivdo commented 6 years ago
Exception in thread SocketConnectionThread:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/bittrex_websocket/websocket_client.py", line 159, in _connection_handler
    except TimeoutError as e:
NameError: global name 'TimeoutError' is not defined

And then if i comment timeouterror code i get:

No handlers could be found for logger "websocket"
2018-05-22 08:47:24 - bittrex_websocket.websocket_client - ERROR - 
Unhandled Exception = <type 'exceptions.AttributeError'>, Message = <'NoneType' object has no attribute 'split'>.
Auto-reconnection is disabled for unhandled exceptions.
Report to https://github.com/slazarov/python-bittrex-websocket.

And nothing happening after.

hamkline commented 6 years ago

1 2 v1.0.6 - py3 The 2 pictures are from the same situation, just had to cut them in 2.

gzivdo commented 6 years ago

Still no fix? With py2.x just not work at all now.

slazarov commented 6 years ago

I’ve been away for the past 2 weeks, will look into it in the weekend.

slazarov commented 6 years ago

Do a: pip install git+https://github.com/slazarov/python-bittrex-websocket.git@1.0.6.1

It should fix the TimeoutError. Regarding: Unhandled Exception = <type 'exceptions.AttributeError'>, Message = <'NoneType' object has no attribute 'split'>.

Are you sure that this is not part of your code? Can you repost again your code.

gzivdo commented 6 years ago

yes, sorry.

gzivdo commented 6 years ago

The error bacause beta api returns with QueryExchangeState 'M' not None, but https://socket.bittrex.com/signalr returns None. Will use beta for now.