ib-api-reloaded / ib_async

Python sync/async framework for Interactive Brokers API (replaces ib_insync)
BSD 2-Clause "Simplified" License

positions request timed out on connect() #84

Open skister opened 3 days ago

skister commented 3 days ago

I occasionally get "positions request timed out" errors when connecting to the gateway. This may be related to issues #54 and #82; however, because of those issues I do not request executions on connect(), and I still sometimes see this error.

E positions request timed out ib.connect(ibgateway, 4001) fetch=StartupFetch.POSITIONS|ORDERS_OPEN|ACCOUNT_UPDATES connected after 30.05s

Restarting the ibgateway fixes the issue.

One issue I encountered is that ib_async does not flag the error other than with a log message, so I use this workaround to detect the problem, and I exit my app when it happens. ib_async should set a flag on failure.

# Ensure none of the startup requests timed out.
# This is a bit hacky, but it is the only way I found to check
# for an error from outside the ib_async code.
for key, fut in ib.wrapper._futures.items():
  if not fut.done():
    continue  # exception() raises InvalidStateError on a pending future
  if fut.cancelled() or fut.exception():
    status = 'cancelled' if fut.cancelled() else fut.exception()
    log.fatal(f"ib.connect() succeeded, but fut request {key} failed {status}, exiting()")

skister commented 3 days ago

This issue even happened on a Sunday with a newly started ibgateway, with no trades and no reuse of client_id.

mattsta commented 3 days ago

Thanks for the details!

There is the option to pass raiseSyncErrors=True to the connect method which would cause it to raise an exception.

https://github.com/ib-api-reloaded/ib_async/blob/38cf54a66a4daefbd3fd1d7476381f0d178a8198/ib_async/ib.py#L2072-L2073

You can also adjust the timeout with a parameter, but 30 seconds as shown above does seem like it should be long enough.

Your workaround for detecting errors does seem like a lot of extra work. The startup fetches are all run here with timeouts, so they shouldn't be allowed to end up "stuck" just floating around like that unless the IBKR gateway doesn't return a final "positions complete" call.

https://github.com/ib-api-reloaded/ib_async/blob/38cf54a66a4daefbd3fd1d7476381f0d178a8198/ib_async/ib.py#L2053-L2061
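As a rough illustration of the pattern described above (each startup fetch awaited with a timeout, failures either logged or raised), here is a minimal stdlib-only sketch. It is not the ib_async implementation: `run_startup_fetches`, `raise_sync_errors`, `fast`, and `stuck` are hypothetical names made up for this example, and the stand-in coroutines replace real gateway requests.

```python
import asyncio


async def run_startup_fetches(requests, timeout, raise_sync_errors=False):
    """Await each named request with its own timeout and collect failures.

    `requests` maps a label to a zero-argument coroutine function.
    (Hypothetical helper for illustration, not part of ib_async.)
    """
    errors = []
    for name, make_request in requests.items():
        try:
            await asyncio.wait_for(make_request(), timeout)
        except asyncio.TimeoutError:
            # The request never completed; record it instead of hanging.
            errors.append(f"{name} request timed out")
    if errors and raise_sync_errors:
        # Surface the failure to the caller instead of only reporting it.
        raise ConnectionError("; ".join(errors))
    return errors


async def fast():
    """Stand-in for a request that completes right away."""
    await asyncio.sleep(0)


async def stuck():
    """Stand-in for a request whose final reply never arrives."""
    await asyncio.Event().wait()  # this event is never set


if __name__ == "__main__":
    fetches = {"positions": stuck, "open orders": fast}
    print(asyncio.run(run_startup_fetches(fetches, timeout=0.05)))
```

With the flag off, the caller gets a list of failures and still has a usable connection; with it on, the same failures become an exception the application can act on.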

The IBKR TWS API used here is 100% async (versus something like their web API), so our client sends "get positions" to the gateway, then just has to sit and wait for two things: all the positions to be sent, and then a final "positions are complete" response. There is no single direct command that means "get positions, read positions, positions are now complete"; it's always multiple replies, which apparently don't always come back from the gateway. If we never get the "positions are complete" message, our client has no way of knowing the gateway is done sending position data.
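To make the "rows, then an end marker" flow concrete, here is a small simulation using only asyncio. It is a sketch under assumptions: the message tuples, `PositionsCollector`, and `collect` are invented for illustration and do not mirror ib_async internals. The point is only that the result future resolves when the end marker arrives, so a missing marker can be detected only by a timeout.

```python
import asyncio


class PositionsCollector:
    """Accumulates position rows until an end-of-stream marker arrives."""

    def __init__(self):
        self.rows = []
        # Must be created while an event loop is running.
        self.finished = asyncio.get_running_loop().create_future()

    def on_position(self, row):
        self.rows.append(row)

    def on_position_end(self):
        # Only the end marker tells us the stream is complete.
        if not self.finished.done():
            self.finished.set_result(list(self.rows))


async def collect(messages, timeout):
    """Feed simulated gateway messages, then wait for completion.

    Returns the collected rows, or None if the end marker never arrived
    before the timeout.
    """
    collector = PositionsCollector()
    for kind, payload in messages:
        if kind == "position":
            collector.on_position(payload)
        elif kind == "positionEnd":
            collector.on_position_end()
    try:
        return await asyncio.wait_for(collector.finished, timeout)
    except asyncio.TimeoutError:
        return None


if __name__ == "__main__":
    complete = [("position", "AAPL"), ("position", "MSFT"), ("positionEnd", None)]
    truncated = complete[:-1]  # same rows, but the end marker is missing
    print(asyncio.run(collect(complete, 0.05)))
    print(asyncio.run(collect(truncated, 0.05)))
```

In the truncated case every row was received, yet the client still cannot report success, which matches the failure described in this thread.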

It can be useful to read the gateway client tab logs on startup to see what commands are being sent (lines with ->) and returned to the client (lines with <-). If you can see that the request reaches the gateway and the gateway does respond with data, then the problem is in the client: something is blocking for too long and not letting another read happen.

All the other standard practices also help, like making sure the gateway has enough Java memory configured, disabling the "expose entire trading schedule" option, etc.

It's really a user choice whether to accept a failed data load on startup: the failure of the startup sync doesn't mean the connection is broken, so you can have a failed sync still return a working connection (the automatic sync is just calling user APIs to load existing data into cached objects first, but you can call them yourself anyway if you need to).

Though, I did just notice a bug where StartupFetch.POSITIONS isn't actually being used to decide whether to load positions on startup (currently it always loads positions), so we should fix that one.

One potential thing to test: does connecting with a different client id help? By default client id 0 is "special", so I wonder if using any other client id would return different results (all clients can see all positions, though; only client 0 can see all orders).

skister commented 3 days ago

That is good to know about raiseSyncErrors.

I do disable "expose entire trading schedule". I recently doubled the memory on the ibgateway vm to 4GB from 2GB and haven't noticed a change in how often the issue occurs. The fact that it just happened on a freshly started VM with no trades or activity makes me suspect it is not a memory issue. When it occurs, it does happen with other client ids and restarting ibgateway is the only fix I have found.

I'll look at pulling the logs, but because this runs 24x5 on a headless VM, I don't know of an easy way to decrypt the logs without waiting until the next weekend.

I'm pretty sure that this is an ibapi issue where the position complete call is never returned, like the reqExecutions issue, and not a problem with ib_async. I have tried with 10-minute timeouts and it didn't help. I haven't used the ibapi code directly, but can work on reproducing it with that, then submit a bug report to IB API support.

mattsta commented 3 days ago

> this runs 24x5 on a headless VM

Oh, that's good to know too. It probably has a VNC session exposed somewhere.

Probably worth checking the gateway version too. I just update mine every couple weeks because I assume they fix things? They haven't updated their changelog in a year or two, but every week the version number goes up (not sure if it's completely automated or actual changes though): https://investors.interactivebrokers.com/en/index.php?f=16454

> the position complete call is never returned

It can also depend on the time of day. If you get unlucky and try to connect around the 9pm IBKR service shutdown time, nothing works for a while, and often nothing works on weekends for hours at a time too. But you mentioned it worked after a quick restart, so maybe it's not an exact service timing issue.

> I haven't used the ibapi code directly, but can work on reproducing it with that

Other than the async usage, their default Python API uses the same method names and parameters for all basic operations, so it should hopefully be easy to clone the connection+positions behavior for quick testing.
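For reference, a reproduction along those lines might look roughly like the following. This is an untested sketch that assumes the official `ibapi` package is installed and a gateway is listening. The port default comes from the report above, while the host default, client id 99, the 30-second timeout, and the names `probe_positions` and `Probe` are arbitrary choices for illustration. The `ibapi` import is deferred into the function so the file loads even where the package is missing.

```python
import threading


def probe_positions(host="127.0.0.1", port=4001, client_id=99, timeout=30.0):
    """Request positions via plain ibapi and report whether positionEnd arrived.

    Returns (completed, row_count). Hypothetical helper; needs `ibapi`
    and a running gateway.
    """
    # Deferred import so this module can be loaded without ibapi installed.
    from ibapi.client import EClient
    from ibapi.wrapper import EWrapper

    class Probe(EWrapper, EClient):
        def __init__(self):
            EClient.__init__(self, self)
            self.rows = 0
            self.done = threading.Event()

        def nextValidId(self, orderId):
            # Called once the connection handshake finishes.
            self.reqPositions()

        def position(self, account, contract, position, avgCost):
            # One callback per position row.
            self.rows += 1

        def positionEnd(self):
            # The "positions are complete" marker discussed in this thread.
            self.done.set()

    app = Probe()
    app.connect(host, port, client_id)
    # app.run() is the blocking message loop, so run it in a daemon thread.
    threading.Thread(target=app.run, daemon=True).start()
    completed = app.done.wait(timeout)
    app.disconnect()
    return completed, app.rows


if __name__ == "__main__":
    completed, rows = probe_positions()
    print(f"positionEnd received: {completed}, position rows: {rows}")
```

If this reports rows but no positionEnd while the gateway is in the bad state, that would point at the gateway rather than ib_async and would be a compact test case for an IB API support ticket.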