jl777 / SuperNET


Serialize required requests internally #562

Open lukechilds opened 6 years ago

lukechilds commented 6 years ago

If I make a few concurrent connections to marketmaker, the connections are just dropped, which makes it hard to know what's going on or to provide useful information in a GUI implementation.

It doesn't have to be a huge number of requests, just a small number happening concurrently. For example, making 4 simultaneous requests to enable Electrum coins regularly results in one of the requests being dropped, which gives an ugly net::ERR_SOCKET_NOT_CONNECTED error in Chromium and doesn't provide much insight into what's going on.

If we queue all our API requests and limit them to execute one at a time, everything works, but doing this causes pretty large performance issues in our GUI.
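For reference, our client-side workaround is essentially a promise chain that serializes every call. A minimal sketch (the URL and payload shapes are placeholders for our actual client code):

```ts
// Minimal sketch of the client-side workaround: every request is chained
// onto the tail of a single promise queue, so marketmaker only ever sees
// one request at a time. MM_URL and the payload shapes are placeholders.
const MM_URL = 'http://127.0.0.1:7783';

let tail: Promise<unknown> = Promise.resolve();

function rpc(payload: object): Promise<unknown> {
  const result = tail.then(() =>
    fetch(MM_URL, { method: 'POST', body: JSON.stringify(payload) })
      .then(res => res.json())
  );
  tail = result.catch(() => undefined); // keep the chain alive on errors
  return result;
}

// These now hit marketmaker strictly one after another:
rpc({ method: 'orderbook', base: 'KMD', rel: 'CHIPS' });
rpc({ method: 'portfolio' });
```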

Is this expected behaviour or a bug? Are we supposed to be able to issue commands concurrently?

jl777 commented 6 years ago

please serialize requests

lukechilds commented 6 years ago

@jl777 We're now queueing requests one at a time to resolve this issue.

However, it might make more sense for you to implement the queue logic on your end, inside marketmaker. That way it only needs to be implemented once instead of in every single client, and clients that aren't aware of this limitation won't get strange errors.

Another benefit of queueing on your end is that if you know internally which commands need to be queued and which are safe to call asynchronously, you can bypass the queue for the safe ones. This can be a huge performance win for GUI applications, where a very quick request needed to show a UI element may otherwise be blocked by a slow query ahead of it in the queue.
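To illustrate the idea (the split below into queue-safe and serialized methods is hypothetical, just to show the shape of it):

```ts
// Hypothetical illustration: methods believed safe bypass the queue,
// everything else stays serialized. The ASYNC_SAFE list is invented.
const MM_URL = 'http://127.0.0.1:7783';
const ASYNC_SAFE = new Set(['portfolio', 'orderbook']);

let tail: Promise<unknown> = Promise.resolve();

const call = (payload: { method: string; [key: string]: unknown }) =>
  fetch(MM_URL, { method: 'POST', body: JSON.stringify(payload) })
    .then(res => res.json());

function dispatch(payload: { method: string; [key: string]: unknown }): Promise<unknown> {
  if (ASYNC_SAFE.has(payload.method)) return call(payload); // no queue
  const result = tail.then(() => call(payload)); // serialized path
  tail = result.catch(() => undefined);
  return result;
}
```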

jl777 commented 6 years ago

All good points. However, the current version has been tested under load and it is working. I am reluctant to make such a big change as it would require requalifying it, especially under load. So this is a wontfix for now, but it's certainly a good feature for version 2.0.

lukechilds commented 6 years ago

Understandable. Can we reopen this issue to keep it tracked?

Maybe under a different title such as "Serialize required requests internally".

jl777 commented 6 years ago

ok, go ahead

lukechilds commented 6 years ago

Updated the title; doesn't look like I have permission to reopen.

lukechilds commented 6 years ago

Just been speaking to @pondsea, and he said that if I could give you a little more insight into how this is causing issues, you might be able to implement some async logic.

So to give you an example: there's no socket interface for things like balance or UTXO count that we can subscribe to, so to provide a good user experience on the exchange page and keep all data in sync we need to poll mm regularly. Currently, once a second, we check the UTXO count for both base and rel, the address balances, and the orderbook.
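Roughly this shape (userpass is omitted and the payloads are simplified):

```ts
// Sketch of the once-a-second polling our exchange page does today.
// userpass is omitted and the payloads are simplified.
const poll = (payload: object) =>
  fetch('http://127.0.0.1:7783', { method: 'POST', body: JSON.stringify(payload) })
    .then(res => res.json());

setInterval(() => {
  poll({ method: 'orderbook', base: 'KMD', rel: 'CHIPS' });
  poll({ method: 'balance', coin: 'KMD', address: 'RExampleAddress' });
  poll({ method: 'balance', coin: 'CHIPS', address: 'RExampleAddress' });
}, 1000);
```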

However, due to the problems mentioned previously in this issue, we are now adding all API requests to a queue and running them one at a time. Any slow request (listunspent is the usual suspect) ends up blocking the queue, which results in a backlog of requests. This makes the entire GUI feel very slow and buggy.

Lots of these requests are totally unrelated, so they should be able to execute asynchronously. For example, it doesn't make sense for a slow UTXO lookup to delay updating the orderbook or enabling a new Electrum coin.

It would be ideal if you could implement the queueing on your side, because you know which information depends on what. We could then make as many requests as the UI needs in parallel; if you can return the data instantly (like portfolio, which I believe just returns data straight out of memory), you respond instantly, and if it's blocked by something else or takes a while (like listunspent), you just hold the response for that request until you're able to answer.

That way a single slow request won't hang the entire app.

Does that make sense? Is this something you can implement? It's causing pretty major usability issues in our GUI. If you can implement this, we just need to remove the queue logic from our API client and these performance issues will disappear.

TL;DR: We should be able to send as many requests to mm as we need. If some can run in parallel and some need to wait for something else to finish, that should be handled internally by mm. We just send all the requests asynchronously and get the responses back whenever you have them available.
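To picture it: something like serialization keyed by what a request touches, so only genuinely conflicting requests wait on each other (purely illustrative; keying by coin is an assumption about what actually conflicts):

```ts
// Illustrative sketch of internal serialization by dependency key:
// requests sharing a key run in order, requests with different keys run
// in parallel. Keying by coin is an assumption made for the example.
const mmRequest = (payload: object): Promise<unknown> =>
  fetch('http://127.0.0.1:7783', { method: 'POST', body: JSON.stringify(payload) })
    .then(res => res.json());

const tails = new Map<string, Promise<unknown>>();

function serializedBy(key: string, task: () => Promise<unknown>): Promise<unknown> {
  const prev = tails.get(key) ?? Promise.resolve();
  const next = prev.then(task, task); // run once the previous task settles
  tails.set(key, next.catch(() => undefined));
  return next;
}

// A slow KMD listunspent no longer delays a CHIPS orderbook update:
serializedBy('KMD', () => mmRequest({ method: 'listunspent', coin: 'KMD' }));
serializedBy('CHIPS', () => mmRequest({ method: 'orderbook', base: 'CHIPS', rel: 'KMD' }));
```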

jl777 commented 6 years ago

What you are talking about would require rewriting a significant part of mm, so practically speaking it will need to wait until v2.

However, I am wondering why you don't just query the Electrum servers directly. A lot of the mm API functions are just convenience functions, and as such they are not implemented for high performance. If you want everything fully asynchronous, then split out all the Electrum calls and do them however you wish. There's no sense in using the limited mm variant when you can directly query the SPV servers, however many you want, using whatever logic you want.

I think I can safely add queuing for adding an Electrum coin, if that is a specific slowness you need fixed. listunspent you can surely do directly and bypass the serialization.
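Talking to an SPV server directly is just newline-delimited JSON-RPC over TCP, something like this (a sketch; the server is just an example, and newer ElectrumX versions want the blockchain.scripthash.* methods instead of the older blockchain.address.* one shown here):

```ts
// Sketch of querying an Electrum server directly: newline-delimited
// JSON-RPC over TCP. The server below is just an example; newer
// ElectrumX versions expect blockchain.scripthash.* methods.
import * as net from 'net';

function electrumRequest(host: string, port: number,
                         method: string, params: unknown[]): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const sock = net.connect(port, host, () => {
      sock.write(JSON.stringify({ id: 1, method, params }) + '\n');
    });
    let buf = '';
    sock.on('data', chunk => {
      buf += chunk.toString();
      const nl = buf.indexOf('\n');
      if (nl !== -1) { // full response line received
        sock.end();
        resolve(JSON.parse(buf.slice(0, nl)).result);
      }
    });
    sock.on('error', reject);
  });
}

// e.g. fetch UTXOs for an address without going through mm at all:
electrumRequest('electrum1.cipig.net', 10001,
                'blockchain.address.listunspent', ['RExampleAddress'])
  .then(console.log);
```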

lukechilds commented 6 years ago

The Electrum stuff was just an example, but yes, moving all the Electrum/balance/UTXO handling into our app is definitely something we can look into to reduce the load on marketmaker.

However, one concern of mine is that we would then have two separate Electrum implementations, which could result in data inconsistency. For example, I think marketmaker shows the balance as the address balance minus any UTXOs locked in pending swaps, while Electrum will obviously just give us the full balance. Maybe we could query marketmaker for which UTXOs are free and only use that data from Electrum, but then we're back at square one, polling marketmaker regularly for up-to-date UTXO info. (Not sure if my assumption about marketmaker balances is 100% correct there.)

There are also issues if our code or your code fails over to a different Electrum server: they could be at different heights, meaning the UI thinks we have one balance while marketmaker thinks we have another.

The inconsistencies that could occur with two isolated Electrum instances in the same app could cause very unpredictable behaviour. But still, it's definitely something we can look into.

However, even if we manage to reliably move all the balance/Electrum work over to our end, we still need to poll marketmaker regularly to display live orderbook data and live swap state.

So we'd still need a reliable way to poll marketmaker regularly without it crashing.

One alternative solution might be for you to provide a socket that we can connect to and subscribe to events. For example, we could connect and subscribe to the KMD/CHIPS orderbook and you just push data to us when orders are added or removed; same with swap status. Although maybe that's even more work than making the current API methods async, I don't know.
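In client terms it could look something like this (entirely hypothetical; the port, subscribe messages, and event shapes are invented for illustration):

```ts
// Entirely hypothetical push interface: the port, subscribe messages,
// and event shapes below are invented to illustrate the idea.
const renderOrderbook = (data: unknown) => { /* update orderbook UI */ };
const renderSwap = (data: unknown) => { /* update swap progress UI */ };

const ws = new WebSocket('ws://127.0.0.1:7784');

ws.onopen = () => {
  ws.send(JSON.stringify({ subscribe: 'orderbook', base: 'KMD', rel: 'CHIPS' }));
  ws.send(JSON.stringify({ subscribe: 'swapstatus' }));
};

ws.onmessage = event => {
  const msg = JSON.parse(event.data);
  // mm pushes only when something changes; no client-side polling loop.
  if (msg.channel === 'orderbook') renderOrderbook(msg.data);
  if (msg.channel === 'swapstatus') renderSwap(msg.data);
};
```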

jl777 commented 6 years ago

The problem with the Electrum support as it stands is that it was added at the last minute and wasn't part of the initial design, so unfortunately all the fancy things will need to wait for 2.0. I am worried that if each client makes 10 SPV requests, that increases the overall load on the servers 10x, which will make people want to issue 20 SPV requests... not a good feedback loop.

The balance calc mm uses is just a sum of unspent UTXOs; for swaps in progress, the UTXOs are updated as spent in ElectrumX (at least with the latest versions, which made 0conf support much better).

I am not seeing any crashing based on polling mm. If you have a test setup that crashes mm on polling, I need to fix that. It seems better to fix the few remaining bugs than to rearchitect major dataflows at this late stage.

So let us try to be specific with reproducible bugs, preferably at the CLI level, i.e. a script with a bunch of curl requests to the mm API that causes a crash.

lukechilds commented 6 years ago

> The problem with the Electrum support as it stands is that it was added at the last minute and wasn't part of the initial design

That's fair enough, I understand.

I actually had an idea re Electrum support that could solve some of the concerns I had.

We could implement all the groovy Electrum logic on our end, in our app. We'd handle it robustly, connecting to multiple servers and using the first response we get, so it's decentralised, fast, and resistant to a single Electrum server going down.

Then we open a local socket for each Electrum coin that just proxies everything through our Electrum logic, and mm connects to that. As far as mm is concerned, it's just connected to a single Electrum server running on localhost. In reality it's connected to our proxy and runs through our routing logic, which pings multiple servers etc. It'd be totally transparent to mm and require no code changes on your end.

But our GUI and mm would both be using the same Electrum servers, routed via the same logic.
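A rough sketch of the proxy (hypothetical; the upstream hosts/ports are placeholders, and buffering of partial lines is omitted for brevity):

```ts
// Hypothetical sketch of the local proxy: mm connects to 127.0.0.1 as if
// it were one Electrum server; each request line is fanned out to several
// real servers and the first response with a given id wins. Upstream
// hosts/ports are placeholders; partial-line buffering is omitted.
import * as net from 'net';

const UPSTREAMS = [
  { host: 'electrum1.example.com', port: 10001 },
  { host: 'electrum2.example.com', port: 10001 },
];

net.createServer(client => {
  const upstreams = UPSTREAMS.map(u => net.connect(u.port, u.host));
  client.on('data', line => upstreams.forEach(s => s.write(line)));

  const seen = new Set<string>(); // forward each response id only once
  for (const s of upstreams) {
    s.on('data', chunk => {
      for (const line of chunk.toString().split('\n').filter(Boolean)) {
        const id = String(JSON.parse(line).id);
        if (!seen.has(id)) {
          seen.add(id);
          client.write(line + '\n');
        }
      }
    });
    s.on('error', () => { /* ignore; another upstream can still answer */ });
  }
  client.on('close', () => upstreams.forEach(s => s.end()));
}).listen(50001, '127.0.0.1'); // mm's electrum config points here
```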

Does that seem like a good solution to you?

Re the dropped requests, I'll try and get a repro over to you.
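Something in this shape, most likely (userpass, coins, and the Electrum server details are placeholders):

```ts
// Rough shape of the repro: fire several electrum enables at mm
// concurrently and log which connections get dropped. userpass, coins,
// and server details are placeholders.
const MM_URL = 'http://127.0.0.1:7783';
const coins = ['KMD', 'CHIPS', 'LTC', 'DOGE'];

Promise.allSettled(
  coins.map(coin =>
    fetch(MM_URL, {
      method: 'POST',
      body: JSON.stringify({
        userpass: 'PASSPHRASE_HASH', method: 'electrum', coin,
        ipaddr: 'electrum.example.com', port: 50001,
      }),
    }).then(res => res.json())
  )
).then(results =>
  results.forEach((r, i) =>
    console.log(coins[i], r.status === 'fulfilled' ? 'ok' : `dropped: ${r.reason}`)
  )
);
```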

jl777 commented 6 years ago

Pretty clever! I like it.

pondsea commented 6 years ago

This came about due to a dream he had last night lol....

lukechilds commented 6 years ago

@jl777

From your message above: https://github.com/jl777/SuperNET/issues/562#issuecomment-365946221

> I think I can safely add queuing for adding an Electrum coin, if that is a specific slowness you need fixed

Is this still an option?

When the user logs in to the app, we show them a loading screen while we wait for the mm process to initialise and set everything up. Currently, adding Electrum servers is a major bottleneck. If you could do this concurrently and safely, it would give a drastic reduction in login times.

Obviously safety is the priority; if there's a chance it's going to cause instability then it's not worth the risk.

jl777 commented 6 years ago

Anything async could make things unstable. One question: why do you need to add all the Electrum servers before completing login? Only one coin would be needed for login to have occurred.

lukechilds commented 6 years ago

We have certain coins enabled by default and users can enable any extras manually.

These are all enabled, and then we call portfolio before we show the user the dashboard, so we can give them an accurate overview of their portfolio.

If we enabled just a single coin at login, their portfolio would appear to be missing funds, which would confuse users.

Obviously better Electrum support in v2, or moving all Electrum queries in-app, is the ultimate solution, but we're just working with what's available at the moment.

jl777 commented 6 years ago

On startup the prevaccess time is 0, so it would issue the SPV call. I could start things out so it gets the balance from the cache, but of course if the balance has changed it would show the old values at first.

lukechilds commented 6 years ago

The issue is that we initialise each Electrum server one after another, and each call takes quite a long time. Obviously if the user has 20-30 currencies enabled it's going to take a very long time to log in.

If you can make it async, that should instantly solve the issue. If not, then anything you can do to improve the Electrum enabling speed is great.

So with your last comment, does that just mean the balances would be wrong for a few seconds? If we're polling portfolio every second, would it all sync up pretty fast?

jl777 commented 6 years ago

Yes, changing the initial state of the usecache logic would be in the safe category. Messing with Electrum async, I can't guarantee it won't destabilize things.

I will add the use-cache-files-first logic. It won't always speed things up, but it should avoid the long wait for the initial population of the cache.
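The pattern being described, illustrated in client-style TypeScript rather than mm's actual C (spvListunspentSum is a stand-in for the real SPV call):

```ts
// Illustration of the cache-first behaviour described above (not mm's
// actual C code): serve the cached value immediately and refresh via
// SPV in the background, so values converge shortly after startup.
type Balance = { coin: string; value: number; fetchedAt: number };
const cache = new Map<string, Balance>();

declare function spvListunspentSum(coin: string): Promise<number>; // stand-in

async function refreshFromSpv(coin: string): Promise<Balance> {
  const value = await spvListunspentSum(coin);
  const fresh = { coin, value, fetchedAt: Date.now() };
  cache.set(coin, fresh);
  return fresh;
}

async function getBalance(coin: string, maxAgeMs: number): Promise<Balance> {
  const cached = cache.get(coin);
  if (!cached) return refreshFromSpv(coin); // first ever call must wait
  if (Date.now() - cached.fetchedAt > maxAgeMs) {
    void refreshFromSpv(coin); // stale: refresh in background, don't block
  }
  return cached; // possibly stale, but returned instantly
}
```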

jl777 commented 6 years ago

OK, it's there in the dev branch. Also, I see the cache refresh time is 13 seconds, not 30.

lukechilds commented 6 years ago

Awesome, thanks James!

Btw, does this mean that syncing the balances properly will take the same amount of time that logging in would have taken without the optimisation?

Or will this get us to a logged-in state with synced balances sooner?

jl777 commented 6 years ago

I am not 100% sure; I hope the latter.

lukechilds commented 6 years ago

Ok, thanks, will test.

lukechilds commented 6 years ago

@jl777 Just tested on the latest dev and it seems to have messed up the portfolio command. Not sure if that's from the Electrum changes or the portfolio changes (https://github.com/jl777/SuperNET/commit/858fbf769042e0042f49bd93763a411161394007#commitcomment-28912115).

But I have eight currencies enabled in Electrum mode, and when I run the portfolio command the response only contains three currencies, even after leaving mm running for ~10 minutes and polling portfolio every second.

lukechilds commented 6 years ago

Looks like it was only the currencies that previously had balances that are now showing up. But I even tried sending coins to a new currency and it's not showing. Although that seems like a possibly unrelated issue, as I'm not seeing the new balance in a previous version of mm either.

lukechilds commented 6 years ago

Possibly related: https://github.com/jl777/SuperNET/issues/833

jl777 commented 6 years ago

I guess I need to revert the portfolio caching.

jl777 commented 6 years ago

It's in dev now without the caching of portfolio. It was issuing a getbalance in the portfolio call, which would trigger a listunspent. So portfolio will again be making SPV calls, but the listunspent cache will make it not so slow.

lukechilds commented 6 years ago

Working, thanks!