pmaji / crypto-whale-watching-app

Python Dash app that tracks whale activity in cryptocurrency markets.
MIT License

Hosting #16

Closed pmaji closed 5 years ago

pmaji commented 6 years ago

Looking for long term and speedy options for hosting this. I tried Heroku to no avail. The present plan now is to move to AWS long-term, but I'm open to other suggestions in this thread.

pmaji commented 6 years ago

FYI we are still dealing with a problem related to the present hosting. After a short while of working without a problem, the app stops updating with new prices. @CrackLord (the webhost) and I don't believe it has anything to do with throttling from GDAX because we are well within the limits they list on their website. @CrackLord could you perhaps give a brief description of how you are hosting it presently so that others might guess as to the issue. My thought is that @theimo1221 might have a hypothesis.

CrackLord commented 6 years ago

@CrackLord could you perhaps give a brief description of how you are hosting it presently so that others might guess as to the issue.

It is running on a VPS. The app is started with python app.py under a systemd service, and nginx reverse-proxies requests to the app server.

It is a simple setup, so I don't think it is related to the issue.

theimo1221 commented 6 years ago

I guess the problem lies in the sheer number of requests made by the website; after a while the servers start refusing requests. The changes made in #40 should definitely change this.

@CrackLord could you update the version on your server after @pmaji approved the changes?

Thanks and Greets

CrackLord commented 6 years ago

@CrackLord could you update the version on your server after @pmaji approved the changes?

Will do. Error logging on the GDAX API endpoint would be useful too, to see what's going on. The logs are somewhat full of crap at the moment though because of the default http Python server logging so it would be worth disabling that I think.
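A minimal sketch of the logging change suggested here. It assumes the app is served by Flask's built-in development server (which is what Dash uses when started with python app.py), so that the per-request lines come from the `werkzeug` logger. The `gdax` logger name and the `log_gdax_error` helper are hypothetical and not taken from the repository.

```python
import logging


def configure_logging():
    """Quiet per-request access logs and return a logger for GDAX errors."""
    # Assumption: request lines are emitted by the "werkzeug" logger,
    # as with Flask's built-in development server.
    logging.getLogger("werkzeug").setLevel(logging.ERROR)

    # Hypothetical logger name used only for API problems.
    gdax_log = logging.getLogger("gdax")
    gdax_log.setLevel(logging.INFO)
    return gdax_log


def log_gdax_error(log, pair, exc):
    """Record a failed GDAX request so it stands out in the service logs."""
    log.error("GDAX request for %s failed: %r", pair, exc)
```

With the access log silenced, whatever remains in the systemd journal is more likely to be the interesting part, such as a refused or failed API request.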

theimo1221 commented 6 years ago

My changes are combined in #41

@CrackLord for testing stability you could just copy my code from the link above. It includes all changes I made today.

I'll take a look into logging the endpoint, but I guess that's not the problem.

pmaji commented 6 years ago

@CrackLord you are good to integrate the new version from Master and start hosting. Thanks to @theimo1221's improvements, this should be sustainable and avert the refreshing-stopping problem previously experienced.

CrackLord commented 6 years ago

Done, @pmaji

pmaji commented 6 years ago

FYI @CrackLord @theimo1221 the new version appears to already have de-synced. Back to the drawing board I suppose.

theimo1221 commented 6 years ago

I just took a look into it.

It's not current data, but the request to the server and the response seem to be totally okay.

So we desync, but on the app side, not on the client side.

So it could be something in @CrackLord's settings or on the GDAX side blocking the data....

I think I'll add a timestamp to the data so we can see the last correct refresh from GDAX.

theimo1221 commented 6 years ago

I just had some ideas about what the problem might be, and it could be very simple:

  1. He uses a VPS. Others from the same hosting provider on the exact same hardware might also be calling data from GDAX, resulting in API errors.
  2. We call the API multiple times. It would be better to call it once and cache the data for the other pairs.

Will fix this
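The "call once and cache" idea could look roughly like the sketch below. It is not the code from the repository: `TTLCache`, the `fetch` callable and the 5-second lifetime are all assumptions made for illustration.

```python
import time


class TTLCache:
    """Cache one fetched value per key for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=5.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable: key -> data
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (timestamp, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # still fresh: no API call is made
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

Every caller that needs the same data within the lifetime shares one request, which reduces the number of calls the exchange sees from a single IP.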

theimo1221 commented 6 years ago

Added some stuff in #48. We had the following problem: the app was updating 8 pairs one after another, then waiting 5 s and pulling the 8 pairs again.

This is problematic with the API because of the many requests in a short timeframe.

I changed it to 1 request, 1 s pause, 1 request, ....
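The pacing described above can be sketched as follows. This is an illustration under assumptions, not the change made in #48: `refresh_paced` and the `fetch` callable are hypothetical names, and `sleep` is injectable only so the function can be tested without waiting.

```python
import time


def refresh_paced(pairs, fetch, pause_seconds=1.0, sleep=time.sleep):
    """Fetch each pair in turn, pausing after every single request.

    Spreading the requests out avoids sending a burst of calls
    followed by one long wait.
    """
    results = {}
    for pair in pairs:
        results[pair] = fetch(pair)  # hypothetical callable: pair -> data
        sleep(pause_seconds)
    return results
```

For 8 pairs and a 1 s pause, one full cycle takes at least 8 s plus the request time, compared with 8 back-to-back requests and a 5 s wait before.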

pmaji commented 6 years ago

@theimo1221 great catch. I glossed over that given our previous change to 1 data pull and neglected to realize it might still be treated by GDAX as one after the other leading to throttling. Let's see if this one works to solve it.

theimo1221 commented 6 years ago

@CrackLord have you updated version on your server?

CrackLord commented 6 years ago

Yeah, I have updated it now.

On 19 Feb 2018, at 22:14, Thiemo Hoffmann notifications@github.com wrote:

@CrackLord https://github.com/cracklord have you updated version on your server?


pmaji commented 6 years ago

Okay so I ran a local instance of the app as well for testing purposes, and both @CrackLord's version and my own stopped refreshing at 19:17.

I started my version at 17:55:44 (similar to when CL started his).

@theimo1221 Any ideas on what is causing the refreshes to stop over an hour after the initializing?

My one thought presently would be we could push the wait time a bit further back and retest.

theimo1221 commented 6 years ago

@pmaji CrackLord's stopped at 19:17 as well. But I can still connect to it and get the data. So in my opinion it can't be a critical error (a critical error would end the main program, which terminates the server thread). We should get some debugging output from GDAX. Maybe we can catch the error from GDAX and prevent it with a one-time additional wait....

theimo1221 commented 6 years ago

Wait, it's the other way around: refreshing is a thread, while the server isn't. I guess I spotted the problem.

theimo1221 commented 6 years ago

Current situation: once we lose the connection to GDAX, our refresh thread terminates on the error, while the server keeps running because it is on the main thread.

Solution in #52: separate threads for both the server and the refresher. The main program acts as a watchdog, and if it detects a dead thread it restarts that thread.

I tested this by unplugging the ethernet cable (which causes an exception in the GDAX pull), and the watchdog restarted the thread fine. The site gets new data!

So @pmaji please accept the pull and @CrackLord please update the server. I'm sorry for not having thought of this before, but both of you getting stuck at the exact same time gave me the idea!
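A rough sketch of the watchdog pattern described above, using only the standard library. It is not the code from #52; the `watchdog` function, its parameters and the `rounds` limit (added so the loop can be stopped in a test) are assumptions.

```python
import threading
import time


def watchdog(workers, poll_seconds=1.0, rounds=None):
    """Start each worker in its own thread and restart any that die.

    workers: dict mapping a name to a zero-argument callable.
    rounds:  number of checks to perform; None means run forever.
    Returns the number of restarts performed.
    """
    def spawn(name):
        thread = threading.Thread(target=workers[name], name=name, daemon=True)
        thread.start()
        return thread

    threads = {name: spawn(name) for name in workers}
    restarts = 0
    checked = 0
    while rounds is None or checked < rounds:
        time.sleep(poll_seconds)
        for name in list(threads):
            if not threads[name].is_alive():
                # The worker returned or raised: start a fresh thread for it.
                threads[name] = spawn(name)
                restarts += 1
        checked += 1
    return restarts
```

A thread object cannot be started twice, so the watchdog creates a new `Thread` for the same callable rather than restarting the dead one.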

theimo1221 commented 6 years ago

@CrackLord please update your site again

CrackLord commented 6 years ago

@CrackLord please update your site again

I have already been on #52 since it was merged :)

theimo1221 commented 6 years ago

@CrackLord #54 was merged 5 hours before my post. You are using an outdated version ;)

CrackLord commented 6 years ago

@CrackLord #54 was merged 5 hours before my post. You are using an outdated version ;)

I have updated now again to be sure it's on the latest version.

pmaji commented 6 years ago

@CrackLord go ahead and re-pull. You'll also need to download the two new files (the new .py [and maybe the .js if you want as well]) if you haven't already. Let me know whenever it's done :)

CrackLord commented 6 years ago

Done @pmaji

theimo1221 commented 6 years ago

Ty @CrackLord, but is your server down? Can't reach it.

CrackLord commented 6 years ago

There seems to be a problem with the latest version @theimo1221. It doesn't work for me locally either.

Edit: I've rolled back to 623d21d5012f7a5d0b457aadf9a80ba86acece54 until there is a fix.

theimo1221 commented 6 years ago

Hm, it worked for both @pmaji and me. Did you download gdax_book.py? He'll approve the newest pull today; let's see how that one runs for you.

theimo1221 commented 6 years ago

@CrackLord Talked with @pmaji and newest Pull with Sidebar is working completely fine on our machines

CrackLord commented 6 years ago

I don't see any graphs when using the latest version at all. I'm not sure why this would work for you but not for me.

image

theimo1221 commented 6 years ago

@CrackLord Wait a little bit, or did you forget to download gdax_book.py?

The process now needs like 40s before the data are ready. Edit: To be precise: 11 × 4s for starting the websockets and 11 × 3s for data calculation. So after 80 seconds all should be fine 😉

CrackLord commented 6 years ago

@CrackLord Wait a little bit The process now needs like 40s before data are ready

I see what the issue is. Your latest changes are suddenly very resource intensive. It's using 100% of one core of my CPU on my Macbook and now taking up 256 MB of RAM. This is causing the app to crash on the server, since the server is not as powerful as my laptop.

Edit: I believe the intensive CPU usage is caused by this. This is essentially using up the entire core the script runs on to the max limit because there is no throttling on the loop.

theimo1221 commented 6 years ago

@CrackLord on my pc it takes like 14% CPU

You can add a sleep there, sure, but the others are not sleeping either, because with websockets we can pull constantly. But maybe I'll rewrite the code to let the calc wait for a successful prepare first.

CrackLord commented 6 years ago

@CrackLord on my pc it takes like 14% CPU

Yes but how many cores does your CPU have? That is probably close to 100% of one core.

Listening and updating on a websocket is fine, but a while loop without a throttle will naturally max out the CPU core it's running on. Without a throttle it will run as fast as the CPU allows, which means maxing out the resources it's given.
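To illustrate the point about throttling, here is a minimal sketch of a loop that idles between iterations instead of spinning. The names `run_throttled` and `step` and the 0.5 s interval are assumptions; the loop in the repository may be structured differently.

```python
import threading


def run_throttled(step, stop_event, interval_seconds=0.5):
    """Call step() repeatedly, idling between iterations.

    Event.wait() blocks the thread, so the loop does not occupy a
    CPU core while it has nothing to do.
    """
    iterations = 0
    while not stop_event.is_set():
        step()
        iterations += 1
        # Returns early if stop_event is set, otherwise after the interval.
        stop_event.wait(interval_seconds)
    return iterations
```

Without the `wait` call the same loop would call `step()` as fast as the core allows, which is the behaviour described above.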

theimo1221 commented 6 years ago

@cracklord I understand your point but to serve live data we have to be as fast as possible.

I'll rewrite the calc and preparation to wait for new data, which will result in sleeping time and even out the updating, because we start the calculation at the correct time.

CrackLord commented 6 years ago

I understand your point but to serve live data we have to be as fast as possible.

Yes but it doesn't make sense to do that in a while loop, it should be done as new data comes in for that pair.

Also imo the calculations should be done on the frontend using JS, not on the backend. It makes sense to make the client do all of that work, with the data streamed to the client via websocket.
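Doing the work "as new data comes in" can be sketched on the Python side with a blocking queue: the websocket handler puts each message on the queue and a consumer wakes up only when there is something to process. The function name, the message shape and the use of `None` as a shutdown signal are assumptions for illustration.

```python
import queue


def consume_updates(updates, handle):
    """Process updates as they arrive; block (no CPU spin) while idle.

    A None item is used here as a hypothetical shutdown signal.
    """
    processed = 0
    while True:
        item = updates.get()   # blocks until a producer puts an item
        if item is None:
            break
        handle(item)
        processed += 1
    return processed


if __name__ == "__main__":
    updates = queue.Queue()
    updates.put({"pair": "ETH-USD"})  # made-up message for illustration
    updates.put(None)
    consume_updates(updates, print)
```

Because `Queue.get()` blocks, the consumer costs nothing between messages and runs exactly once per update for a pair.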

theimo1221 commented 6 years ago

Will do that tomorrow

For now I guess it'll be fine if you add some sleeps on your side

pmaji commented 6 years ago

Feel free to give the sleeps a try on your end @CrackLord and let us know how it goes. If it's still overloading you can revert to a slightly older version of the code until we chill out the CPU usage :)

theimo1221 commented 6 years ago

I did some logging on this:

image

As you can see, the data sequence differences for some pairs are much higher because of the computation time consumed. This results in different totals of refreshes, but faster refreshes for pairs with less data.

But it's not like 1 sequence every x ms, so I tried it with a global var set to 30s for each pair's refresh.

image

Now all have the same total number of refreshes and the CPU load is lower.

theimo1221 commented 6 years ago

@CrackLord if you update, don't forget to take #82 with it, because it fixes a bug which can lead to constant restarting of a websocket.

CrackLord commented 6 years ago

@CrackLord if you update don´t forget to take #82 with it, cause it fixes a bug wich can lead to constant restarting of a websocket

I've updated to master, including the change from #82 but it's still killing the server. It's possible that it may be memory usage, the server I'm using only has 256 MB of RAM and it's using all of that and then starts going into swap memory.

If you guys want @pmaji, I can get us set up on a DigitalOcean VPS, the smallest one is much more powerful than my spare VPS and there's a lot of room to scale up if necessary to extremely overpowered servers. I messaged @pmaji on Reddit about this already. Once I get it set up there, it should be relatively maintenance free and we can also enable automatic backups of the server so it can be restored with the click of a button if anything goes wrong.

The current amount of donations should be able to sustain hosting, SSL and a domain for at least a year, if not more.

At the very least I'll get it set up and it's only pay by hour, so if it ends up not working out we don't lose any investment really. What do you think about that @pmaji ?

pmaji commented 6 years ago

@CrackLord I think it's definitely viable. At a minimum I'd like to test it to see what kind of a baseline cost estimate we get for just one hour. Feel free to do that but don't take on anything larger immediately. Still have to discuss performance differentials with @theimo1221 before deciding.

CrackLord commented 6 years ago

@CrackLord I think it's definitely viable. At a minimum I'd like to test it to see what kind of a baseline cost estimate we get for just one hour. Feel free to do that but don't take on anything larger immediately. Still have to discuss performance differentials with @theimo1221 before deciding.

The costs are very predictable. They would essentially be $0.007/hr if the lowest specced VPS is sufficient, which I believe it would be.
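For a rough sense of scale, the quoted hourly rate can be extrapolated as below. This is plain arithmetic on the $0.007/hr figure only; it assumes the smallest VPS is sufficient and ignores anything billed separately (domain, backups, bandwidth).

```python
HOURLY_RATE_USD = 0.007  # rate quoted above for the lowest-specced VPS


def hosting_cost(hours, rate=HOURLY_RATE_USD):
    """Return the cost in USD for the given number of hours."""
    return round(hours * rate, 3)


one_hour = hosting_cost(1)               # the one-hour test run
month_30_days = hosting_cost(24 * 30)    # about 5.04 USD
year_365_days = hosting_cost(24 * 365)   # about 61.32 USD
```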

theimo1221 commented 6 years ago

Below is the description I sent @pmaji.

Old version:

  1. Pair 1
     1.1 get data from gdax for pair
     1.2 calc graph
     1.3 recalc send data
  2. Pair 2
     2.1 ....
     2.2

So before pair 1 refreshes again, all other pairs have to be refreshed. Each takes a minimum of 500 ms to avoid the GDAX rate limit, and for some pairs the calculation takes 3-5 s.

New version:

Thread 1: Server
Thread 2: sum up data of cache
Thread 3-14: websocket for each pair
Thread 15-26: pull from websocket and calc of data for each pair
Thread 27-38: prepare of graph data

All of these run in parallel. So if pull/calc + prepare needs just 5 s, then that pair is updated every 5 s. If it takes 20 s, then that pair is refreshed every 20 s while the others keep refreshing at their own speed. Of course, having more threads and load will slow down each thread if the CPU can't take more load, but until that point is reached it's a huge plus.

I hope this gives a good understanding of the difference between the two methodologies.

And to be scalable in the future, we have to use the new method.

Imagine we have 100 pairs, each needing 3 s on average to refresh and calculate data. With the old method, this would result in each pair getting new data every 5 minutes.

With the new method, 100 pairs (resulting in 302 threads) would produce a CPU load which slows everything down, but it would still be around 30-45 s for the slowest pairs, and many would be faster, resulting in near-live data.
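The numbers in this comparison can be reproduced with a short calculation. The function names are made up for illustration; the thread formula (2 shared threads plus 3 per pair) is read off the thread list in this comment, where 12 pairs give 38 threads.

```python
def sequential_cycle_seconds(pairs, seconds_per_pair):
    """Old method: every pair waits for all the others to finish."""
    return pairs * seconds_per_pair


def thread_count(pairs, shared_threads=2, threads_per_pair=3):
    """New method: server + cache summation, plus websocket, calc and
    graph preparation threads for each pair."""
    return shared_threads + threads_per_pair * pairs


old_cycle = sequential_cycle_seconds(100, 3)  # 300 s, i.e. 5 minutes
threads_now = thread_count(12)                # 38, matching the list above
threads_100 = thread_count(100)               # 302
```

The 30-45 s estimate for the slowest pairs under the new method depends on the CPU and cannot be derived from these formulas.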

pmaji commented 6 years ago

Yep, I think that's well understood. It seems to me that this is the way we want to go, as long as it doesn't wind up being cost prohibitive.

@theimo1221 could you double check by looking over master to be sure that this is the most up to date version of the threading methodology that is optimized to strike the right balance of speed and CPU load?

@CrackLord: upon confirmation, feel free to test the new Master for the shortest time period possible (I think an hour). That can give us a baseline of price expectations. I'll shoot you ETH to pay for whatever it winds up being; just dm me the invoice. I'll need to build into my forecast thereafter what the price change will be given user load, so if the site gives an estimate of that I'd appreciate it as well.

theimo1221 commented 6 years ago

@pmaji Please pull #83, which changes the JS link back to the CDN version, which has been updated. (No need to use the developer link; this could even lead to an error if too many people use it.)

@CrackLord Experiment with the following line:
desiredPairRefresh = 30000 # in ms

The lower it is, the faster at least some pairs refresh; the higher it is, the less CPU load it takes.
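One way such a setting could be applied is sketched below. Only the name `desiredPairRefresh` and its value come from the line quoted above; how the app actually uses it is not shown in this thread, so `remaining_wait_seconds` and `refresh_once` are assumptions.

```python
import time

desiredPairRefresh = 30000  # in ms, as in the line quoted above


def remaining_wait_seconds(started_at, now, target_ms=desiredPairRefresh):
    """Seconds left to sleep so one refresh cycle lasts target_ms in total."""
    elapsed_ms = (now - started_at) * 1000.0
    return max(0.0, (target_ms - elapsed_ms) / 1000.0)


def refresh_once(work, sleep=time.sleep, clock=time.monotonic):
    """Run one refresh (hypothetical callable) and idle for the rest of the cycle."""
    started = clock()
    work()
    sleep(remaining_wait_seconds(started, clock()))
```

Under this scheme a pair whose calculation takes 5 s sleeps for the remaining 25 s, and a pair that needs more than 30 s starts its next refresh immediately.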

pmaji commented 6 years ago

@theimo1221 @CrackLord pulled. Feel free to do some testing on your end @CrackLord to see what is optimal.

theimo1221 commented 6 years ago

@CrackLord What is the current state? @pmaji It's now 10 days since my updates, wanna see them online!^^

theimo1221 commented 6 years ago

@pmaji What is the current state here?

CrackLord commented 6 years ago

Hi guys, sorry I've been particularly busy these last few weeks. I've also spent some time thinking about this project and how it could be improved. My following suggestion is going to be pretty drastic but I think it'll be the ultimate way to create this app.

I think you should move away from using Python or having any backend and only use client-sided JavaScript. This solves a lot of issues.

First issue is performance, all of the calculations, requests/websocket listening will be done on the client-side directly with GDAX, so all of the hard work is put entirely on the client.

Second issue is hosting: hosting static files costs close to nothing. You can host on GitHub with SSL from Cloudflare for free; all you need to pay for is a domain, which is $9.99 a year for a .com domain at most.

Third issue is making it look good, with the current Python libraries and architecture being used, it's quite difficult to edit the HTML to make it look good and add custom stylesheets etc. If this were simple HTML + JS file(s) then it would be a lot easier to manage the look and feel of the site.

That's my 2 cents guys. I know that this would be a pretty drastic change but the most important part of this code is really the algorithm which determines which orders are from the same user. That part can easily be ported to a new JS codebase.

I really do think that this would be the ideal way to go forward and don't see a reason not to do it, other than the fact that it would throw out a lot of the existing code.

theimo1221 commented 6 years ago

@CrackLord I understand and appreciate your points, but disagree on 2 points:

Design

  1. iFrame: putting our content in an iFrame gives us all the possibilities HTML/JS/CSS offer.
  2. Change content with JS: as already done in the newest version, I add styles and content with JS on the client side.

Performance:

Having all APIs and calculations on the client side is not usable, depending on internet connection and performance, especially on mobile devices. And if we add more exchanges, some don't offer a web API, or at least limit it.