hotosm / osm-tasking-manager2

Designed and built for Humanitarian OpenStreetMap Team collaborative emergency/disaster mapping, the OSM Tasking Manager 2.0 divides an area into individual squares that can be rapidly mapped by thousands of volunteers.
http://tasks.hotosm.org

Tasking Manager site is sometimes slow or hangs #918

Closed aawiseman closed 7 years ago

aawiseman commented 7 years ago

It seems like over the past couple of weeks the site has been a lot slower to load -- when logging into OSM and when selecting a square to work on especially. I've noticed this on different networks in different places.

Nick-Tallguy commented 7 years ago

Using Firefox on my Linux Mint laptop with an i5 processor - not slow anywhere else. I've noticed that the text 'waiting for piwik' or similar sometimes appears at the bottom left of my screen - I've also noticed that piwik is now slow to deliver stats when I've been checking up on LearnOSM.
I haven't been able to reproduce this exactly - I was about to work on http://tasks.hotosm.org/project/2307#task/364 but when I return to that square now it loads instantly.

CloCkWeRX commented 7 years ago

I've noticed this as well.

Looking around, fetching users.json is the slowest call, sometimes taking between 5 and 10 seconds.

It's loading ~60,000 usernames on task selection - presumably that's a lot more than just the people who worked on that task or project.

It might be worth doing an ajax call for the typeahead element, limited to the relevant users.
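
A rough sketch of what such a limited lookup could look like on the server side. This is not code from TM2: the function name `suggest_usernames`, the `limit` parameter, and the idea of passing in only a project's contributors are assumptions made for illustration.

```python
def suggest_usernames(usernames, query, limit=10):
    """Return up to `limit` usernames that start with `query` (case-insensitive).

    `usernames` is assumed to be the already-narrowed list of relevant users
    (for example, contributors to one project), not the full ~60,000 entries.
    """
    query = query.strip().lower()
    if not query:
        # Nothing typed yet: send nothing instead of the whole list.
        return []
    matches = []
    for name in usernames:
        if name.lower().startswith(query):
            matches.append(name)
            if len(matches) == limit:
                break  # stop early once enough suggestions are collected
    return matches


# Hypothetical usage with made-up names:
print(suggest_usernames(["alice", "Alan", "bob", "albert"], "al", limit=2))
```

The typeahead would then request suggestions only once the user starts typing a mention, so the response stays small regardless of how many users exist.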

There are a few other things, like failed requests appearing frequently, but they don't appear to affect functionality:

[Sun Jan 01 2017 21:14:31 GMT+1030 (ACDT)] body 404 Not Found get /project/1886  Error: 404 Not Found get /project/1886 
    at n.Application.error (sammy-latest.min.js:8)
    at n.Application.notFound (sammy-latest.min.js:8)
    at n.Application.runRoute (sammy-latest.min.js:8)
    at n.Application._checkLocation (sammy-latest.min.js:8)
    at n.Application.run (sammy-latest.min.js:8)
    at Object.init (project.js:889)
    at HTMLDocument.<anonymous> (project.js:910)
    at i (jquery-1.12.3.min.js:2)
    at Object.fireWith [as resolveWith] (jquery-1.12.3.min.js:2)
    at Function.ready (jquery-1.12.3.min.js:2)
CloCkWeRX commented 7 years ago

Another slow ajax behaviour: Check for updates

All of these came back with {"update": false}. They were triggered very frequently by UI interaction. Running only one request at a time, waiting for it to finish and then starting the next one after the correct interval, might improve things (less load on the backend).
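
To make the idea concrete, here is a minimal sketch of that scheduling logic. The real client is JavaScript; this is written in Python purely for illustration, and the names `UpdatePoller`, `maybe_check`, `fetch`, and `interval_s` are hypothetical, not taken from TM2.

```python
import time


class UpdatePoller:
    """Allow at most one update check at a time, spaced by a minimum interval."""

    def __init__(self, fetch, interval_s, clock=time.monotonic):
        self.fetch = fetch              # hypothetical callable doing the request
        self.interval_s = interval_s    # minimum gap between two checks
        self.clock = clock              # injectable so the logic can be tested
        self.in_flight = False
        self.next_allowed = float("-inf")

    def maybe_check(self):
        """Run a check if none is pending and the interval has elapsed.

        Returns the fetch result, or None when the check was skipped.
        """
        if self.in_flight or self.clock() < self.next_allowed:
            return None
        self.in_flight = True
        try:
            return self.fetch()
        finally:
            self.in_flight = False
            # Count the interval from completion, so a slow response
            # pushes the next request back instead of stacking requests.
            self.next_allowed = self.clock() + self.interval_s
```

UI events would call `maybe_check()` as often as they like; the poller decides whether a request actually goes out.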

pgiraud commented 7 years ago

@CloCkWeRX The performance issues actually depend on other processes running on the server. I recently had access to the server and found that, on a regular basis, a process from another application takes up a lot of load, which greatly degrades the performance of the tasking manager.

For the users.json request, I agree that we should find a better way to handle it and only query the server when we need to, i.e. when a user wants to mention someone. I'll open a specific issue for this.

For the check_interval, I also agree that there is a problem. When the server behaves correctly, the request itself doesn't take more than a few milliseconds, and the load on the server is very low with such a query. However, I agree that the client-side script should behave differently when a request stalls. I'll open an issue for this as well.

bgirardot commented 7 years ago

@CloCkWeRX This was really helpful, thank you for looking. Hopefully, as @pgiraud said, if we can move some of the other high CPU/disk-accessing processes off the tasking manager server, things will improve. But I also think we might just be finding some of the slower spots in the software now that we are up to so many users and so many projects in the database. Not sure really, but teasing out some of these performance issues might be more feasible now that we have a lot more data in the database to work with.

Piskvor commented 7 years ago

As for users.json: the issue could be mitigated by a simple change. If a somewhat stale list of users is acceptable (anywhere from a few minutes to tens of minutes old; note that the list seems to be used only for @-suggestions for non-privileged users, so a 10-minute-old list might be a reasonable compromise, IMNSHO), sending the header Cache-Control: private, max-age=600 with the response would mean that browsers request a fresh list once per 10 minutes, not once per tile selection.

See my pull request for the simplest possible implementation: https://github.com/hotosm/osm-tasking-manager2/pull/926
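
For readers who want to see the header in context, here is a minimal, framework-free sketch as a plain WSGI app. It is not the code from the pull request above (which is not reproduced here); the app name `users_json_app` and the placeholder `USERNAMES` list are assumptions for illustration only.

```python
import json

# Placeholder data; the real endpoint would read usernames from the database.
USERNAMES = ["alice", "bob"]


def users_json_app(environ, start_response):
    """Serve the user list and let each browser reuse it for 10 minutes."""
    body = json.dumps(USERNAMES).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        # "private": only the user's own browser may cache the response.
        # "max-age=600": the cached copy counts as fresh for 600 seconds.
        ("Cache-Control", "private, max-age=600"),
    ]
    start_response("200 OK", headers)
    return [body]
```

With this header, repeated task selections within the 10-minute window can be served from the browser cache without hitting the backend.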

CloCkWeRX commented 7 years ago

I don't know how you feel about it, or whether the instrumentation for Python is much good, but New Relic is a really good 'drop in' monitoring tool in many Rails apps: it easily highlights slow queries or the slowest requests by volume. It also does basic server metrics (load/CPU/disk/etc.) with a few other packages. It may be worth looking at something like that for the TM2 production deploy.

Piskvor commented 7 years ago

In the past week, the app was much faster. Today, it keeps timing out (the backend process doesn't return a response before nginx times out) - is there some heavy crunching going on at the server?

pgiraud commented 7 years ago

The tasking manager itself is not responsible for its slowness. Some other processes take up a lot of load on the machine from time to time. When the tasking manager is slow, the main website is slow as well.

Piskvor commented 7 years ago

Whoa. There's been some upgrade overnight - the app just flies now!

pgiraud commented 7 years ago

No real upgrade, actually. The tasking manager is simply isolated on its own server now, so it is no longer disturbed or slowed down by other applications. Closing this issue.