outbreak-info / outbreak.info

During outbreaks of emerging diseases such as COVID-19, efficiently collecting, sharing, and integrating data is critical to scientific research. outbreak.info is a resource to aggregate all this information into a single location.
https://outbreak.info/
GNU General Public License v3.0

[Performance] Page load takes 7 seconds, 200+ individual requests for one website #364

Open corneliusroemer opened 3 years ago

corneliusroemer commented 3 years ago

The page has always been slow to load. I used to think that this was due to the API being slow, but looking into it, it actually seems to be more that static resources are blocked for a very long time.

I don't know what the root cause is, but this should probably be fixed/optimised.

Why are so many individual .js and .css requests sent? Shouldn't they be bundled into one file? It could be that the server is overwhelmed by browsers trying to make hundreds of parallel connections. To load just the basic variant overview page, 290 individual requests need to be served. That's a lot! You're basically DoSing your own server with hundreds of tiny requests.

[screenshot: browser network panel for the page load]

gkarthik commented 3 years ago

Hey, the chunking allows Vue.js to lazy-load modules. Loading all the code as a single file would take longer and lead to a lot of redundant code being requested from the server.

  1. Could you specify which page this is? Is it https://outbreak.info/situation-reports?
  2. The load time for that page is under 2 seconds on my end, and in the screenshot you posted the load time seems to be 3.24 seconds. It's unclear to me where you get the 7 seconds from.
corneliusroemer commented 3 years ago

Yes, it was /situation-reports.

It may have been the VPN I was connected through that made it hard to get those 300 requests through, but it definitely took 8.8 s to finish loading the page, and my connection is generally fast.

Loading is even slower for specific variant pages. The header/footer appears after around 1-2 seconds, then a loading wheel shows for 2-3 seconds, and then the actual content of interest appears.

Sometimes loading takes even longer; I've measured up to 15 seconds. Do you cache API requests at all? The most common requests should be identical: sequence counts for the world, the US, etc., and the main lineages. The data changes only once a day, so you'd only need to purge the cache once a day.
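
For illustration, a minimal sketch of such a daily-purged cache in plain Python (`run_expensive_aggregation` is a hypothetical stand-in for the real slow query):

```python
import datetime
import functools

def run_expensive_aggregation(query: str) -> dict:
    # Hypothetical stand-in for the real, slow aggregation.
    return {"query": query, "counts": []}

@functools.lru_cache(maxsize=256)
def _cached(query: str, day: datetime.date) -> dict:
    return run_expensive_aggregation(query)

def sequence_counts(query: str) -> dict:
    # Today's date is part of the cache key, so every entry goes
    # stale at midnight -- i.e. the cache purges itself once a day.
    return _cached(query, datetime.date.today())
```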

Overall, I just have the feeling that things could be a lot faster, but I don't know exactly what the issue is.

flaneuse commented 3 years ago

There's a lot of performance optimization we could do but haven't invested in, since we're focusing on adding features at the moment. But I agree that things are slower than is ideal, and if you want to poke around, that'd be great.

On the front-end, API requests should be cached for 1 hour. We could probably make that a bit smarter; the issue is that we don't want to show old data once the API has been updated for the day.

@juliamullen has also been looking into adding backend caching, focusing on the most frequent and problematic calls (for instance, on the Location page, getting all lineages for a location over time is an inherently slow process). Nothing implemented yet, though.
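
A hedged sketch of what such a backend cache might look like in a Tornado handler; the handler, route, and slow query below are all hypothetical, not the project's actual code:

```python
import time
import tornado.web

_CACHE: dict = {}        # key -> (expiry timestamp, payload)
_TTL = 60 * 60           # hold slow aggregations for an hour

class LineagesByLocationHandler(tornado.web.RequestHandler):
    async def fetch_lineages_over_time(self, location: str) -> dict:
        # Hypothetical stand-in for the inherently slow aggregation.
        return {"location": location, "lineages": []}

    async def get(self):
        location = self.get_argument("location")
        key = ("lineages", location)
        hit = _CACHE.get(key)
        if hit and hit[0] > time.time():
            payload = hit[1]                      # cache hit: skip the slow query
        else:
            payload = await self.fetch_lineages_over_time(location)
            _CACHE[key] = (time.time() + _TTL, payload)
        self.write(payload)                       # Tornado serializes dicts as JSON

def make_app() -> tornado.web.Application:
    return tornado.web.Application([(r"/lineages", LineagesByLocationHandler)])
```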

corneliusroemer commented 3 years ago

Good! I'd mostly look at backend caching, since there are a few requests that are served to almost everyone.

Frontend caching isn't that important, since it's really the computation that takes time and resources; the amount of data transferred is actually quite small.

Don't you use Flask for the API? Maybe I'm being a bit optimistic, but shouldn't caching be almost drop-in?
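
For concreteness, a sketch of what that "almost drop-in" caching looks like with Flask-Caching; purely illustrative, since the next reply clarifies the API isn't Flask:

```python
from flask import Flask
from flask_caching import Cache

app = Flask(__name__)
cache = Cache(app, config={"CACHE_TYPE": "SimpleCache",
                           "CACHE_DEFAULT_TIMEOUT": 3600})

@app.route("/sequence-counts")
@cache.cached(query_string=True)  # one cache entry per unique query string
def sequence_counts():
    return {"counts": []}  # stand-in for the real aggregation
```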


flaneuse commented 3 years ago

Nope, they're custom Tornado handlers built off the Biothings Suite and served via nginx.

gkarthik commented 3 years ago

@corneliusroemer the number of requests is less consequential. The main limiting factor here is the API requests. Some of the more complex aggregations take the longest, and these can be cached in two locations: Elasticsearch or Tornado. Julia is looking into this much more closely.
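
For the Elasticsearch option, one possibility is the shard request cache, which can be opted into per request; a sketch assuming the elasticsearch-py 8.x client, with hypothetical index and field names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="mutations",        # hypothetical index name
    size=0,                   # hits are skipped; size=0 responses are cacheable
    request_cache=True,       # opt in to Elasticsearch's shard request cache
    aggs={"by_lineage": {"terms": {"field": "pangolin_lineage"}}},
)
```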

Client-side caching is actually a very important component: it reduces redundant requests and hence saves on compute. The cache headers on the responses can also be customized to expire entries in a more fine-grained manner, but we can improve on this iteratively based on the results of the backend caching.
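
A sketch of what fine-grained cache headers could look like in Tornado handlers; the endpoints and max-age values are illustrative only:

```python
import tornado.web

class DailyDataHandler(tornado.web.RequestHandler):
    def get(self):
        # The underlying data changes once a day, so let browsers and
        # intermediate proxies (e.g. nginx) reuse this response for an hour.
        self.set_header("Cache-Control", "public, max-age=3600")
        self.write({"status": "ok"})  # stand-in payload

class SearchHandler(tornado.web.RequestHandler):
    def get(self):
        # More volatile endpoints can opt for a shorter lifetime.
        self.set_header("Cache-Control", "public, max-age=300")
        self.write({"results": []})
```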