colearendt opened this issue 4 years ago
Interesting. I like the idea of limiting `get_content()` to things that have usage info for the time requested, although if all of the content is being used then it will still have that problem.

Is there a listing of the filter options for the `applications` endpoint? I know I can filter by app name, but I am not seeing how to filter by GUID, which is what I would need in order to limit it based on usage.
One other thing that may provide a major speedup is simply increasing the `page_size` argument. I remember there being a max page size for one of the endpoints (maybe all?), but I am not seeing anything about it in the API docs or the connectapi R6 docs. The benchmarks below suggest that could yield almost a 2x speedup on my system (though I only have 4 pages of content):
```r
tmp <- connectapi::connect(host = Sys.getenv("CONNECT_SERVER"), api_key = Sys.getenv("CONNECT_API_KEY"))

microbenchmark::microbenchmark(
  w_page_size = connectapi::get_content(tmp, limit = Inf, page_size = 1e4),
  wo_page_size = connectapi::get_content(tmp, limit = Inf)
)
#> Unit: milliseconds
#>          expr      min       lq     mean   median       uq      max neval
#>   w_page_size 276.8448 289.3661 301.1540 297.2429 304.8684 481.6506   100
#>  wo_page_size 510.5587 534.9695 552.8163 546.6944 562.0564 756.6520   100
```
@colearendt Can you run the benchmark in the comment above on your Connect server and let me know how much of a speedup it provides when you get a chance? If it is significant, then I will implement the change in the app.

I can also restrict the content query to only apps with usage data, if you have an example of how the content query can be filtered by application GUID.
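For illustration, a minimal sketch of one way that could look, assuming the GUIDs come from the usage data via `get_usage_shiny()` and that fetching items one at a time with `content_item()` is acceptable (whether this beats a single paged `get_content()` call would need benchmarking):

```r
library(connectapi)
library(purrr)

client <- connect(
  host = Sys.getenv("CONNECT_SERVER"),
  api_key = Sys.getenv("CONNECT_API_KEY")
)

# GUIDs of the apps that actually show up in the usage data
usage <- get_usage_shiny(client)
used_guids <- unique(usage$content_guid)

# Fetch only those content items instead of paging through everything
used_content <- map(used_guids, ~ content_item(client, .x))
```

Since this issues one request per GUID, it should only win when the set of used apps is much smaller than the total content list.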
Apologies for the delay here - not very precise, but not suggestive of page size making a whole lot of difference, unfortunately. Poking around at other approaches:
```r
> hm <- get_content(client, limit = Inf, page_size = 10000)
downloading page 2 (0.06/sec) 00:00:33
> hm <- get_content(client, limit = Inf, page_size = 1000)
downloading page 4 (0.13/sec) 00:00:29
> hm <- get_content(client, limit = Inf, page_size = 500)
downloading page 6 (0.18/sec) 00:00:32
> hm <- get_content(client, limit = Inf, page_size = 100)
downloading page 24 (0.71/sec) 00:00:33
> hm <- get_content(client, limit = Inf, page_size = 50)
downloading page 47 (1.3/sec) 00:00:34
```
Update: I left `microbenchmark` running for a while, and it shows that the larger page size does help some, but not enough to be super meaningful:
```
Unit: seconds
         expr      min       lq     mean   median       uq      max neval
  w_page_size 29.18135 32.65261 33.99538 33.98875 35.04150 43.04888   100
 wo_page_size 42.14709 45.26064 47.24185 47.07872 48.64099 53.92323   100
```
Interesting... so two thoughts:
In addition, I could make the loading screen more informative using the `waiter` package if I could access which page is being loaded. Not sure if that would make things much better, but it could be an option.
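A rough sketch of what that could look like with `waiter`, assuming `client` is an existing connectapi connection; the per-page update is hypothetical, since `get_content()` does not expose which page it is currently fetching:

```r
library(shiny)
library(waiter)

ui <- fluidPage(
  use_waiter(),
  tableOutput("content")
)

server <- function(input, output, session) {
  w <- Waiter$new(html = spin_fading_circles())

  content <- reactive({
    w$show()
    on.exit(w$hide())
    # Hypothetical per-page hook, if get_content() reported progress:
    # w$update(html = h4(sprintf("Loading page %d of %d", i, n_pages)))
    connectapi::get_content(client, limit = Inf)
  })

  output$content <- renderTable(head(as.data.frame(content())))
}

shinyApp(ui, server)
```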
Yes, I am thinking about a smaller Connect instance. I filtered down the content a good bit and still ran into some frontend performance issues, because our server gets a lot of traffic as well.

90+ pages of content is definitely not normal 😄 Our server gets more deployment traffic than most, I would expect, and has been around longer than most as well. Hoping a smaller Connect instance will be a better fit.
Interesting... I wonder if it has anything to do with the fact that I set all the outputs to render even if they are not active (so that when a user switches to the admin page it is ready to go).
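For context, the usual Shiny pattern for that is `outputOptions()` with `suspendWhenHidden = FALSE`; a minimal sketch, with a hypothetical output name:

```r
# Inside the server function: keep rendering the admin-page plot even while
# its tab is hidden, so it is ready when the user switches over.
output$admin_plot <- renderPlot({
  plot(usage_data)  # `usage_data` is a placeholder for the app's real data
})
outputOptions(output, "admin_plot", suspendWhenHidden = FALSE)
```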
Is there anything in particular that appears to be slowing it down, or is it just the overall processing of that much data? Just out of curiosity, when looking at the default time range, how much usage data (a rough number of rows for shiny and static) are you looking at on that server?
Yeah, it seemed like lots of rendering, as well as just a bunch of frontend widgetry. The timeseries dealio (which I know I recommended 😂) seemed to struggle in particular.

For the last week of data, there are ~1000 shiny entries and ~2000 static.
That is a lot of traffic! One thing that may help would be letting the user pick which widget(s) they want (see the sketch below). I think it would take some hacking at the UI layout, but it could be done; I don't know if that defeats the purpose of the dashboard, though.

Another idea would be to separate the heavier widgets from the more basic ones, so that the basic ones (which hopefully don't take as long to load) load first, and users go to a different page to see "Detailed Usage Info".

I can also look into whether there are faster alternatives to something like timevis for implementing the timeline feature.
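A minimal sketch of the widget-picker idea, with hypothetical widget names; only the widgets the user opts into get rendered, so the heavy ones stay off by default:

```r
library(shiny)

ui <- fluidPage(
  checkboxGroupInput(
    "widgets", "Widgets to show",
    choices = c("Summary table" = "summary", "Usage timeline" = "timeline"),
    selected = "summary"
  ),
  conditionalPanel(
    "input.widgets.indexOf('summary') > -1",
    tableOutput("summary")
  ),
  conditionalPanel(
    "input.widgets.indexOf('timeline') > -1",
    timevis::timevisOutput("timeline")
  )
)

server <- function(input, output, session) {
  output$summary <- renderTable(head(mtcars))
  output$timeline <- timevis::renderTimevis({
    req("timeline" %in% input$widgets)  # skip the heavy render unless selected
    timevis::timevis(data.frame(start = Sys.Date(), content = "example"))
  })
}

shinyApp(ui, server)
```

The `req()` guard keeps the heavy `timevis` render from running at all until the user selects it.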
I wouldn't say this is a huge priority - I should be able to get this up on a lower-traffic server for the contest, at least 😄 I think the main pain with the timevis configuration is that it is presently trying to paint all ~3000 rows into the UI; it is not doing any aggregation, etc. That shouldn't be a problem on a lower-traffic server, though.
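For reference, a rough sketch of the kind of aggregation meant here, assuming `usage` is the data frame returned by `connectapi::get_usage_shiny()` (which has a `started` timestamp column). Bucketing sessions by day means the timeline paints one item per day instead of one per session:

```r
library(dplyr)

# Collapse individual sessions into daily counts before handing the
# data to the timeline widget.
daily <- usage %>%
  mutate(day = as.Date(started)) %>%
  count(day, name = "sessions")

timevis::timevis(
  data.frame(start = daily$day, content = paste(daily$sessions, "sessions"))
)
```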
Admittedly, our demo server has a bunch of content, but it took ~1 minute for the application to start while the API paged through ~90 pages of content.
I don't have any great thoughts for how to improve here, unfortunately...

- `get_content()` (or only retrieve items from the usage data)

The third bullet seems most reasonable, to be honest. I'll think a bit more about this!