Slow performance of web interface for big data (>1k)

open-research / sumatra

BSD 2-Clause "Simplified" License

Slow performance of web interface for big data (>1k) #308

Closed babsey closed 8 years ago

babsey commented 8 years ago

For one project I have more than 1000 records, and it takes approx. 50 sec to render them in the web interface. That is awfully slow! I assume the source of the issue is too many database queries per page load.

I tested a possible solution: server-side processing for the jQuery DataTables table, which means that only the records for the current page of the table are rendered rather than all of them. Indeed, testing with 70 records, the page load time drops from approx. 2 sec to 80 msec! The table reloads its data after each paging, searching, or ordering action.

With that solution in place, filtering and ordering still work in server-side processing mode.

What do you think?
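A minimal sketch of what server-side processing involves may help the discussion. It is not the code from this proposal: the function name `datatables_response` and the in-memory list of rows are assumptions for illustration, and a real view would apply the search, ordering, and slicing in the database query so that only one page of records is ever fetched. The request and response keys (`draw`, `start`, `length`, `search[value]`, `order[0][column]`, `order[0][dir]`, `recordsTotal`, `recordsFiltered`, `data`) follow the DataTables server-side protocol.

```python
def datatables_response(params, rows, columns):
    """Build a DataTables server-side response for one page of `rows`.

    `params` holds the flat request parameters sent by DataTables,
    `rows` is a list of dicts (a stand-in for the record store), and
    `columns` lists the dict keys in table-column order.
    """
    draw = int(params.get("draw", 0))
    start = max(int(params.get("start", 0)), 0)
    length = int(params.get("length", 10))

    # Global search: keep rows where any column contains the search term.
    term = params.get("search[value]", "").lower()
    if term:
        filtered = [r for r in rows
                    if any(term in str(r[c]).lower() for c in columns)]
    else:
        filtered = list(rows)

    # Single-column ordering, if the client requested it.
    if "order[0][column]" in params:
        key = columns[int(params["order[0][column]"])]
        reverse = params.get("order[0][dir]", "asc") == "desc"
        filtered.sort(key=lambda r: r[key], reverse=reverse)

    # DataTables sends length = -1 to request all rows.
    page = filtered[start:] if length < 0 else filtered[start:start + length]

    return {
        "draw": draw,                      # echoed back so the client can match replies
        "recordsTotal": len(rows),         # count before filtering
        "recordsFiltered": len(filtered),  # count after filtering
        "data": [[r[c] for c in columns] for r in page],
    }
```

Only the rows in `page` are serialized, which is why the cost per request stays roughly constant as the number of records grows.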

maxalbert commented 8 years ago

I have a similar problem with slow performance in a project with many records (400+), so I would very much welcome improvements in this regard. However, I am not very familiar with the internals of the record store or the web interface so can't comment on the implementation side of things. Hopefully @apdavison can provide his opinion.

Do I understand correctly that your solution would only improve the performance of the web interface? I'm asking because I mostly query the database programmatically to analyse my simulations, which is also fairly slow (a simple smt list in the terminal takes ca. 60 seconds to complete), so it would be great to improve performance in this regard, too.

babsey commented 8 years ago

You understand it correctly. My request only addresses the web interface.

But I also checked the smt list command; it is also slow because of project.find_records. In this method all records are converted from Django to Sumatra types. It might be a good idea to filter records before converting, e.g. the last 20 records, by date, by specific script file, or by parameter values.
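The idea of filtering before converting can be sketched as follows. Everything here is hypothetical: `convert` stands in for the costly Django-to-Sumatra conversion, the rows are plain dicts rather than Django model instances, and this `find_records` is not Sumatra's method of the same name. The sketch also assumes the rows arrive newest first, so that a limit of 20 yields the last 20 records.

```python
from itertools import islice


def convert(db_row):
    """Hypothetical stand-in for the costly Django-to-Sumatra conversion."""
    convert.calls += 1
    return {"label": db_row["label"], "script": db_row["script"]}


convert.calls = 0  # counts conversions so the saving is visible


def find_records(db_rows, script=None, limit=None):
    """Filter on the cheap raw rows first, then convert only the survivors."""
    # Lazy generator: rows that do not match are never converted.
    selected = (r for r in db_rows if script is None or r["script"] == script)
    if limit is not None:
        # Stop after `limit` matches (rows are assumed to be newest first).
        selected = islice(selected, limit)
    return [convert(r) for r in selected]
```

With 1000 rows and `limit=20`, `convert` runs 20 times instead of 1000. In a Django-backed store the same effect would normally come from filtering and slicing the queryset before iterating over it.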

apdavison commented 8 years ago

:+1: for using the server-side processing option in datatables (I presume you are talking about this: http://www.datatables.net/manual/server-side)

For smt list, there is a quick way to halve the time, and that is by avoiding the double call to project.format_records() (once for the bash completion file, once for the output), for example by appending the label to the completion file after smt run, rather than regenerating the entire file each time smt list is run.
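As a rough sketch of the append-instead-of-regenerate idea: the helper name `append_label` and the one-label-per-line file layout are assumptions for illustration, not Sumatra's actual completion file format.

```python
def append_label(completion_file, label):
    """Add one record label to the completion file without rewriting it.

    Assumes (hypothetically) that the file holds one label per line.
    """
    # Mode "a" creates the file if needed and writes at the end, so the
    # cost is independent of how many labels are already stored.
    with open(completion_file, "a") as f:
        f.write(label + "\n")
```

Calling this once after each smt run would keep the file current, so smt list would only need to format the records for its own output.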

Further performance improvements would be difficult; I agree with Sebastian that a better first approach would be to provide better filtering options for smt list (and perhaps also add an smt show command).

maxalbert commented 8 years ago

Great, thanks for the comments. The smt list command was actually just a quick example; personally I'm more interested in improving the performance of project.find_records() (which is called under the hood by smt list) because I typically use the Python API directly from my analysis scripts. It seems like Sumatra passes responsibility over to Django pretty quickly, so that's probably where we need to start. @babsey's suggestion of filtering before conversion sounds good, but it won't help if you are interested in a list of all records. I guess it would be good to do some profiling to see what exactly is causing the performance hit.
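One way to do that profiling is with the standard library's cProfile. The helper below is a generic sketch; `profile_call` is a hypothetical name, and the usage line assumes an already loaded `project` object.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    # Collect the ten most expensive entries by cumulative time.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


# Usage, assuming `project` has been loaded elsewhere:
#     records, report = profile_call(project.find_records)
#     print(report)
```

Sorting by cumulative time should show whether the time goes into the database queries themselves or into converting each row afterwards.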

apdavison commented 8 years ago

I have opened a new issue, #310, for the performance issues with smt list and project.find_records(), and renamed this issue to indicate it is specific to the web interface.

apdavison commented 8 years ago

I've also opened #309 for adding filtering options.

babsey commented 8 years ago

I am working on solving this issue, and I am able to improve the performance of the website using server-side processing. Before I carry on editing the code, I would like to ask you: should I make this improvement permanent, or optional so that it can be activated by an option in smtweb? What do you think?