mckinsey / vizro

Vizro is a toolkit for creating modular data visualization applications.
https://vizro.readthedocs.io/en/stable/
Apache License 2.0

How to speed up loading of large CSV files (35,000+ rows) in Vizro with treemap and AG Grid table? #862

Open BalaNagendraReddy opened 2 days ago

BalaNagendraReddy commented 2 days ago

Question

I'm working with a CSV file that contains around 35,000 rows, and when I try to load it into Vizro, it takes a long time to render the page, especially when using both the treemap and AG Grid table components. Are there any strategies or best practices to speed up the loading time for large datasets like this? Any suggestions for optimizing performance would be greatly appreciated!

Code/Examples

No response

Which package?

vizro

Code of Conduct

petar-qb commented 2 days ago

Hey @BalaNagendraReddy, and thanks for the great question! The Vizro team plans to document good practices for working with big data soon. Until then, here are some suggestions for improving your page rendering time:

  1. Dynamic data cache - If you use dynamic data loading in your app, adding a cache should speed up page loading, because the data loading function then doesn't rerun on every page refresh. You can find more about Vizro dynamic data loading and its cache here -> https://vizro.readthedocs.io/en/stable/pages/user-guides/data/#dynamic-data
  2. Parametrize data loading - If you use dynamic data loading, you can load only a single chunk of the data at a time, where the chunk to load is selected with a data_frame parameter. You can find more about parametrized data loading here -> https://vizro.readthedocs.io/en/stable/pages/user-guides/data/#parametrize-data-loading
  3. Gunicorn - You can increase the number of forked server workers by running the app with Gunicorn. Here's more about running a Vizro app with multiple workers -> https://vizro.readthedocs.io/en/stable/pages/user-guides/run/#gunicorn
  4. Single-select filters - Single-select Vizro filter selectors perform better than multi-select ones. These are the predefined single-select Vizro selectors you can use:
    • vm.Dropdown (with multi=False),
    • vm.RadioItems,
    • vm.Slider and
    • vm.DatePicker (with range=False)
  5. Pre-filter data - If some data_frame rows or columns are irrelevant and never shown in your charts, filter them out before the data is loaded into your charts.
  6. AgGrid pagination - Enabling AgGrid pagination can also improve performance. More about it -> https://vizro.readthedocs.io/en/stable/pages/user-guides/table/#enable-pagination. Note that this simple pagination flag enables only client-side pagination. A more advanced option, server-side pagination, improves AG Grid rendering time even more. More about server-side pagination -> https://dash.plotly.com/dash-ag-grid/infinite-scroll
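Suggestions 2 and 5 can be sketched together as one data loading function. This is a minimal sketch, not Vizro's own code: the inline CSV text, its column names (region, category, value, comment), USED_COLUMNS and load_data are hypothetical stand-ins for the real 35,000-row file. It reads only the columns the charts use (usecols), stores repeated text as the pandas category dtype, and returns only one region's rows when a region is given. Per the dynamic data guide linked above, a function like this would then be registered as dynamic data (for example data_manager["sales"] = load_data, with data_manager from vizro.managers), its region argument exposed through a vm.Parameter targeting "<component_id>.data_frame.region", and its results cached by assigning a Flask-Caching Cache to data_manager.cache. Check those docs for the exact API of your Vizro version.

```python
import io
from typing import Optional

import pandas as pd

# Hypothetical sample standing in for the real 35,000-row CSV file.
# The column names are illustrative assumptions, not from the original question.
CSV_TEXT = """region,category,value,comment
North,Hardware,120,ok
North,Software,80,late
South,Hardware,200,ok
South,Services,50,ok
"""

# Only the columns the treemap and table actually display (suggestion 5);
# "comment" is never shown, so it is never read.
USED_COLUMNS = ["region", "category", "value"]


def load_data(region: Optional[str] = None) -> pd.DataFrame:
    """Load the needed columns and, if ``region`` is given, only that region's rows.

    ``region`` plays the role of a parametrized data loading argument
    (suggestion 2): each call returns one chunk instead of the whole file.
    """
    df = pd.read_csv(
        io.StringIO(CSV_TEXT),  # swap in the path of the real CSV file
        usecols=USED_COLUMNS,
        # Repeated strings stored as categories use less memory.
        dtype={"region": "category", "category": "category"},
    )
    if region is not None:
        df = df[df["region"] == region]
    return df.reset_index(drop=True)


print(load_data("South"))
```

Calling load_data() with no argument returns every row but still skips the unused column, so suggestion 5 helps even when no parameter is set.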
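For suggestion 3, Gunicorn can read its settings from a plain Python file. The sketch below is an illustrative assumption rather than Vizro's documented setup: the file name gunicorn.conf.py, the port, the worker formula and the timeout are example choices, and the command in the comment assumes app.py exposes the underlying Flask server, as the linked Vizro run guide does with server = app.dash.server (check the guide for your version).

```python
# gunicorn.conf.py: a minimal sketch of a Gunicorn configuration file.
# Gunicorn reads these module-level variables when started with, for example:
#   gunicorn -c gunicorn.conf.py app:server
# "app:server" assumes app.py defines server = app.dash.server (see lead-in).
import multiprocessing

# Bind address; 8050 is the Dash default port, chosen here as an assumption.
bind = "0.0.0.0:8050"

# Rule of thumb from the Gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Give slow first loads of a large CSV more time before a worker is restarted.
timeout = 120
```

Each worker is a separate process with its own memory, so an in-process cache such as Flask-Caching's SimpleCache is not shared between workers; a shared backend such as FileSystemCache or RedisCache avoids loading the CSV once per worker. More workers let the server handle more requests at once, but they do not make a single page render faster in the browser.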
BalaNagendraReddy commented 8 hours ago

Hi @petar-qb,

Thanks for sharing the valuable information!

In my case, the issue isn't reading the CSV file or creating the treemap, but the time it takes to load the treemap and grid table in the browser.

Are there any strategies or optimizations I can implement to speed up the rendering or loading of these components in the browser? Specifically, I’m looking for approaches that can improve the overall responsiveness and reduce load times, especially when dealing with larger datasets.

Any advice or pointers would be greatly appreciated!

Thanks again for your help!

petar-qb commented 6 hours ago

Hi @BalaNagendraReddy,

Could you let us know which version of Vizro you're using? The latest release is vizro==0.1.26, and we recommend upgrading to it if you haven't already. It's surprising that rendering 35k rows is a problem, as that isn't a particularly large dataset.
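To answer the version question, the installed release can be read from the package metadata. A minimal sketch using only the standard library: installed_version, parse_release and LATEST_MENTIONED are hypothetical names, and the comparison looks only at the leading numeric part of the version string (packaging.version handles full PEP 440 comparisons).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

LATEST_MENTIONED = "0.1.26"  # latest release mentioned in this thread


def installed_version(package: str = "vizro") -> Optional[str]:
    """Return the installed version string of ``package``, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def parse_release(v: str) -> Tuple[int, ...]:
    """Keep only the leading numeric release segment, e.g. "0.1.26rc1" -> (0, 1, 26)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


current = installed_version()
if current is None:
    print("vizro is not installed")
elif parse_release(current) < parse_release(LATEST_MENTIONED):
    print(f"vizro {current} is older than {LATEST_MENTIONED}; consider upgrading")
else:
    print(f"vizro {current} is at least {LATEST_MENTIONED}")
```

Upgrading is then a matter of pip install -U vizro in the same environment.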

If loading the data and creating the treemap and ag_grid on the server are fast (that is, if the server response time meets your expectations) but rendering in the browser is slow, then I recommend suggestions 2, 4 and 5 from my previous comment. They ensure that the treemap and ag_grid don't present the entire dataset, and presenting less data means faster rendering.

I also recommend implementing suggestion 6 from my previous comment. AG Grid pagination, which can be configured by setting a single property, should speed up rendering too.
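The server-side option from suggestion 6, Dash AG Grid's infinite row model, sends the browser only one block of rows at a time. The function below is a minimal, hypothetical sketch of the server side of that exchange: rows_for_request takes the grid's getRowsRequest dictionary (startRow, endRow and sortModel, as in the Dash AG Grid infinite scroll example linked above) and returns the rowData/rowCount dictionary the grid expects as getRowsResponse. Filtering (filterModel) is deliberately left out, and wiring it into a Vizro page (a grid with rowModelType="infinite" plus a Dash callback) is not shown and may require a custom component.

```python
from typing import Any, Dict, Optional

import pandas as pd


def rows_for_request(
    df: pd.DataFrame, request: Optional[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Return one block of rows in the shape an infinite-row-model grid expects."""
    if request is None:
        return None  # in a Dash callback you would return dash.no_update instead
    # Apply the grid's sort model: a list of {"colId": ..., "sort": "asc" | "desc"}.
    sort_model = request.get("sortModel") or []
    if sort_model:
        df = df.sort_values(
            by=[s["colId"] for s in sort_model],
            ascending=[s["sort"] == "asc" for s in sort_model],
        )
    # Slice only the rows the grid asked for; filterModel is ignored in this sketch.
    block = df.iloc[request["startRow"] : request["endRow"]]
    return {"rowData": block.to_dict("records"), "rowCount": len(df)}


demo = pd.DataFrame({"id": range(1000), "value": range(1000, 0, -1)})
print(rows_for_request(demo, {"startRow": 0, "endRow": 3, "sortModel": []}))
```

In a Dash callback with Input("grid", "getRowsRequest") and Output("grid", "getRowsResponse"), returning rows_for_request(df, request) means the browser only holds the blocks it has requested rather than all 35,000 rows.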

BalaNagendraReddy commented 4 hours ago

Hi @petar-qb,

Thanks for the prompt response. I'm currently using vizro==0.1.21.

I'll check the points above to speed up loading.