posit-dev / positron

Positron, a next-generation data science IDE

Improve performance of opening data explorer for a large pandas data frame #2851

Status: Open. wesm opened this issue 2 months ago.

wesm commented 2 months ago

Positron Version:

main branch as of April 22, 2024

Currently, the data explorer displays nothing until the initial schema, data, and column null-count requests have all completed. For a 33M-row data frame, for example, this causes a delay of several seconds while these results are computed.

(Attached video: Screencast from 2024-04-22 17-30-28.webm)

A few things to consider:

wesm commented 2 months ago

As shown in #2881, recomputing the null-count profile statistics also delays updating the waffle after a filter is applied.
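Recomputing statistics after a filter change also raises the question of what happens when the user changes the filter again before the previous null-count request returns. The sketch below shows one possible approach, which is an assumption on our part and not taken from the issue or from Positron's code. It tags each request with a generation number and ignores any result from a superseded filter, so the waffle can redraw right away without waiting on statistics. `NullCountTracker` and `NullCountFetcher` are hypothetical names.

```typescript
// Hypothetical sketch: ignore null-count results that belong to a superseded filter.
// None of these names come from Positron; they only illustrate the idea.
type NullCountFetcher = () => Promise<number[]>;

class NullCountTracker {
  private generation = 0;
  private latest: number[] | undefined;

  // Starts a background request. Callers can redraw the grid immediately
  // instead of awaiting the returned promise.
  refresh(fetchNullCounts: NullCountFetcher, onUpdate: (counts: number[]) => void): Promise<void> {
    const myGeneration = ++this.generation;
    // Previous counts no longer describe the filtered rows.
    this.latest = undefined;
    return fetchNullCounts().then((counts) => {
      // A newer filter was applied while this request was in flight, so drop the stale result.
      if (myGeneration !== this.generation) return;
      this.latest = counts;
      onUpdate(counts);
    });
  }

  get current(): number[] | undefined {
    return this.latest;
  }
}
```

A generation counter keeps the bookkeeping on the frontend side. If the backend supported cancelling in-flight requests, cancellation would also save the wasted computation. That would be a separate design choice.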

wesm commented 1 month ago

This issue shouldn't require any backend changes. We need to compute the null counts asynchronously, rather than blocking the initial load of the waffle until the null-count request returns.
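A minimal frontend-only sketch of that change follows. It assumes a client with one promise-returning method per request (schema, data, null counts) and a view object that can draw the grid and fill in null counts separately. `DataExplorerClient`, `GridView`, `openDataExplorer`, and the page size of 100 rows are all hypothetical and are not Positron's actual interfaces. Only the schema and the first page of data gate the initial render. The null-count request is issued afterwards and is not awaited.

```typescript
// Hypothetical shapes for illustration; these are not Positron's actual types.
interface Schema { columnNames: string[] }
interface DataBlock { rows: unknown[][] }

// Hypothetical backend client: each method resolves when the backend replies.
interface DataExplorerClient {
  getSchema(): Promise<Schema>;
  getData(firstRow: number, numRows: number): Promise<DataBlock>;
  getNullCounts(columnIndices: number[]): Promise<number[]>;
}

// Hypothetical view with separate hooks for the grid and the null-count display.
interface GridView {
  renderGrid(schema: Schema, data: DataBlock): void;
  renderNullCounts(counts: number[]): void;
}

async function openDataExplorer(client: DataExplorerClient, view: GridView): Promise<void> {
  // Only the schema and the first page of rows block the initial render.
  const [schema, data] = await Promise.all([client.getSchema(), client.getData(0, 100)]);
  view.renderGrid(schema, data);

  // Null counts are requested once the grid is on screen and filled in whenever
  // they arrive. The promise is deliberately not awaited.
  const columnIndices = schema.columnNames.map((_, i) => i);
  client
    .getNullCounts(columnIndices)
    .then((counts) => view.renderNullCounts(counts))
    .catch((err) => console.error("null count request failed:", err));
}
```

With this ordering, a slow null-count computation on a very large data frame only delays the null-count indicators. It no longer delays the first paint of the waffle.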