modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

PERF: Add explicit query compiler method for len/shape checks #7397

Closed noloerino closed 2 months ago

noloerino commented 2 months ago

Is your feature request related to a problem? Please describe.

Currently, calling len(pd.DataFrame(...)) materializes the frame's index and then computes the length of that index.

Some storage formats (including pandas, via the PandasDataFrame object) have more efficient ways of computing a frame's dimensions, or already cache them. Adding an explicit query compiler method, get_axis_len(axis: Literal[0, 1]) -> int, would let us take advantage of this. Accordingly, frontend calls to len(self.index) should be replaced with len(self), and calls to len(self.columns) with self._query_compiler.get_axis_len(1), to avoid unnecessary materialization.
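To make the proposal concrete, here is a minimal sketch of the idea, not Modin's actual implementation. ToyFrame, ToyQueryCompiler, and ToyDataFrame are hypothetical stand-ins for a storage-format frame, its query compiler, and the frontend object. Only the method name get_axis_len comes from the proposal above; everything else (the cached row_lengths, the index_materializations counter) is an assumption made for illustration.

```python
from typing import List, Optional, Tuple


class ToyFrame:
    """Hypothetical partitioned frame that caches per-partition row counts."""

    def __init__(self, row_partitions: List[List[tuple]], columns: List[str]):
        self._partitions = row_partitions
        self.columns = columns
        self._row_lengths_cache: Optional[List[int]] = None
        # Counts how often the (expensive) index is built, for demonstration only.
        self.index_materializations = 0

    @property
    def index(self) -> List[int]:
        # Stand-in for materializing the full index across all partitions.
        self.index_materializations += 1
        return list(range(sum(len(p) for p in self._partitions)))

    @property
    def row_lengths(self) -> List[int]:
        # Cheap, cached metadata: one length per row partition.
        if self._row_lengths_cache is None:
            self._row_lengths_cache = [len(p) for p in self._partitions]
        return self._row_lengths_cache


class ToyQueryCompiler:
    """Hypothetical query compiler exposing the proposed method."""

    def __init__(self, frame: ToyFrame):
        self._frame = frame

    def get_axis_len(self, axis: int) -> int:
        # axis 0 -> number of rows, axis 1 -> number of columns.
        if axis == 0:
            return sum(self._frame.row_lengths)
        if axis == 1:
            return len(self._frame.columns)
        raise ValueError(f"axis must be 0 or 1, got {axis!r}")


class ToyDataFrame:
    """Hypothetical frontend object that never touches the index for len/shape."""

    def __init__(self, query_compiler: ToyQueryCompiler):
        self._query_compiler = query_compiler

    def __len__(self) -> int:
        return self._query_compiler.get_axis_len(0)

    @property
    def shape(self) -> Tuple[int, int]:
        return len(self), self._query_compiler.get_axis_len(1)


frame = ToyFrame([[(1, 2), (3, 4)], [(5, 6)]], columns=["a", "b"])
df = ToyDataFrame(ToyQueryCompiler(frame))
print(len(df), df.shape, frame.index_materializations)
```

In this sketch, len(df) and df.shape are answered from cached partition metadata, so the index property is never evaluated. A storage format without such metadata could still implement get_axis_len by falling back to len(index).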