Open sfc-gh-yzou opened 1 month ago
cc @devin-petersohn @noloerino Do you think we can potentially improve this user facing behavior?
I would slightly prefer changing data
to accept BaseQueryCompiler
and performing an isinstance(data, BaseQueryCompiler)
check, since we already do isinstance checks to see if the data is an array or other modin frontend object. I'd also be fine with adding a DataFrame/Series._from_query_compiler
static method for internal use like you suggested.
@noloerino we can let data to accept the query compiler interface, but if we do that, let's make sure the QueryCompiler doesn't occur in any of the user documentation, so that we can keep that completely hidden from the user.
Today, the constructor for dataframe looks like the following
which contains a parameter query_compiler, and occurs in the generated documentation. However, we don't really want user to use this parameter, it is just for our internal construction usage.
In pandas, they allow construction of dataframe/series directly from BlockManager class, and the BlockManager is a property on NDFrame class https://github.com/pandas-dev/pandas/blob/main/pandas/core/generic.py#L257. Pandas provides an internal method _from_mgr https://github.com/pandas-dev/pandas/blob/main/pandas/core/generic.py#L309 to allow construction directly from BlockManager. In the frontend, pandas actually handles when data is a BlockManager directly, but it is not documented in the API.
In modin, maybe we can do something similar, where we can push the query_compiler construction to the BasePandasDataset, and provide an internal constructer _from_query_compiler to allow direct creation of the class from query compiler. In that way, we will also be consistent with the pandas constructor definition.