Open MarcSkovMadsen opened 1 week ago
Hey @MarcSkovMadsen, thanks for reporting. We currently provide an is_into_dataframe function to validate whether an object is convertible into an eager narwhals DataFrame:
from narwhals.dependencies import is_into_dataframe
import pandas as pd
import polars as pl
import numpy as np
df_pd = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df_pl = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
np_arr = np.array([[1, 4], [2, 5], [3, 6]])
is_into_dataframe(df_pd)
True
is_into_dataframe(df_pl)
True
is_into_dataframe(np_arr)
False
As of now we don't have an equivalent for lazy dataframes though. You could combine other functionalities, such as is_dask_dataframe and is_polars_lazyframe.
Thanks that should be ok for me for now.
I was thinking that you needed to change your typing implementation to something like below:
I read the tutorial and assumed I would learn what was necessary from that to get started. But I should also learn to study the API docs. Just did not cross my mind 😄 Thx.
Any feedback on how to improve the documentation is welcome! We recently did some development to move forward with the Plotly integration and left the docs a little behind 🙈
In addition to pandas and polars dataframes, we also have a database SELECT query that I thought could have lightweight interchange support. See https://github.com/narwhals-dev/narwhals/issues/1289 for more context.
From
I would expect the below to work:
from narwhals.dependencies import is_into_dataframe


class DatabaseConnector():
    def __dataframe__(self):
        raise NotImplementedError()

connector = DatabaseConnector()
assert is_into_dataframe(connector)
But it does not.
script.py:9: in <module>
assert is_into_dataframe(connector)
E assert False
E + where False = <function is_into_dataframe at 0x7f0093107ce0>(<script.DatabaseConnector object at 0x7f00930bcf10>)
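Until something like this is supported, a duck-typed check is one option. The sketch below assumes that the mere presence of a callable __dataframe__ attribute is enough for the use case. It does not call the method or check what it returns. The name supports_interchange is hypothetical and not part of narwhals:

```python
from typing import Any


def supports_interchange(obj: Any) -> bool:
    """Hypothetical helper: True if obj exposes a callable __dataframe__ attribute."""
    return callable(getattr(obj, "__dataframe__", None))


class DatabaseConnector:
    def __dataframe__(self):
        raise NotImplementedError()


# The connector passes because the method exists, even though calling it would raise.
print(supports_interchange(DatabaseConnector()))
print(supports_interchange(object()))
```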
I think the best workaround for now then would be something along the following lines:
import narwhals as nw

if isinstance(
    df := nw.from_native(df_native, eager_or_interchange_only=True, pass_through=True),
    nw.DataFrame,
):
    # work with narwhals DataFrame, using eager_or_interchange_only=True enables interchange support
    ...
elif isinstance(
    df := nw.from_native(df_native, pass_through=True),
    nw.LazyFrame,
):
    # work with narwhals LazyFrame
    ...
else:
    # df_native is unchanged
    ...
With pass_through=True, the original object will be returned if unable to convert to a Narwhals DataFrame/LazyFrame, making the first two conditions False.
Thanks for the request!
Agree with Francesco's suggestion, we do something similar in the Plotly PR
Just one clarification: hopefully, "interchange" level will just be temporary, and in 2025 we can end up with:
And we can end up with a well-defined spec which different dataframe / database libraries can implement so that we're not the single point of failure. It's very difficult to do that properly without ending up as in https://xkcd.com/927/, but long-term, it might be something we can aim for. Short-term, we're more focused on concrete use cases - we ain't gonna have any hope of standardising anything if we can't first get adoption.
Thanks for all the help. I managed to get a first version of support for Narwhals in https://github.com/panel-extensions/panel-graphic-walker/pull/22. I hope this will inspire support in param, Panel, hvPlot, HoloViews and Bokeh. At least the concept is in that PR.
We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?
Please describe the purpose of the new feature or describe the problem to solve.
I would like to add support for general DataFrames in the HoloViz ecosystem.
The starting point is that most things are "stored" as parameters of a param.Parameterized class:
When MyClass is instantiated the value is validated. I was hoping that narwhals would provide functionality to validate if the supplied value is valid. For example if it is an IntoDataFrame.
Suggest a solution if possible.
I was hoping it was possible to do something like isinstance(value, IntoDataFrame):
But when running
I get
If you have tried alternatives, please describe them below.
I can of course write a validation function myself. But I was hoping not to have to maintain functionality to handle different dataframe libraries.
Additional information that may help us understand your needs.
No response