Open juanitorduz opened 3 months ago
Hey, just wanted to stop by and say - thanks for your interest! Feel free to book some time on https://calendly.com/marcogorelli if you'd like to chat about how Narwhals could help PyFixest
Hi both (@MarcoGorelli and @juanitorduz) - I've now thought about it for 15 minutes and I think narwhals might be a great solution for PyFixest! Thanks for offering to chat @MarcoGorelli, I'll book an appointment =)
Just some background on pyfixest and how it works with data frames: most of the data manipulation happens via the formulaic library, which requires an input pd.DataFrame. I.e. a usual flow looks like this:
%load_ext autoreload
%autoreload 2
import polars as pl
import pandas as pd
import pyfixest as pf
from formulaic import model_matrix
import narwhals as nw
data = pl.DataFrame(pf.get_data())
def feols(data):
    if isinstance(data, pl.DataFrame):
        data = data.to_pandas()
    # model_matrix requires a pandas DataFrame and returns a pandas DataFrame
    Y, X = model_matrix("Y ~ X1", data = data, output = "pandas")
    # some more pandas manipulations
    Y.dropna(inplace = True)
    X.dropna(inplace = True)
    return Y.to_numpy(), X.to_numpy()
Via narwhals, it could look like this:
def feols_nw(data, use_polars = False):
    data = nw.from_native(data)
    # model_matrix requires a pandas DataFrame and returns a pandas DataFrame
    Y, X = model_matrix("Y ~ X1", data = data.to_pandas(), output = "pandas")
    if use_polars:
        # another copy? potentially costly?
        Y = nw.from_native(Y)
        X = nw.from_native(X)
    # some more pandas manipulations
    Y.dropna(inplace = True)
    X.dropna(inplace = True)
    return Y.to_numpy(), X.to_numpy()
Hey! Thanks for your explanation - if formulaic specifically requires pandas input/output, then that might be a good candidate for Narwhalification :) I'll take a look, thanks!
# another copy? potentially costly?
Y = nw.from_native(Y)
Just to clarify, from_native just wraps a dataframe in a narwhals.DataFrame - it's a virtually free operation, only takes a few microseconds, and doesn't do any copies - Narwhals only translates syntax
Naive question: It seems formulaic supports pyarrow.Table. Could this be a shortcut for Polars integration? https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.to_arrow.html
totally!
Use https://github.com/narwhals-dev/narwhals to support pandas and polars!
This seems to be a very cool alternative to support various backends. See for example https://github.com/koaning/scikit-lego/pull/671