Open noloerino opened 6 days ago
See pandas discussion: https://github.com/pandas-dev/pandas/issues/52166
Though `attrs` is not fully mature, it seems to be used pretty frequently in downstream libraries to track metadata for use cases like plot generation, and the feature seems to be here to stay.
pandas supports propagation of `attrs` through `__finalize__`, which Modin vacuously defaults to pandas. I think the least intrusive approach for us would be to keep `attrs` as a non-distributed, regular Python dict and track it at the query compiler level. It may be better to track `attrs` through `__finalize__` like native pandas does, though this would require changing almost every frontend method to call it before returning.
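To make the trade-off concrete, here is a rough plain-Python sketch of the `__finalize__`-style option, assuming a toy `Frame` class and a `propagate_attrs` decorator. Both names are hypothetical and exist only for illustration; this is neither Modin's nor pandas' actual code. `attrs` is kept as a regular dict on the frame, and each wrapped method deep-copies it onto the frame it returns.

```python
import copy
import functools


def propagate_attrs(method):
    """Hypothetical decorator: deep-copy attrs from self onto a Frame result."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        # Only frame-valued results carry metadata; scalars pass through.
        if isinstance(result, Frame) and self.attrs:
            result.attrs = copy.deepcopy(self.attrs)
        return result

    return wrapper


class Frame:
    """Toy stand-in for a frontend dataframe (not a Modin class)."""

    def __init__(self, data):
        self._data = list(data)
        self.attrs = {}  # plain, non-distributed Python dict

    @propagate_attrs
    def head(self, n=5):
        return Frame(self._data[:n])

    @propagate_attrs
    def sum(self):
        return sum(self._data)


df = Frame([1, 2, 3, 4])
df.attrs["plot"] = {"title": "demo"}

top = df.head(2)
# The copy is deep, so editing the result's metadata leaves the source intact.
top.attrs["plot"]["title"] = "changed"
```

The cost mentioned above shows up here as the decorator line on every method that returns a frame; a real frontend would need an equivalent hook on almost all of them.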
Modin version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest released version of Modin.
[X] I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
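What follows is a plain-Python model of the failure this issue describes, not an actual Modin session. `NativeFrame`, `DistributedFrame`, and `_default_to_native` are hypothetical stand-ins for `pandas.DataFrame`, `modin.pandas.DataFrame`, and `DataFrame._default_to_pandas`. The sketch assumes only what the issue states: each access to `attrs` goes through a freshly built native object.

```python
class NativeFrame:
    """Stand-in for a native pandas.DataFrame: owns its own attrs dict."""

    def __init__(self, data):
        self.data = list(data)
        self.attrs = {}


class DistributedFrame:
    """Stand-in for modin.pandas.DataFrame with no attrs storage of its own."""

    def __init__(self, data):
        self._partitions = list(data)

    def _default_to_native(self):
        # Every call materializes a brand-new native object.
        return NativeFrame(self._partitions)

    @property
    def attrs(self):
        # The dict handed back belongs to a temporary NativeFrame.
        return self._default_to_native().attrs


frame = DistributedFrame([1, 2, 3])
frame.attrs["units"] = "cm"  # lands on the temporary object's dict only
lost_write = frame.attrs     # comes from another fresh object
print(lost_write)            # prints {}
```

The write succeeds without error, which is why the problem is easy to miss: nothing fails until a later read finds the metadata gone.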
Issue Description
`DataFrame.attrs` lets users attach metadata to a frame; that metadata is deep-copied to new dataframes when operations are performed. In Modin, `attrs` defaults to pandas, which means that writes to it are not reflected in the original frame, much less propagated to the results of other operations. When a write to `attrs` is attempted, it only modifies the `attrs` field of the native `pandas.DataFrame` that's produced within `DataFrame._default_to_pandas`, and the `modin.pandas.DataFrame` has no knowledge of this operation.
Expected Behavior
Writes to `attrs` are reflected in subsequent read operations, and propagated across operations.
Error Logs
Installed Versions