Open EmilianoJordan opened 4 years ago
cc @TomAugspurger
Generally speaking, we're happy to accept PRs for subclassing things as long as they don't affect the typical user's experience. Switching things like DataFrame
to use ._constructor
is a good example.
Should there be more documentation around subclassing?
Sure. We have a page dedicated to subclassing.
should there be a constructor kwarg added to functions like read_sql and read_csv?
IMO, no. Most users wouldn't ever need it so I wouldn't want it in the signature.
Does to_hdf not working with subclassing need to be fixed?
I'm not familiar with the issue.
I was looking into df.astype not respecting subclassed objects
I looks like it still does not work when astype
is provided as a dict: https://github.com/pandas-dev/pandas/issues/40810
I'd like to clarify the behavior in my original post for this issue with a code example:
This is intended to show that either the 'left' value or the first value (in the case of concat) is respected in the case of subclassing. In my opinion the better design decision would be to make these operations commutative. This could be accomplished by only returning a subclass when the objects are homogeneous in type and a pandas object should be returned in all other cases.
No matter what decision is made here I believe this should be documented in the subclassing documentation.
import pandas as pd
class SubclassedSeries(pd.Series):
@property
def _constructor(self):
return SubclassedSeries
@property
def _constructor_expanddim(self):
return SubclassedDataFrame
class SubclassedDataFrame(pd.DataFrame):
@property
def _constructor(self):
return SubclassedDataFrame
@property
def _constructor_sliced(self):
return SubclassedSeries
subclassed_series = SubclassedSeries(range(10,20), name='test')
s = pd.Series(range(10,20), name='test')
# pandas.merge respects lef class type
t = pd.merge(subclassed_series, s)
assert isinstance(t, SubclassedDataFrame)
t = pd.merge(s, subclassed_series)
assert not isinstance(t, SubclassedDataFrame)
# pandas concat behavior
t = pd.concat([subclassed_series, s])
assert isinstance(t, SubclassedSeries)
t = pd.concat([s,subclassed_series])
assert not isinstance(t, SubclassedSeries)
t = pd.concat([subclassed_series, s], axis=1)
assert isinstance(t, SubclassedDataFrame)
t = pd.concat([s,subclassed_series], axis=1)
assert not isinstance(t, SubclassedDataFrame)
I was looking into
df.astype
not respecting subclassed objects and noticed the issue had been resolved by modifyingpd.concat
to respect subclassed objects using the following code:Line 464 PR #29627
Line 477 PR #33884 (Set to be released on 1.1.0)
This assumes homogeneous object types, or at the very least that the first object is representative. My assumption was that functions on the pandas namespace did not respect subclassing and only methods on the objects did. Leaving developers to implement their own special use cases for subclassing. This seemed like a good delimitation to me, both logically and programmatically. But, now diving deeper into functions I don't regularly use I see I was mistaken as
merge
andmerge_ordered
both respect theleft
arg's subclassing.With this in mind, I have some questions.
concat
,merge
,merge_ordered
etc handle subclassing.constructor
kwarg added to functions likeread_sql
andread_csv
?to_hdf
not working with subclassing need to be fixed?