Open dechamps opened 4 years ago
> which is actively relying on tuples being treated as scalars and stored as single objects
If you have a viable way to avoid this in your code, I'd encourage you to use it. Regardless of how this issue is addressed, tuples-as-scalars is fragile
Yep. Well at least this issue forced me to clean up my code :) I'm now wrapping the value inside a fully opaque container object.
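A minimal sketch of the workaround described above, with a hypothetical `Box` class (the actual container used isn't shown in the thread): wrapping the tuple in an opaque object means pandas never sees anything list-like, so the value is stored as a single object regardless of how `apply` handles tuples.

```python
import pandas as pd


class Box:
    """Opaque wrapper: pandas sees a plain object, not a list-like."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Box({self.value!r})"


df = pd.DataFrame([["orig1", "orig2"]])

# Each cell of the result holds a single Box object, never expanded.
result = df.apply(lambda col: Box(("new1", "new2")))

# Unwrap explicitly when the values are needed.
unwrapped = result.map(lambda box: box.value)
```

The cost is an explicit unwrap step everywhere the values are read back, but the behaviour no longer depends on how pandas classifies tuples.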
moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline
moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline
moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline
According to the docstring, I would say that the behaviour of 1.0.5 was correct, and this is a regression.
@jbrockmendel would you have time to look into it?
> According to the docstring
Just to be clear: in the DataFrame.apply docstring (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html), the description of the result_type parameter is...
The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.
The return type of the user function in the OP is a tuple (considered list-like), so we expect a Series of those.
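For what it's worth, pandas exposes this classification directly via `pandas.api.types.is_list_like`, which treats tuples and lists alike but excludes strings:

```python
from pandas.api.types import is_list_like

# Tuples and lists are both classified as list-like.
assert is_list_like(("new1", "new2"))
assert is_list_like(["new1", "new2"])

# Strings are explicitly excluded from the list-like category.
assert not is_list_like("new1")
```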
This issue also occurs with a plain list, where we also expect the default result_type behaviour to be to reduce.
>>> pd.__version__
'1.3.0.dev0+100.g54682234e3'
>>>
>>> df = pd.DataFrame([["orig1", "orig2"]])
>>>
>>> df.apply(func=lambda col: ("new1", "new2"), result_type="reduce")
0 (new1, new2)
1 (new1, new2)
dtype: object
>>>
>>> df.apply(func=lambda col: ("new1", "new2"))
0 1
0 new1 new1
1 new2 new2
>>>
>>> df.apply(func=lambda col: ["new1", "new2"], result_type="reduce")
0 [new1, new2]
1 [new1, new2]
dtype: object
>>>
>>> df.apply(func=lambda col: ["new1", "new2"])
0 1
0 new1 new1
1 new2 new2
>>>
I would say that the behaviour of 1.0.5 was correct, and this is a regression.
agreed.
@jbrockmendel would you have time to look into it?
ping
Possibly related: #35517, #34909 @simonjayhawkins @jbrockmendel
can confirm, first bad commit: [91802a9ae400830f9eaadd395f6a9b40cdd92ee5] PERF: avoid creating many Series in apply_standard (#34909)
Aside from reverting #34909, the solution that comes to mind is calling the function on the first row in wrap_results_for_axis and seeing if we get a tuple. That runs into other problems with non-univalent or mutating functions.
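A rough standalone sketch of the idea above (this is not actual pandas internals, and `apply_with_tuple_probe` is a hypothetical name): probe the function on the first column and reduce if it returns a tuple. The extra invocation is exactly the caveat mentioned, since it misbehaves for non-univalent or mutating functions.

```python
import pandas as pd


def apply_with_tuple_probe(df, func):
    """Apply func column-wise, reducing when the result is a tuple.

    Probes func on the first column; the extra call is unsafe if
    func mutates state or returns different types per column.
    """
    probe = func(df.iloc[:, 0])
    if isinstance(probe, tuple):
        # Tuples kept as scalars: one tuple per column.
        return df.apply(func, result_type="reduce")
    return df.apply(func)


df = pd.DataFrame([["orig1", "orig2"]])
out = apply_with_tuple_probe(df, lambda col: ("new1", "new2"))
# out is a Series holding one tuple per column
```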
removing milestone
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Output of Pandas 1.0.5
Output of Pandas 1.1.0
It is not clear to me if this behaviour change is intended or not. I couldn't find anything obvious in the release notes.
Possibly related: #35517, #34909 @simonjayhawkins @jbrockmendel
This broke my code, which is actively relying on tuples being treated as scalars and stored as single objects (instead of being laid across the dataframe).
Output of pd.show_versions()