Open batterseapower opened 8 months ago
Thanks for the report. Agreed this shouldn't raise, but I'm not certain about preserving names. `pd.Index([2, 4, 3], name="a").factorize()` also does not preserve the name, and doing so might have some wide ranging complications (haven't checked). Further investigations welcome - in particular, if we do preserve names, what is the impact on the test suite?
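For anyone wanting to check the plain `Index` case mentioned above on their own install, a minimal sketch follows. It only calls `Index.factorize()` and prints the names; whether the name survives depends on the pandas version, so no output is claimed.

```python
import pandas as pd

ix = pd.Index([2, 4, 3], name="a")
codes, uniques = ix.factorize()

# Compare the name before and after factorizing. The second line is the
# behaviour under discussion and may differ between pandas versions.
print("original name:", ix.name)
print("uniques name:", uniques.name)
```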
I also think names should be preserved in the regular Index case. It does break backwards compatibility to do this because existing code may be relying on them not being preserved, but leaving this aside it does seem very clear to me that dropping the names is surprising behaviour.
> but leaving this aside it does seem very clear to me that dropping the names is surprising behaviour.
No disagreement here offhand, but this could have wide ranging implications and the impact needs to be investigated.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The first example should succeed, but actually it fails:
Probably because of the same underlying issue, `MultiIndex.factorize` always loses the `names` of the original index. It should preserve the original names instead.
Expected Behavior
The first `factorize()` should return `(np.array([], dtype=np.intp), empty_ix)`.
The second example should return `['a', 'i']` instead of `[None, None]`.
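Until the behaviour is settled, a user-side workaround could look roughly like the sketch below. `factorize_keep_names` is a hypothetical helper, not a pandas API, and the index data is made up for illustration (only the level names `'a'` and `'i'` come from the issue text). It short-circuits the empty case rather than calling `factorize()`, and copies the original names onto the uniques otherwise.

```python
import numpy as np
import pandas as pd


def factorize_keep_names(ix: pd.Index):
    """Hypothetical helper (not part of pandas): factorize an Index or
    MultiIndex and copy the original names onto the uniques."""
    if len(ix) == 0:
        # Avoid calling factorize() on an empty index and return the shape
        # the issue expects: (empty intp codes, empty index with names).
        return np.array([], dtype=np.intp), ix[:0]
    codes, uniques = ix.factorize()
    # set_names returns a new object, so the factorize() result is not mutated.
    return codes, uniques.set_names(list(ix.names))


# Illustrative data; not the reproducer from the issue.
mi = pd.MultiIndex.from_arrays([[1, 2, 1], ["x", "y", "x"]], names=["a", "i"])
codes, uniques = factorize_keep_names(mi)

empty_ix = mi[:0]
empty_codes, empty_uniques = factorize_keep_names(empty_ix)
```

This does not address the underlying bug; it only papers over it at the call site, so existing code that relies on names being dropped is unaffected.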
Installed Versions