Closed pkohlmann closed 2 years ago
This is fixed on master. May need tests
MultiIndex([('X', '1'),
            ('X', '2'),
            ('X', '2')],
           )
Index of df2 has duplicates: True
Index of df2 is unique: False
Hi, @phofl can you please check if the fix works in versions 1.3.4 & 1.3.5? I get the same results as you in 1.3.3, but the bug persists in later versions.
Also, I would like to pick it up as my first issue if it's okay
take
removing milestone
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
DataFrame df1 has an index with duplicates, i.e. the index of df1 is not unique. When df1 is the only DataFrame passed to pandas's concat(), i.e. concat() is applied to a single DataFrame, the resulting DataFrame has a MultiIndex that contains duplicates, but the metadata of this MultiIndex says it is unique and has no duplicates. When concat() is applied to a list of two or more DataFrames, the resulting MultiIndex is correct. The single-DataFrame scenario takes a shortcut for creating the MultiIndex, which apparently has a flaw. Applying concat() to a single DataFrame is a valid scenario which, for example, can occur in a GroupBy.apply(). The inconsistency in the MultiIndex makes other functionality such as Index.drop_duplicates() fail because it relies on the metadata of the MultiIndex.
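The original reproducible example is not preserved in this thread, so the following is only a minimal sketch of the scenario described above. It assumes the single DataFrame was concatenated with `keys=["X"]` and had the index `['1', '2', '2']`, which is consistent with the MultiIndex printed in the comments. The column name `A` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: only the duplicated index label '2' matters here.
df1 = pd.DataFrame({"A": [10, 20, 30]}, index=["1", "2", "2"])

# Assumption: concat() is given a single DataFrame plus a key,
# which is what produces a MultiIndex in the result.
df2 = pd.concat([df1], keys=["X"])

print(df2.index)
print("Index of df2 has duplicates:", df2.index.has_duplicates)
print("Index of df2 is unique:", df2.index.is_unique)

# Check for duplicates from the actual index values,
# without relying on the MultiIndex metadata.
actual_has_duplicates = len(set(df2.index.tolist())) != len(df2.index)
print("Duplicates found by inspecting the values:", actual_has_duplicates)
```

On an affected version, the two metadata-based lines would disagree with the value-based check. According to the comment above, master reports `True` for has_duplicates and `False` for is_unique.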
Expected Behavior
concat() returns a DataFrame with a consistent MultiIndex when applied to a single DataFrame whose index has duplicates.
Installed Versions