pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.74k stars 17.95k forks source link

BUG: index map leading to ValueError: Reindexing only valid with uniquely valued Index objects #40115

Open bravegag opened 3 years ago

bravegag commented 3 years ago

I am using Pandas 1.1.5 and get to the error ValueError: Reindexing only valid with uniquely valued Index objects.

Here is a self contained MRE:

import pandas as pd
from pandas import Timestamp
my_dict = my_dict = {('a',): 'x1', ('b',): 'x1', ('c', 'c1'): 'x1', ('c', 'c2'): 'x1', ('c', 'c3'): 'x1', ('c', 'c4'): 'x1', ('c', 'c5'): 'x1', ('c', 'c6'): 'x1', ('c', 'c7'): 'x1', ('c', 'c8'): 'x1', ('c', 'c9'): 'x1', ('c', 'c10'): 'x1', ('c', 'c11'): 'x1', ('c', 'c12'): 'x1', ('c', 'c13'): 'x1', ('c', 'c14'): 'x1', ('c', 'c15'): 'x1', ('c', 'c16'): 'x1', ('c', 'c17'): 'x1', ('c', 'c18'): 'x1', ('c', 'c19'): 'x1', ('c', 'c20'): 'x1', ('c', 'c21'): 'x1', ('c', 'c22'): 'x2', ('c', 'c23'): 'x2', ('c', 'c24'): 'x2', ('c', 'c25'): 'x2', ('c', 'c26'): 'x2', ('c', 'c27'): 'x2', ('c', 'c28'): 'x2', ('c', 'c29'): 'x2', ('c', 'c30'): 'x2', ('c', 'c31'): 'x2', ('c', 'c32'): 'x2', ('c', 'c33'): 'x2', ('c', 'c34'): 'x2', ('c', 'c35'): 'x2', ('c', 'c36'): 'x2', ('c', 'c37'): 'x2', ('c', 'c38'): 'x2', ('c', 'c39'): 'x2', ('c', 'c40'): 'x2', ('c', 'c41'): 'x2', ('c', 'c42'): 'x2', ('c', 'c43'): 'x2', ('c', 'c44'): 'x2', ('c', 'c45'): 'x3', ('c', 'c46'): 'x3', ('c', 'c47'): 'x3', ('c', 'c48'): 'x3', ('c', 'c49'): 'x3', ('c', 'c50'): 'x3', ('c', 'c51'): 'x3', ('c', 'c52'): 'x3', ('c', 'c53'): 'x3', ('c', 'c54'): 'x3', ('c', 'c55'): 'x3', ('c', 'c56'): 'x3', ('c', 'c57'): 'x3', ('c', 'c58'): 'x4', ('c', 'c59'): 'x4', ('c', 'c60'): 'x4', ('c', 'c61'): 'x4', ('c', 'c62'): 'x4', ('c', 'c63'): 'x4', ('c', 'c64'): 'x4', ('c', 'c65'): 'x4', ('c', 'c66'): 'x4', ('c', 'c67'): 'x4', ('c', 'c68'): 'x3', ('c', 'c69'): 'x3', ('c', 'c70'): 'x3', ('d', 'd1'): 'x1', ('d', 'd2'): 'x1', ('d', 'd3'): 'x2', ('d', 'd4'): 'x2', ('d', 'd5'): 'x2', ('d', 'd6'): 'x2', ('d', 'd7'): 'x4', ('d', 'd8'): 'x4', ('d', 'd9'): 'x4', ('d', 'd10'): 'x4', ('f',): 'x1', ('g',): 'x3'}
print(my_dict)
df = pd.DataFrame.from_dict({'date': {('a', 'a1'): Timestamp('2021-02-28'), ('a', 'a2'): Timestamp('2021-02-28'), ('a', 'a3'): Timestamp('2021-02-28'), ('a', 'a4'): Timestamp('2021-02-28'), ('a', 'a5'): Timestamp('2021-02-28'), ('a', 'a6'): Timestamp('2021-02-28'), ('a', 'a7'): Timestamp('2021-02-28'), ('a', 'a8'): Timestamp('2021-02-28'), ('a', 'a9'): Timestamp('2021-02-28'), ('a', 'a10'): Timestamp('2021-02-28'), ('a', 'a11'): Timestamp('2021-02-28'), ('a', 'a12'): Timestamp('2021-02-28'), ('a', 'a13'): Timestamp('2021-02-28'), ('a', 'a14'): Timestamp('2021-02-28'), ('a', 'a15'): Timestamp('2021-02-28'), ('a', 'a16'): Timestamp('2021-02-28'), ('a', 'a17'): Timestamp('2021-02-28'), ('a', 'a18'): Timestamp('2021-02-28'), ('a', 'a19'): Timestamp('2021-02-28'), ('a', 'a20'): Timestamp('2021-02-28'), ('a', 'a21'): Timestamp('2021-02-28'), ('a', 'a22'): Timestamp('2021-02-28'), ('a', 'a23'): Timestamp('2021-02-28'), ('a', 'a24'): Timestamp('2021-02-28'), ('a', 'a25'): Timestamp('2021-02-28'), ('a', 'a26'): Timestamp('2021-02-28'), ('a', 'a27'): Timestamp('2021-02-28'), ('a', 'a28'): Timestamp('2021-02-28'), ('a', 'a29'): Timestamp('2021-02-28'), ('a', 'a30'): Timestamp('2021-02-28'), ('b', 'b1'): Timestamp('2021-02-14'), ('b', 'b2'): Timestamp('2021-02-14'), ('b', 'b3'): Timestamp('2021-02-14'), ('b', 'b4'): Timestamp('2021-02-14'), ('b', 'b5'): Timestamp('2021-02-14'), ('b', 'b6'): Timestamp('2021-02-14'), ('b', 'b7'): Timestamp('2021-02-14'), ('b', 'b8'): Timestamp('2021-02-14'), ('b', 'b9'): Timestamp('2021-02-14'), ('b', 'b10'): Timestamp('2021-02-14'), ('c', 'c1'): Timestamp('2021-02-17'), ('c', 'c2'): Timestamp('2021-02-17'), ('c', 'c3'): Timestamp('2021-02-17'), ('c', 'c4'): Timestamp('2021-02-17'), ('c', 'c5'): Timestamp('2021-02-17'), ('c', 'c6'): Timestamp('2021-02-17'), ('c', 'c7'): Timestamp('2021-02-17'), ('c', 'c8'): Timestamp('2021-02-17'), ('c', 'c9'): Timestamp('2021-02-17'), ('c', 'c10'): Timestamp('2021-02-17'), ('c', 'c11'): Timestamp('2021-02-17'), ('c', 'c12'): Timestamp('2021-02-17'), ('c', 'c13'): Timestamp('2021-02-17'), ('c', 'c14'): Timestamp('2021-02-17'), ('c', 'c15'): Timestamp('2021-02-17'), ('c', 'c16'): Timestamp('2021-02-17'), ('c', 'c17'): Timestamp('2021-02-17'), ('c', 'c18'): Timestamp('2021-02-17'), ('c', 'c19'): Timestamp('2021-02-17'), ('c', 'c20'): Timestamp('2021-02-17'), ('c', 'c21'): Timestamp('2021-02-17'), ('c', 'c22'): Timestamp('2021-02-17'), ('c', 'c23'): Timestamp('2021-02-17'), ('c', 'c24'): Timestamp('2021-02-17'), ('c', 'c25'): Timestamp('2021-02-17'), ('c', 'c26'): Timestamp('2021-02-17'), ('c', 'c27'): Timestamp('2021-02-17'), ('c', 'c28'): Timestamp('2021-02-17'), ('c', 'c29'): Timestamp('2021-02-17'), ('c', 'c30'): Timestamp('2021-02-17'), ('c', 'c31'): Timestamp('2021-02-17'), ('c', 'c32'): Timestamp('2021-02-17'), ('c', 'c33'): Timestamp('2021-02-17'), ('c', 'c34'): Timestamp('2021-02-17'), ('c', 'c35'): Timestamp('2021-02-17'), ('c', 'c36'): Timestamp('2021-02-17'), ('c', 'c37'): Timestamp('2021-02-17'), ('c', 'c38'): Timestamp('2021-02-17'), ('c', 'c39'): Timestamp('2021-02-17'), ('c', 'c40'): Timestamp('2021-02-17'), ('c', 'c41'): Timestamp('2021-02-17'), ('c', 'c42'): Timestamp('2021-02-17'), ('c', 'c43'): Timestamp('2021-02-17'), ('c', 'c44'): Timestamp('2021-02-17'), ('c', 'c45'): Timestamp('2021-02-17'), ('c', 'c46'): Timestamp('2021-02-17'), ('c', 'c47'): Timestamp('2021-02-17'), ('c', 'c48'): Timestamp('2021-02-17'), ('c', 'c49'): Timestamp('2021-02-17'), ('c', 'c50'): Timestamp('2021-02-17'), ('c', 'c51'): Timestamp('2021-02-17'), ('c', 'c52'): Timestamp('2021-02-17'), ('c', 'c53'): Timestamp('2021-02-17'), ('c', 'c54'): Timestamp('2021-02-17'), ('c', 'c55'): Timestamp('2021-02-17'), ('c', 'c56'): Timestamp('2021-02-17'), ('c', 'c57'): Timestamp('2021-02-17'), ('c', 'c58'): Timestamp('2021-02-17'), ('c', 'c59'): Timestamp('2021-02-17'), ('c', 'c60'): Timestamp('2021-02-17'), ('c', 'c61'): Timestamp('2021-02-17'), ('c', 'c62'): Timestamp('2021-02-17'), ('c', 'c63'): Timestamp('2021-02-17'), ('c', 'c64'): Timestamp('2021-02-17'), ('c', 'c65'): Timestamp('2021-02-17'), ('c', 'c66'): Timestamp('2021-02-17'), ('c', 'c67'): Timestamp('2021-02-17'), ('c', 'c68'): Timestamp('2021-02-17'), ('c', 'c69'): Timestamp('2021-02-17'), ('c', 'c70'): Timestamp('2021-02-17'), ('d', 'd1'): Timestamp('2021-02-17'), ('d', 'd2'): Timestamp('2021-02-17'), ('d', 'd3'): Timestamp('2021-02-17'), ('d', 'd4'): Timestamp('2021-02-17'), ('d', 'd5'): Timestamp('2021-02-17'), ('d', 'd6'): Timestamp('2021-02-17'), ('d', 'd7'): Timestamp('2021-02-17'), ('d', 'd8'): Timestamp('2021-02-17'), ('d', 'd9'): Timestamp('2021-02-17'), ('d', 'd10'): Timestamp('2021-02-17'), ('f', 'f1'): Timestamp('2021-02-28'), ('i', 'i1'): Timestamp('2021-02-12'), ('g', 'g1'): Timestamp('2021-02-12')}, 'change': {('a', 'a1'): 0, ('a', 'a2'): 0, ('a', 'a3'): 0, ('a', 'a4'): 0, ('a', 'a5'): 0, ('a', 'a6'): 0, ('a', 'a7'): 1, ('a', 'a8'): 0, ('a', 'a9'): 0, ('a', 'a10'): 0, ('a', 'a11'): 0, ('a', 'a12'): 0, ('a', 'a13'): 0, ('a', 'a14'): 0, ('a', 'a15'): 0, ('a', 'a16'): 1, ('a', 'a17'): 0, ('a', 'a18'): 0, ('a', 'a19'): -1, ('a', 'a20'): 0, ('a', 'a21'): 0, ('a', 'a22'): -1, ('a', 'a23'): 0, ('a', 'a24'): 0, ('a', 'a25'): 0, ('a', 'a26'): -1, ('a', 'a27'): 0, ('a', 'a28'): 1, ('a', 'a29'): -1, ('a', 'a30'): 0, ('b', 'b1'): 0, ('b', 'b2'): 0, ('b', 'b3'): 0, ('b', 'b4'): 0, ('b', 'b5'): 0, ('b', 'b6'): 0, ('b', 'b7'): 0, ('b', 'b8'): 0, ('b', 'b9'): 0, ('b', 'b10'): -1, ('c', 'c1'): 0, ('c', 'c2'): 0, ('c', 'c3'): 0, ('c', 'c4'): 0, ('c', 'c5'): 0, ('c', 'c6'): 0, ('c', 'c7'): 0, ('c', 'c8'): 0, ('c', 'c9'): 0, ('c', 'c10'): 0, ('c', 'c11'): 0, ('c', 'c12'): 0, ('c', 'c13'): 0, ('c', 'c14'): 0, ('c', 'c15'): 0, ('c', 'c16'): 0, ('c', 'c17'): 0, ('c', 'c18'): 0, ('c', 'c19'): 0, ('c', 'c20'): 0, ('c', 'c21'): 0, ('c', 'c22'): 0, ('c', 'c23'): 0, ('c', 'c24'): 0, ('c', 'c25'): 0, ('c', 'c26'): 0, ('c', 'c27'): 0, ('c', 'c28'): 0, ('c', 'c29'): 0, ('c', 'c30'): 0, ('c', 'c31'): 0, ('c', 'c32'): 0, ('c', 'c33'): 0, ('c', 'c34'): 0, ('c', 'c35'): 0, ('c', 'c36'): 0, ('c', 'c37'): 0, ('c', 'c38'): 0, ('c', 'c39'): 0, ('c', 'c40'): 0, ('c', 'c41'): 0, ('c', 'c42'): 0, ('c', 'c43'): 0, ('c', 'c44'): 0, ('c', 'c45'): 0, ('c', 'c46'): 0, ('c', 'c47'): 0, ('c', 'c48'): 0, ('c', 'c49'): 0, ('c', 'c50'): 0, ('c', 'c51'): 0, ('c', 'c52'): 0, ('c', 'c53'): 0, ('c', 'c54'): 0, ('c', 'c55'): 0, ('c', 'c56'): 0, ('c', 'c57'): 0, ('c', 'c58'): 0, ('c', 'c59'): 0, ('c', 'c60'): 0, ('c', 'c61'): 0, ('c', 'c62'): 0, ('c', 'c63'): 0, ('c', 'c64'): 0, ('c', 'c65'): 0, ('c', 'c66'): 0, ('c', 'c67'): 0, ('c', 'c68'): 0, ('c', 'c69'): 0, ('c', 'c70'): 0, ('d', 'd1'): 0, ('d', 'd2'): 0, ('d', 'd3'): 0, ('d', 'd4'): 0, ('d', 'd5'): 0, ('d', 'd6'): 0, ('d', 'd7'): 0, ('d', 'd8'): 0, ('d', 'd9'): 0, ('d', 'd10'): 0, ('f', 'f1'): 0, ('i', 'i1'): 0, ('g', 'g1'): 0}, 'signal': {('a', 'a1'): 0.03970536568342733, ('a', 'a2'): -0.06374209195546844, ('a', 'a3'): 0.04887447925300424, ('a', 'a4'): 0.01575039305035544, ('a', 'a5'): -0.09999999999999999, ('a', 'a6'): 0.05279941816002801, ('a', 'a7'): 0.010016176472915, ('a', 'a8'): -0.02449470412243375, ('a', 'a9'): 0.04869502067932577, ('a', 'a10'): 0.1, ('a', 'a11'): -0.1, ('a', 'a12'): 0.03027685639088534, ('a', 'a13'): -0.008406707578779347, ('a', 'a14'): 0.02331562618579834, ('a', 'a15'): -0.01895572953805233, ('a', 'a16'): 0.01321303262280628, ('a', 'a17'): 0.0471840306086552, ('a', 'a18'): 0.01764134675438401, ('a', 'a19'): -0.02975575175382789, ('a', 'a20'): 0.02540156637807194, ('a', 'a21'): 0.01049801497612664, ('a', 'a22'): -0.00768798591193653, ('a', 'a23'): -0.04651808810661669, ('a', 'a24'): -0.07975382397869742, ('a', 'a25'): 0.02774313999684366, ('a', 'a26'): -0.07628252363662207, ('a', 'a27'): -0.0246802445961122, ('a', 'a28'): 0.002526683431417061, ('a', 'a29'): -0.011475219734499, ('a', 'a30'): 0.07811172026900139, ('b', 'b1'): 0.05806278953704978, ('b', 'b2'): -0.02730772877790196, ('b', 'b3'): -0.01511648025231574, ('b', 'b4'): -0.1408331932235801, ('b', 'b5'): 0.06066688412750731, ('b', 'b6'): -0.001756125774471374, ('b', 'b7'): 0.09485626882296667, ('b', 'b8'): -0.05420268550526035, ('b', 'b9'): 0.0347802085337365, ('b', 'b10'): -0.009149937487730684, ('c', 'c1'): 0.2892069022181903, ('c', 'c2'): 0.3967565277356959, ('c', 'c3'): 0.2953936103710862, ('c', 'c4'): 0.8367722498665394, ('c', 'c5'): 0.9476463410530558, ('c', 'c6'): 0.7569778483880074, ('c', 'c7'): 0.9418729618853355, ('c', 'c8'): 0.5823731910502483, ('c', 'c9'): 0.3801380958841796, ('c', 'c10'): 0.840417572988295, ('c', 'c11'): 0.2709598829828612, ('c', 'c12'): 0.5418651708249795, ('c', 'c13'): 0.5910838658612337, ('c', 'c14'): 0.1642799495954194, ('c', 'c15'): 0.9098912617035192, ('c', 'c16'): 0.9054002286961012, ('c', 'c17'): 0.8825237896195792, ('c', 'c18'): 0.5484869743549324, ('c', 'c19'): 0.416185567918982, ('c', 'c20'): 0.9749498131497896, ('c', 'c21'): 0.9904332845725732, ('c', 'c22'): 0.7088811098573771, ('c', 'c23'): 0.5292993064129955, ('c', 'c24'): 0.2860870616937747, ('c', 'c25'): 0.338840895561976, ('c', 'c26'): 0.7459350017031432, ('c', 'c27'): -0.1511324692149147, ('c', 'c28'): 0.4693308821188915, ('c', 'c29'): 0.6659249271244383, ('c', 'c30'): 0.4807458505455962, ('c', 'c31'): -0.221358206035012, ('c', 'c32'): 0.6180445315988707, ('c', 'c33'): 0.1885337838551052, ('c', 'c34'): 0.7429931202376518, ('c', 'c35'): 0.8002078957742668, ('c', 'c36'): 0.3493342914079488, ('c', 'c37'): 0.3309638545936928, ('c', 'c38'): 0.2642847334995485, ('c', 'c39'): 0.05840480039086733, ('c', 'c40'): 0.6066163228881758, ('c', 'c41'): 0.6695966826754965, ('c', 'c42'): 0.4552511138794135, ('c', 'c43'): 0.7953107225709156, ('c', 'c44'): 0.684872567978258, ('c', 'c45'): -0.7056780042421102, ('c', 'c46'): -0.2132570428066563, ('c', 'c47'): -0.2982245804786927, ('c', 'c48'): -0.2930812855008404, ('c', 'c49'): -0.007322632308963899, ('c', 'c50'): -0.3378424058391866, ('c', 'c51'): -0.2597598493973547, ('c', 'c52'): 0.5701700950598918, ('c', 'c53'): -0.5385984977127748, ('c', 'c54'): -0.5862066330785679, ('c', 'c55'): 0.4089415907519486, ('c', 'c56'): -0.2178839109516601, ('c', 'c57'): 0.2018031189699861, ('c', 'c58'): 0.6566150449109974, ('c', 'c59'): 0.7277468383550132, ('c', 'c60'): 0.8123662929227967, ('c', 'c61'): -0.2122649654736288, ('c', 'c62'): 0.7163285570784564, ('c', 'c63'): 0.8784305049299558, ('c', 'c64'): 0.7505753404952612, ('c', 'c65'): 0.5033400950735804, ('c', 'c66'): -0.1082662590377242, ('c', 'c67'): 0.521848866588884, ('c', 'c68'): 0.1339926134218392, ('c', 'c69'): 0.2327903872639628, ('c', 'c70'): 0.5341954937532732, ('d', 'd1'): 0.5596960121932234, ('d', 'd2'): 0.8039815600022109, ('d', 'd3'): 0.4526569517559198, ('d', 'd4'): 0.4530754439510857, ('d', 'd5'): -0.2669800836305256, ('d', 'd6'): 0.1309535995900915, ('d', 'd7'): 0.540158353558727, ('d', 'd8'): 0.300326164813025, ('d', 'd9'): 0.7107819801662658, ('d', 'd10'): 0.2067913037755799, ('f', 'f1'): 0.05812634537324268, ('i', 'i1'): 0.8326445687250644, ('g', 'g1'): 1.0} } )
print(df.index.duplicated())

# why does this work and ...
for key, value in my_dict.items():
    df.loc[key, 'desc1'] = value

print(df)

# this doesn't?
print(df.index.map(my_dict))

traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/marco/pandas-marco/pandas/core/indexes/base.py", line 5366, in map
    new_values = super()._map_values(mapper, na_action=na_action)
  File "/home/marco/pandas-marco/pandas/core/base.py", line 930, in _map_values
    indexer = mapper.index.get_indexer(values)
  File "/home/marco/pandas-marco/pandas/core/indexes/base.py", line 3369, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
MarcoGorelli commented 3 years ago

Thanks @bravegag for the report - to expedite resoultion, could you please make your example more minimal? Currently there are lots of columns which seem irrelevant, and the example is pretty long

bravegag commented 3 years ago

@MarcoGorelli I have tried that to simplify the problem but then, I can no longer reproduce the issue. I have not been able to scale it down to a super simple example. It was already a lot of work to obfuscate the real data behind a,b,c.

In order to provide more context into this problem please check these two QAs. The first one is my original use-case, the second one is a description on this same problem after trying to implement the first OP.

  1. https://stackoverflow.com/questions/66398540/how-to-set-new-columns-in-a-multi-column-index-from-a-dict-with-partially-specif
  2. https://stackoverflow.com/questions/66399299/valueerror-reindexing-only-valid-with-uniquely-valued-index-objects
MarcoGorelli commented 3 years ago

This reproduces the error on master:

index = pd.MultiIndex.from_tuples(
    [("a", "a1"), ("a", "a2"), ("a", "a3"), ("a", "a4"), ("a", "a5")]
)
my_dict = {
    ("a",): "x1",
    ("b",): "x1",
    ("c", "c1"): "x1",
    ("c", "c2"): "x1",
    ("c", "c3"): "x1",
}
pd.Index(index).map(my_dict)
---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-1-077a7247623d> in <module>
      9     ("c", "c3"): "x1",
     10 }
---> 11 pd.Index(index).map(my_dict)

~/pandas-marco/pandas/core/indexes/base.py in map(self, mapper, na_action)
   5376         from pandas.core.indexes.multi import MultiIndex
   5377 
-> 5378         new_values = super()._map_values(mapper, na_action=na_action)
   5379 
   5380         attributes = self._get_attributes_dict()

~/pandas-marco/pandas/core/base.py in _map_values(self, mapper, na_action)
    928             values = self._values
    929 
--> 930             indexer = mapper.index.get_indexer(values)
    931             new_values = algorithms.take_nd(mapper._values, indexer)
    932 

~/pandas-marco/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   3379 
   3380         if not self._index_as_unique:
-> 3381             raise InvalidIndexError(self._requires_unique_msg)
   3382 
   3383         if not self._should_compare(target) and not is_interval_dtype(self.dtype):

InvalidIndexError: Reindexing only valid with uniquely valued Index objects
ponkumarpandian commented 3 years ago

@MarcoGorelli Any workaround for this issue under this bug get resolved, I am getting similar issues and i got blocked. It will be great if there is any work around on this issue

mecopur commented 1 year ago

take