Open ivan-marroquin opened 1 month ago
breaking change : #55948
@ivan-marroquin
It does not look like this is supported. Two things -
id_vars
is no longer checked against a flattened index. (changes in https://github.com/pandas-dev/pandas/pull/55948). For multiIndex, this must be a list of tuples.var_name
name must be a scalar as mentioned in the docs. (enforeced in https://github.com/pandas-dev/pandas/pull/55948) It looks like for a multiIndex, the var_names are determined from the index names and cannot be specified. Therefore, following might be the correct way to go about this -
data.columns.names = ['Attribute', 'Ticker']
df.melt(id_vars= [('Date', '')]).rename(columns={('Date', ''): 'Date'})
Output -
Date Attribute Ticker value
0 2023-10-12 00:00:00+00:00 Adj Close HDFCBANK.NS 1.528971e+03
1 2023-10-13 00:00:00+00:00 Adj Close HDFCBANK.NS 1.515061e+03
2 2023-10-16 00:00:00+00:00 Adj Close HDFCBANK.NS 1.508994e+03
3 2023-10-17 00:00:00+00:00 Adj Close HDFCBANK.NS 1.520438e+03
4 2023-10-18 00:00:00+00:00 Adj Close HDFCBANK.NS 1.499277e+03
... ... ... ... ...
5851 2024-10-04 00:00:00+00:00 Volume TCS.NS 2.965463e+06
5852 2024-10-07 00:00:00+00:00 Volume TCS.NS 1.472619e+06
5853 2024-10-08 00:00:00+00:00 Volume TCS.NS 1.541867e+06
5854 2024-10-09 00:00:00+00:00 Volume TCS.NS 1.082504e+06
5855 2024-10-10 00:00:00+00:00 Volume TCS.NS 2.378875e+06
[5856 rows x 4 columns]
@ivan-marroquin - can you post a reproducible example that does not depend on yfinance
.
@rhshadrach I could reproduce it like this. Worked on pandas 2.1.4, but not on 2.2.0
import pandas as pd
df = pd.DataFrame({"x": ["a","a","b","b"], "y": ["c","d","c","d"], "z": [1,2,3,4]}).set_index("x")
df.columns = pd.MultiIndex.from_tuples([("y", "i"), ("y", "j")])
df.reset_index().melt(id_vars="x")
Output in 2.1.4:
| x | variable_0 | variable_1 | value
-- | -- | -- | -- | --
a | y | i | c
a | y | i | d
b | y | i | c
b | y | i | d
a | y | j | 1
a | y | j | 2
b | y | j | 3
b | y | j | 4
Output in 2.2.0:
KeyError Traceback (most recent call last)
Cell In[5], line 1
----> 1 df.reset_index().melt(id_vars="x")
File <project path>\env\lib\site-packages\pandas\core\frame.py:9915, in DataFrame.melt(self, id_vars, value_vars, var_name, value_name, col_level, ignore_index)
9905 @Appender(_shared_docs["melt"] % {"caller": "df.melt(", "other": "melt"})
9906 def melt(
9907 self,
(...)
9913 ignore_index: bool = True,
9914 ) -> DataFrame:
-> 9915 return melt(
9916 self,
9917 id_vars=id_vars,
9918 value_vars=value_vars,
9919 var_name=var_name,
9920 value_name=value_name,
9921 col_level=col_level,
9922 ignore_index=ignore_index,
9923 ).__finalize__(self, method="melt")
File <project path>\env\lib\site-packages\pandas\core\reshape\melt.py:74, in melt(frame, id_vars, value_vars, var_name, value_name, col_level, ignore_index)
70 if missing.any():
71 missing_labels = [
72 lab for lab, not_found in zip(labels, missing) if not_found
73 ]
---> 74 raise KeyError(
75 "The following id_vars or value_vars are not present in "
76 f"the DataFrame: {missing_labels}"
77 )
78 if value_vars_was_not_none:
79 frame = frame.iloc[:, algos.unique(idx)]
KeyError: "The following id_vars or value_vars are not present in the DataFrame: ['x']"
In various places, we do allow "x"
to match ("x", "")
for a MultiIndex. However, this is not the case for MultiIndex.get_indexer
.
df = pd.DataFrame({"x": ["a","a","b","b"], "y": ["c","d","c","d"], "z": [1,2,3,4]}).set_index("x")
df.columns = pd.MultiIndex.from_tuples([("y", "i"), ("y", "j")])
df = df.reset_index()
print(df.columns.get_indexer(pd.Index(["x"])))
# [-1]
If we returned [0]
here instead, this regression would also be fixed.
cc @mroeschke
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
KeyError Traceback (most recent call last) Cell In [19], line 15 11 data= data.reset_index() 13 # melt the DataFrame to make it long format where each row is a 14 # unique combination of Date, Ticker, and attributes ---> 15 data_melted= data.melt(id_vars= ['Date'], var_name= ['Attribute', 'Ticker']) 17 # pivot the melted DataFrame to have the attributes (Open, High, Low, etc.) as columns 18 data_pivoted= data_melted.pivot_table(index= ['Date', 'Ticker'], 19 columns= 'Attribute', values= 'value', 20 aggfunc= 'first')
File ~/python_3.9.0/lib/python3.9/site-packages/pandas/core/frame.py:9942, in DataFrame.melt(self, id_vars, value_vars, var_name, value_name, col_level, ignore_index) 9932 @Appender(_shared_docs["melt"] % {"caller": "df.melt(", "other": "melt"}) 9933 def melt( 9934 self, (...) 9940 ignore_index: bool = True, 9941 ) -> DataFrame: -> 9942 return melt( 9943 self, 9944 id_vars=id_vars, 9945 value_vars=value_vars, 9946 var_name=var_name, 9947 value_name=value_name, ... 77 ) 78 if value_vars_was_not_none: 79 frame = frame.iloc[:, algos.unique(idx)]
KeyError: "The following id_vars or value_vars are not present in the DataFrame: ['Date']"
Expected Behavior
The melt function should generate a DataFrame to make it long format where each row is a unique combination of Date, Ticker, and attributes
Installed Versions