Open MalteMax opened 4 years ago
import numpy as np
import pandas as pd
# Create the original DataFrame with NaN
df1 = pd.DataFrame({'id': ['a', np.nan, 'c'],
                    'A_1999': [1, 2, 3],
                    'A_2000': [5, 6, 7]})
# Store the original NaN ids for reference
nan_ids = df1[df1['id'].isna()]
# Apply wide_to_long transformation
df2 = pd.wide_to_long(df1,
                      stubnames=['A'],
                      sep='_',
                      i='id',
                      j='year').reset_index()
# Restore NaNs in the id column of df2 where they were originally in df1
# (rows are matched by value, so this assumes the A values are unique within each year)
for _, row in nan_ids.iterrows():
    # Find the corresponding rows in df2 and set id back to NaN
    df2.loc[(df2['year'] == 1999) & (df2['A'] == row['A_1999']), 'id'] = np.nan
    df2.loc[(df2['year'] == 2000) & (df2['A'] == row['A_2000']), 'id'] = np.nan
# Display the transformed DataFrame
print(df2)
Code and output:
Problem description
df1 has one NaN in the id column. When wide_to_long is applied to df1, it seems to fill the NaNs from df1 with the preceding id value in df1 (namely, a).

Why this might be a problem: the rows matching id == 'a' in df1 are not the same as those in df2, because the NaNs from df1 have been replaced by a in df2.
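
A possible way to sidestep this, sketched here rather than offered as a fix to pandas itself: keep the NaN-containing id out of the i argument and reshape on a surrogate key instead. The column name row below is a hypothetical helper that is not part of the original report. Since id is then carried along as an ordinary column, its NaNs should come through unchanged.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': ['a', np.nan, 'c'],
                    'A_1999': [1, 2, 3],
                    'A_2000': [5, 6, 7]})

# 'row' is a hypothetical surrogate key (assumption, not from the report).
# It is unique and NaN-free, so it is safe to pass as the i argument.
keyed = df1.assign(row=range(len(df1)))

# Reshape on the surrogate key; 'id' is carried along as a regular column.
long_df = (pd.wide_to_long(keyed,
                           stubnames=['A'],
                           sep='_',
                           i='row',
                           j='year')
           .reset_index()
           .drop(columns='row'))

# The NaN ids are still NaN, and only the original 'a' rows match id == 'a'.
print(long_df[long_df['id'].isna()])
print(long_df[long_df['id'] == 'a'])
```

Unlike restoring the NaNs afterwards by matching on the A values, this does not depend on those values being unique, and it also copes with several NaN ids in the same frame.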
Expected Output
I would expect wide_to_long to either ... i argument contains NaNs, or ...
Output of pd.show_versions()