Open jekwatt opened 2 years ago
Answers from James:
from pandas import DataFrame, date_range, MultiIndex, merge
from numpy.random import default_rng
from string import ascii_lowercase
rng = default_rng(0)
idx = date_range('2000-01-01', periods=100)
lft = DataFrame({
    'number': rng.integers(3, size=len(idx)),
    'letter': rng.choice([*ascii_lowercase[:3]], size=len(idx)),
}, index=idx)  # put both frames on the shared time index so rows align by date
rgt = DataFrame({
    'number': rng.integers(3, size=len(idx)),
    'letter': rng.choice([*ascii_lowercase[:3]], size=len(idx)),
}, index=idx)
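# columns present in both frames
cols = lft.columns.intersection(rgt.columns)
# rows where every shared column differs (a row that differs in only some columns lands in neither set)
diff_idx = (lft[cols] != rgt[cols]).all(axis='columns')[lambda s: s].index
# rows where every shared column matches
same_idx = (lft[cols] == rgt[cols]).all(axis='columns')[lambda s: s].index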
# label each side's columns with 'lft'/'rgt' on a top level, then join on the index
same = merge(
    lft[cols].loc[same_idx].set_axis(MultiIndex.from_product([['lft'], cols]), axis='columns'),
    rgt[cols].loc[same_idx].set_axis(MultiIndex.from_product([['rgt'], cols]), axis='columns'),
    left_index=True, right_index=True,
)
diff = merge(
    lft[cols].loc[diff_idx].set_axis(MultiIndex.from_product([['lft'], cols]), axis='columns'),
    rgt[cols].loc[diff_idx].set_axis(MultiIndex.from_product([['rgt'], cols]), axis='columns'),
    left_index=True, right_index=True,
)
print(
    same,
    diff,
    sep='\n',
)
`lft` and `rgt` are `DataFrame`s with some commonly named columns whose row values, aligned on a common index (a time index in this case), show row-level differences.
I want to see what's the same and what's different.
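To make the goal concrete, here is a minimal sketch on a tiny hand-written pair of frames. The three rows of values are hypothetical (they are not from the post), and the sketch assumes that a row counts as "different" when at least one shared column disagrees. It also assumes pandas 1.1 or later, since it uses `DataFrame.compare` to show the differing cells side by side.

```python
from pandas import DataFrame, date_range

# Hypothetical toy data: three rows on a shared time index.
idx = date_range('2000-01-01', periods=3)
lft = DataFrame({'number': [0, 1, 2], 'letter': ['a', 'b', 'c']}, index=idx)
rgt = DataFrame({'number': [0, 1, 0], 'letter': ['a', 'c', 'a']}, index=idx)

# Only compare the columns that both frames have.
cols = lft.columns.intersection(rgt.columns)

# True for rows where every shared column agrees.
is_same = (lft[cols] == rgt[cols]).all(axis='columns')

same_rows = lft.loc[is_same, cols]
# Assumption: a row is "different" if at least one shared column disagrees.
diff_rows = lft.loc[~is_same, cols]

# compare() keeps only rows and columns with differences;
# cells that match are shown as NaN, under 'self' (lft) and 'other' (rgt).
cell_diffs = lft[cols].compare(rgt[cols])

print(same_rows, diff_rows, cell_diffs, sep='\n\n')
```

`compare` needs both frames to carry identical row and column labels, which is why the sketch restricts both sides to `cols` first.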
To find rows with discrepancies: