boeddeker opened this issue 6 years ago
Strange, I'm not sure what's going on. You're welcome to take a look in `pandas/plotting/_core.py` if you're interested :)
Thanks for pointing me to the file. I already stepped through it in PyCharm, but I couldn't locate the bug.
I am working on this. The current implementation ignores the order of a string index: https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/_core.py#L578
For example, this reproduces the same problem:
```python
%matplotlib inline
import pandas as pd

df1 = pd.DataFrame([{'x': 'a', 'y': 1}, {'x': 'b', 'y': 2}])
df2 = pd.DataFrame([{'x': 'b', 'y': 3}, {'x': 'a', 'y': 4}])

ax = None
ax = df1.plot('x', 'y', ax=ax)
ax = df2.plot('x', 'y', ax=ax)
```
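To make the suspected mechanism concrete, here is a simplified, pandas-free sketch. It assumes, as the comments in this thread suggest, that a string index is drawn at row positions `0, 1, 2, ...` and the labels are then written onto those positions. `positional_x` is a hypothetical helper for illustration, not pandas code.

```python
def positional_x(labels):
    """Hypothetical stand-in: map each label to its row position, ignoring its value."""
    return {label: pos for pos, label in enumerate(labels)}

first = positional_x(["a", "b"])   # x positions used for df1
second = positional_x(["b", "a"])  # x positions used for df2

# The same label lands at a different x position in each call,
# so the two lines no longer share a consistent x axis.
for label in ["a", "b"]:
    print(label, first[label], second[label])  # prints "a 0 1", then "b 1 0"
```

Because only the tick labels of the last call survive, the first line ends up drawn against the wrong labels.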
The only solution I see is to convert a string index into a numeric one where possible. I'm not sure whether pandas should do such a conversion internally.
In the case where I hit the problem, my strings could not be converted to floats. Your example highlights the error better.
Converting strings to floats would make this bug less frequent, but handling non-numeric strings probably needs a different solution.
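A minimal sketch of the "convert if possible" idea, assuming that "possible" means every label parses as a number. `to_numeric_if_possible` is a hypothetical helper, not a pandas function; it only uses the public `pd.to_numeric`. As noted above, it does nothing for labels like `'a'` and `'b'`.

```python
import pandas as pd

def to_numeric_if_possible(index: pd.Index) -> pd.Index:
    """Hypothetical helper: return a numeric Index if every label parses as a
    number, otherwise return the original Index unchanged."""
    try:
        return pd.Index(pd.to_numeric(index))
    except (ValueError, TypeError):
        # At least one label (e.g. 'a') is not numeric, so keep the strings.
        return index

print(to_numeric_if_possible(pd.Index(["2", "1"])))  # numeric index with values 2, 1
print(to_numeric_if_possible(pd.Index(["b", "a"])))  # unchanged string index
```

With a numeric index the x values carry their own order, so `'2'` and `'1'` would be placed consistently; the string case is left untouched.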
An idea: the string labels could be stored in the `xticklabels`. If the labels are strings, `_get_xticks` would read the existing `xticklabels`, append any missing labels, and calculate the `xticks` from them.
This would require passing the `ax` object to `_get_xticks`.
Here is example code that demonstrates my idea:
```python
%matplotlib inline
import pandas as pd

df1 = pd.DataFrame([{'x': 'a', 'y': 1}, {'x': 'b', 'y': 2}])
df2 = pd.DataFrame([{'x': 'b', 'y': 3}, {'x': 'a', 'y': 4}])

def df_xstr_plot(df, x=None, y=None, ax=None):
    df = df.copy()
    # Reuse the labels already on the axis (if any) so positions stay stable
    if ax is not None:
        tick_labels = [tick_label.get_text() for tick_label in ax.get_xticklabels()]
    else:
        tick_labels = []
    # Append labels that appear for the first time in this DataFrame
    for new_tick_label in df[x]:
        if new_tick_label not in tick_labels:
            tick_labels.append(new_tick_label)
    # Map str to int
    mapping = {tick_label: i for i, tick_label in enumerate(tick_labels)}
    df[x] = df[x].map(mapping)
    ax = df.plot(x, y, ax=ax)
    # Assign the correct xticklabels
    ax.set_xticks(list(range(len(tick_labels))))
    ax.set_xticklabels(tick_labels)
    return ax

ax = None
ax = df_xstr_plot(df1, 'x', 'y', ax=ax)
ax = df_xstr_plot(df2, 'x', 'y', ax=ax)  # correct

ax = None
ax = df1.plot('x', 'y', ax=ax)
ax = df2.plot('x', 'y', ax=ax)  # wrong
```
This seems a bit complex. I don't think pandas should be doing anything special here; we should rely on matplotlib to handle all the string <-> position logic.
You are right, I forgot to check whether matplotlib can handle strings.
So the solution would be to add another branch to `_get_xticks` for strings that does not convert the strings to ints.
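A standalone sketch of the proposed branching. This is not the real pandas `_get_xticks`; `choose_x_values` is a hypothetical function that only illustrates the idea of passing string labels through unchanged and letting matplotlib's categorical axis assign positions. `is_numeric_dtype` and `Index.inferred_type` are public pandas APIs.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def choose_x_values(index: pd.Index) -> list:
    """Hypothetical sketch (not pandas' actual _get_xticks).

    - numeric index: plot the values themselves
    - string index: pass the labels through, so matplotlib keeps one
      consistent label -> position mapping across repeated plot calls
    - anything else: fall back to row positions 0..n-1
    """
    if is_numeric_dtype(index):
        return list(index)
    if index.inferred_type == "string":
        return list(index)  # the new branch: no conversion to int
    return list(range(len(index)))

print(choose_x_values(pd.Index(["b", "a"])))  # ['b', 'a']
```

The fallback branch mirrors the positional behavior described earlier in the thread; only the string branch is new.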
```python
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame([{'x': 'a', 'y': 1}, {'x': 'b', 'y': 2}])
df2 = pd.DataFrame([{'x': 'b', 'y': 3}, {'x': 'a', 'y': 4}])

def df_xstr_plot(df, x=None, y=None, ax=None):
    if ax is None:
        fig, ax = plt.subplots(1, 1)
    # matplotlib maps the string x values to category positions itself
    ax.plot(df[x], df[y])
    return ax

ax = None
ax = df_xstr_plot(df1, 'x', 'y', ax=ax)
ax = df_xstr_plot(df2, 'x', 'y', ax=ax)  # correct

ax = None
ax = df1.plot('x', 'y', ax=ax)
ax = df2.plot('x', 'y', ax=ax)  # wrong
```
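One way to check that matplotlib keeps a single label-to-position mapping across calls is to inspect the converted x data of each line. This sketch assumes a matplotlib version with categorical axis support (added in 2.1) and uses the non-interactive `Agg` backend so it runs without a display. `Line2D.get_xdata(orig=False)` returns the data after unit conversion, i.e. the numeric category positions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame([{'x': 'a', 'y': 1}, {'x': 'b', 'y': 2}])
df2 = pd.DataFrame([{'x': 'b', 'y': 3}, {'x': 'a', 'y': 4}])

fig, ax = plt.subplots()
line1, = ax.plot(df1['x'], df1['y'])
line2, = ax.plot(df2['x'], df2['y'])

# Categories are numbered in order of first appearance: 'a' -> 0, 'b' -> 1.
# The second line reuses that mapping, so its x positions are reversed.
print(line1.get_xdata(orig=False))  # positions for 'a', 'b'
print(line2.get_xdata(orig=False))  # positions for 'b', 'a'
```

Since both lines share the axis's category mapping, no extra bookkeeping in pandas is needed for this case.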
Code Sample, a copy-pastable example if possible
Problem description
I want to plot multiple dataframes in one graph. The x values are strings, and their order differs between the dataframes.
The first plot draws the line with x = ['1', '2'] and y = [1, 2].
The second plot draws the line with x = ['2', '1'] and y = [4, 3].
Since the second plot overwrites the xticks, the first line now appears as x = ['2', '1'] and y = [1, 2].
Expected Output
Output of `pd.show_versions()`