pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Plotting methods (line, area, ..) don't handle string x values #22334

Closed by jorisvandenbossche 5 years ago

jorisvandenbossche commented 6 years ago

From https://github.com/pandas-dev/pandas/pull/22307#pullrequestreview-145585750: when the x values passed to one of our plotting calls are strings, no tick labels are shown on the x-axis:

In [6]: df = pd.DataFrame({'sales': [3, 2, 3],
   ...:                    'visits': [20, 42, 28],
   ...:                    'day': ['Monday', 'Tuesday', 'Wednesday']})

In [7]: ax = df.plot.area(x='day')

(screenshot: the resulting area plot, with no tick labels on the x-axis)

The same is true for line plots; for bar plots it does work.

Related to https://github.com/pandas-dev/pandas/issues/18687 (but not exactly the same), and potentially https://github.com/pandas-dev/pandas/pull/18726. I would have expected that there is already an issue for this, but didn't directly find it.

jtweeder commented 6 years ago

I think I have found where this happens (or rather, fails to happen). Bar and barh both work with the labels as above, but the line plot and the area plot (a subclass of the line plot) do not. For bar and barh, the xtick labeling happens in _post_plot_logic, which contains the following; the line plot's _post_plot_logic has no equivalent.

        ...
        if self.use_index:
            str_index = [pprint_thing(key) for key in data.index]
        else:
            str_index = [pprint_thing(key) for key in range(data.shape[0])]
        name = self._get_index_name()

        self._decorate_ticks(ax, name, str_index)

    def _decorate_ticks(self, ax, name, ticklabels):
        ax.set_xticks(self.tick_pos)
        ax.set_xticklabels(ticklabels)
        if name is not None and self.use_index:
            ax.set_xlabel(name)

I have changed this locally in the line plot class, and both df.plot.area and df.plot.line now work with the toy example above. I wanted to pass that information to smarter minds for review before proceeding with a PR.
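For readers who want to see the idea outside the pandas internals, here is a minimal user-level sketch of the same tick decoration applied to a line plot. It is not the patch described above: `label_ticks_from_index` is a hypothetical helper, and the sketch assumes the data is plotted against its default integer positions so that tick `i` lines up with row `i`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

df = pd.DataFrame({'sales': [3, 2, 3],
                   'visits': [20, 42, 28],
                   'day': ['Monday', 'Tuesday', 'Wednesday']})


def label_ticks_from_index(ax, index):
    """Hypothetical helper: put one tick per row and label it with the index value."""
    tick_pos = list(range(len(index)))
    ax.set_xticks(tick_pos)
    ax.set_xticklabels([str(key) for key in index])
    return tick_pos


# Plot against the default integer positions (0, 1, 2), then decorate the ticks.
ax = df[['sales', 'visits']].plot.line()
tick_pos = label_ticks_from_index(ax, df['day'])
labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

The same helper can be applied to the axes returned by `df[['sales', 'visits']].plot.area()`.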

ImportanceOfBeingErnest commented 5 years ago

This seems to be partially fixed in 0.24.1, where the code from above gives

(screenshot: the resulting plot)

However, this just brings to light the real problem, namely that the ticks are essentially only at the correct position by coincidence. E.g. if you let the axes autoscale,

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'sales': [3, 2, 3],
                   'visits': [20, 42, 28],
                   'day': ['Monday', 'Tuesday', 'Wednesday']})

ax = df.plot.area(x='day')
ax.set_xlim(None, None)
ax.autoscale()
plt.show()

the ticks are completely off.

(screenshot: the plot after autoscaling, with the ticks in the wrong places)

Overall, this is the same issue as "plotting with index of objects should use FixedLocator" (#7612), which is already 5 years old and still unsolved.
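To illustrate what pinning the ticks would look like, here is a rough sketch in plain matplotlib using `FixedLocator` and `FixedFormatter`. It is only an illustration of the idea named in #7612, not the pandas implementation; the integer `positions` list is an assumption made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedFormatter, FixedLocator

days = ['Monday', 'Tuesday', 'Wednesday']
sales = [3, 2, 3]
positions = list(range(len(days)))  # assumed: row i is drawn at x = i

fig, ax = plt.subplots()
ax.plot(positions, sales)

# Pin the major ticks to the data positions and label them with the strings.
ax.xaxis.set_major_locator(FixedLocator(positions))
ax.xaxis.set_major_formatter(FixedFormatter(days))

# Changing the limits or autoscaling no longer moves the ticks off the data.
ax.set_xlim(-5, 10)
ax.autoscale()
fig.canvas.draw()

tick_locs = [float(loc) for loc in ax.xaxis.get_majorticklocs()]
tick_labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

Because the locator is fixed, the tick positions depend only on `positions` and not on the current view limits, which is what keeps labels and data aligned.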

jorisvandenbossche commented 5 years ago

> this is the same issue as plotting with index of objects should use FixedLocator #7612

Ah, thanks for the link, that is probably the issue I was looking for.

> which is already 5 years old and still unsolved.

Contributions are very welcome!

user1493 commented 5 years ago

This issue still persists! In my case, I'm seeing some xticks but not all of them for line and area plots. For bar and barh, things work correctly. (screenshot attached)

anordin95 commented 5 years ago

I'm going to take a crack at this. If anyone else is trying as well let me know!

TechTarun commented 5 years ago

Hey everyone I want to contribute my solution to this issue. Can I??

ajsharma22 commented 5 years ago

> Hey everyone I want to contribute my solution to this issue. Can I??

Sure, post how you are going to solve this issue.