plotly / plotly_express

Plotly Express - Simple syntax for complex charts. Now integrated into plotly.py!
https://plot.ly/python/plotly-express/
MIT License

Cannot Animate by Date #138

Status: Closed (johnmccain closed 5 years ago)

johnmccain commented 5 years ago

Passing a pandas Timestamp column as animation_frame raises a ValueError instead of producing an animated plot.

Using Python 3.6.8, plotly_express==0.4.1, plotly==4.1.0

Code to replicate:

import plotly_express as px
import pandas as pd

# create a dataframe with mock data
df = pd.DataFrame([
    {
        'x': 1,
        'y': 1,
        'date': '2018-01-01'
    },
    {
        'x': 2,
        'y': 1,
        'date': '2018-01-02'
    },
    {
        'x': 3,
        'y': 1,
        'date': '2018-01-03'
    }
])
df['date'] = pd.to_datetime(df['date'])
df.head()

#       date | x | y
# 2018-01-01 | 1 | 1
# 2018-01-02 | 2 | 1
# 2018-01-03 | 3 | 1

# attempt to plot
px.scatter(df,
          x='x',
          y='y',
          animation_frame='date')

Exception & stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-7f7f015611c1> in <module>()
      2           x='x',
      3           y='y',
----> 4           animation_frame='date')

~/.local/lib/python3.6/site-packages/plotly/express/_chart_types.py in scatter(data_frame, x, y, color, symbol, size, hover_name, hover_data, text, facet_row, facet_col, error_x, error_x_minus, error_y, error_y_minus, animation_frame, animation_group, category_orders, labels, color_discrete_sequence, color_discrete_map, color_continuous_scale, range_color, color_continuous_midpoint, symbol_sequence, symbol_map, opacity, size_max, marginal_x, marginal_y, trendline, trendline_color_override, log_x, log_y, range_x, range_y, render_mode, title, template, width, height)
     50     In a scatter plot, each row of `data_frame` is represented by a symbol mark in 2D space.
     51     """
---> 52     return make_figure(args=locals(), constructor=go.Scatter)
     53 
     54 

~/.local/lib/python3.6/site-packages/plotly/express/_core.py in make_figure(args, constructor, trace_patch, layout_patch)
    873     grouped = args["data_frame"].groupby(grouper, sort=False)
    874 
--> 875     orders, sorted_group_names = get_orderings(args, grouper, grouped)
    876 
    877     has_marginal_x = bool(args.get("marginal_x", False))

~/.local/lib/python3.6/site-packages/plotly/express/_core.py in get_orderings(args, grouper, grouped)
    858             group_names = sorted(
    859                 group_names,
--> 860                 key=lambda g: orders[col].index(g[i]) if g[i] in orders[col] else -1,
    861             )
    862 

~/.local/lib/python3.6/site-packages/plotly/express/_core.py in <lambda>(g)
    858             group_names = sorted(
    859                 group_names,
--> 860                 key=lambda g: orders[col].index(g[i]) if g[i] in orders[col] else -1,
    861             )
    862 

ValueError: Timestamp('2018-01-01 00:00:00') is not in list

johnmccain commented 5 years ago

Would it be better to submit this issue under the plotly.py repository instead?

johnmccain commented 5 years ago

I have found the cause of the issue, at _core.py:847.

The pandas .unique() method returns a NumPy array and, as shown in the examples section of its documentation, converts pandas.Timestamp values to numpy.datetime64:

>>> pd.Series([pd.Timestamp('2016-01-01') for _ in range(3)]).unique()
array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

The uniques list defined at _core.py:847 is built with the .unique() method, which converts the series of pandas.Timestamp values into an array of numpy.datetime64 values:

uniques = args["data_frame"][col].unique()
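For comparison, here is a minimal sketch of one way to collect the distinct values while keeping them as pandas.Timestamp objects. It is only an illustration, not the fix adopted in plotly.py; the variable names `dates` and `uniques` are made up for this example, and it assumes the column is a plain datetime64 Series.

```python
import pandas as pd

# Hypothetical stand-in for args["data_frame"][col]: a datetime64 Series
# containing one repeated value.
dates = pd.Series(
    pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-02", "2018-01-03"])
)

# drop_duplicates() keeps the Series intact, and tolist() boxes each element
# as a pandas.Timestamp, so the list holds the same type that appears in the
# groupby keys.
uniques = dates.drop_duplicates().tolist()

# Timestamp-to-Timestamp comparison is symmetric, so .index() finds the value.
position = uniques.index(pd.Timestamp("2018-01-02"))
print(uniques)
print(position)
```

Because both sides of the comparison are then the same type, the later `in` check and `.index()` call should agree with each other.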

This distinction is relevant on line 857:

group_names = sorted(
    group_names,
    key=lambda g: orders[col].index(g[i]) if g[i] in orders[col] else -1
)
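The sort key above looks each value up twice, once with `in` and once with `.index()`, so it breaks whenever the two lookups disagree. As a rough sketch, a single dictionary lookup with a default avoids that failure mode. The helper name `order_group_names` and the parameter `preferred_order` are hypothetical, and this uses plain `datetime` values; it does not by itself reconcile a Timestamp/datetime64 type mismatch, it only guarantees that an unmatched value falls back to -1 instead of raising.

```python
from datetime import datetime


def order_group_names(group_names, preferred_order, i=0):
    """Sort group tuples by the position of element i in preferred_order.

    Values missing from preferred_order get position -1 and sort first,
    matching the fallback in the original sort key.
    """
    position = {}
    for pos, value in enumerate(preferred_order):
        # setdefault keeps the first position of a repeated value,
        # which is what list.index() would have returned.
        position.setdefault(value, pos)
    # One lookup per group; .get() never raises for a missing key.
    return sorted(group_names, key=lambda g: position.get(g[i], -1))


preferred = [datetime(2018, 1, 1), datetime(2018, 1, 2), datetime(2018, 1, 3)]
groups = [
    (datetime(2018, 1, 3),),
    (datetime(2018, 1, 1),),
    (datetime(2017, 12, 31),),  # not in preferred, so it sorts first
    (datetime(2018, 1, 2),),
]
result = order_group_names(groups, preferred)
print(result)
```

The lookup is also constant time per group, whereas `.index()` scans the list on every call.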

The ValueError is raised because the `in` check succeeds but .index() then fails to find a matching element. pandas.Timestamp and numpy.datetime64 compare inconsistently between the two operations, as seen here:

>>> pandas.Timestamp('2018-01-01 00:00:00') in [numpy.datetime64('2018-01-01T00:00:00.000000000')]
True
>>> [numpy.datetime64('2018-01-01T00:00:00.000000000')].index(pandas.Timestamp('2018-01-01 00:00:00'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Timestamp('2018-01-01 00:00:00') is not in list
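Until the comparison is fixed upstream, a possible workaround is to animate on a string column rather than on the Timestamp column, so that no Timestamp/datetime64 comparison takes place. This is an untested assumption on my part rather than something confirmed in this thread, and the column name `date_label` is made up for the example.

```python
import pandas as pd

# Same mock data as in the report above.
df = pd.DataFrame(
    {
        "x": [1, 2, 3],
        "y": [1, 1, 1],
        "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    }
)

# Format each Timestamp as an ISO date string. ISO strings sort in
# chronological order, so the frame order is preserved.
df["date_label"] = df["date"].dt.strftime("%Y-%m-%d")
print(df["date_label"].tolist())
```

The string column can then be passed in place of the original one, for example `px.scatter(df, x='x', y='y', animation_frame='date_label')`.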

johnmccain commented 5 years ago

Since the faulty code is in the plotly.py repository, I am closing this issue and have opened a new one there: #1737.