Allow categorical column for setting color of lines in parallel coordinates via plotly express

joelostblom commented 4 years ago

In plotly express it is generally enough to indicate which categorical column to map to colors in the plot, and plotly takes care of the mapping. However, for parallel coordinates, categorical columns throw an error indicating that the mapping must be done manually. It would be great if parallel coordinates worked the same as the rest of plotly express and mapped categoricals to colors under the hood.

import plotly.express as px

df = px.data.iris()
px.parallel_coordinates(df, color="species")

ValueError: 
    Invalid element(s) received for the 'color' property of parcoords.line
        Invalid elements include: ['setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa']

    The 'color' property is a color and may be specified as:
      - A hex string (e.g. '#ff0000')
      - An rgb/rgba string (e.g. 'rgb(255,0,0)')
      - An hsl/hsla string (e.g. 'hsl(0,100%,50%)')
      - An hsv/hsva string (e.g. 'hsv(0,100%,100%)')
      - A named CSS color:
            aliceblue, antiquewhite, aqua, aquamarine, azure,
            beige, bisque, black, blanchedalmond, blue,
            blueviolet, brown, burlywood, cadetblue,
            chartreuse, chocolate, coral, cornflowerblue,
            cornsilk, crimson, cyan, darkblue, darkcyan,
            darkgoldenrod, darkgray, darkgrey, darkgreen,
            darkkhaki, darkmagenta, darkolivegreen, darkorange,
            darkorchid, darkred, darksalmon, darkseagreen,
            darkslateblue, darkslategray, darkslategrey,
            darkturquoise, darkviolet, deeppink, deepskyblue,
            dimgray, dimgrey, dodgerblue, firebrick,
            floralwhite, forestgreen, fuchsia, gainsboro,
            ghostwhite, gold, goldenrod, gray, grey, green,
            greenyellow, honeydew, hotpink, indianred, indigo,
            ivory, khaki, lavender, lavenderblush, lawngreen,
            lemonchiffon, lightblue, lightcoral, lightcyan,
            lightgoldenrodyellow, lightgray, lightgrey,
            lightgreen, lightpink, lightsalmon, lightseagreen,
            lightskyblue, lightslategray, lightslategrey,
            lightsteelblue, lightyellow, lime, limegreen,
            linen, magenta, maroon, mediumaquamarine,
            mediumblue, mediumorchid, mediumpurple,
            mediumseagreen, mediumslateblue, mediumspringgreen,
            mediumturquoise, mediumvioletred, midnightblue,
            mintcream, mistyrose, moccasin, navajowhite, navy,
            oldlace, olive, olivedrab, orange, orangered,
            orchid, palegoldenrod, palegreen, paleturquoise,
            palevioletred, papayawhip, peachpuff, peru, pink,
            plum, powderblue, purple, red, rosybrown,
            royalblue, rebeccapurple, saddlebrown, salmon,
            sandybrown, seagreen, seashell, sienna, silver,
            skyblue, slateblue, slategray, slategrey, snow,
            springgreen, steelblue, tan, teal, thistle, tomato,
            turquoise, violet, wheat, white, whitesmoke,
            yellow, yellowgreen
      - A number that will be interpreted as a color
        according to parcoords.line.colorscale
      - A list or array of any of the above

Versions:

-----
pandas      1.0.3
plotly      4.7.1
-----
IPython             6.5.0
jupyter_client      5.2.3
jupyter_core        4.6.3
jupyterlab          2.1.0
notebook            5.6.0
-----
Python 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0]
Linux-5.6.13-arch1-1-x86_64-with-arch
4 logical CPU cores
-----
Session information updated at 2020-05-22 08:16

joelostblom commented 4 years ago

Relatedly, passing a column of color names, as the error message suggests, does not seem to work:

df = px.data.iris()
df['species'] = df['species'].map({'setosa': 'coral', 'virginica': 'steelblue', 'versicolor': 'gold'})
px.parallel_coordinates(df, color="species")

Passing a list of numbers works fine:

df = px.data.iris()
df['species'] = df['species'].map({'setosa': 1, 'virginica': 2, 'versicolor': 3})
px.parallel_coordinates(df, color="species")
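
Since numeric values are accepted, the hand-written mapping above can be generalized. The following is only a minimal sketch: `encode_categories` is a hypothetical helper, not part of plotly, and it uses plain Python so that it runs on its own.

```python
def encode_categories(values):
    """Map each distinct label to an integer code, in order of first appearance.

    Hypothetical helper for illustration; it is not a plotly function.
    Returns (codes, labels) where labels[code] recovers the original label.
    """
    labels = []   # distinct labels, indexed by their code
    index = {}    # label -> code lookup
    codes = []    # one integer per input value
    for value in values:
        if value not in index:
            index[value] = len(labels)
            labels.append(value)
        codes.append(index[value])
    return codes, labels


species = ["setosa", "setosa", "versicolor", "virginica", "versicolor"]
codes, labels = encode_categories(species)
print(codes)   # integer codes that can be stored in a numeric column
print(labels)  # labels to show later as color bar tick text
```

The codes could then be stored in a new numeric column and that column passed as `color`, as in the numeric example above. With pandas, `pd.factorize` should give a similar result.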

nicolaskruchten commented 4 years ago

Yep, I agree! See https://github.com/plotly/plotly.py/issues/2143 :)

We'd need to bake in something like https://plotly.com/python/colorscales/#customizing-tick-text-on-discrete-color-bars to PX. I'm all for it, I just haven't had the time... If you'd be into it I'd be happy to provide guidance and review!
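
A rough sketch of what the discrete color bar approach could look like, assuming the categories have already been converted to integer codes 0..n-1. The helper names (`discrete_colorscale`, `colorbar_ticks`, `color_range`) are hypothetical and not part of PX; the block only builds plain Python structures, and the plotly calls in the trailing comments are an untested assumption about how they would be wired in.

```python
def discrete_colorscale(colors):
    """Build a stepped colorscale: each color fills one flat band of equal width."""
    n = len(colors)
    if n == 0:
        raise ValueError("at least one color is required")
    scale = []
    for i, color in enumerate(colors):
        # Repeating the color at both ends of its band avoids any gradient.
        scale.append([i / n, color])
        scale.append([(i + 1) / n, color])
    return scale


def colorbar_ticks(labels):
    """Place one tick per integer code and label it with the category name."""
    return {
        "tickmode": "array",
        "tickvals": list(range(len(labels))),
        "ticktext": list(labels),
    }


def color_range(n):
    """Color range that centers code k in the middle of band k."""
    return [-0.5, n - 0.5]


labels = ["setosa", "versicolor", "virginica"]
scale = discrete_colorscale(["coral", "gold", "steelblue"])
ticks = colorbar_ticks(labels)
crange = color_range(len(labels))

# Assumed wiring into Plotly Express (not executed here), where
# "species_code" is a hypothetical numeric column holding the codes:
#   fig = px.parallel_coordinates(df, color="species_code",
#                                 color_continuous_scale=scale,
#                                 range_color=crange)
#   fig.update_layout(coloraxis_colorbar=ticks)
```

Padding the color range by 0.5 on each side is a design choice: it makes every integer code fall in the middle of its band rather than on a boundary between two colors.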

nicolaskruchten commented 4 years ago

Re what works and doesn't above: you can specify a single color I believe but not an array.

joelostblom commented 4 years ago

Thanks for the quick reply!

> Yep, I agree! See #2143 :)

Sorry, I didn't see it; I was just searching for "parallel" before posting.

> We'd need to bake in something like https://plotly.com/python/colorscales/#customizing-tick-text-on-discrete-color-bars to PX. I'm all for it, I just haven't had the time... If you'd be into it I'd be happy to provide guidance and review!

I would normally be happy to, but unfortunately I don't have time at the moment either (thesis writing!)

> Re what works and doesn't above: you can specify a single color I believe but not an array.

Ah ok, the last line of the error message says "- A list or array of any of the above", so that could use updating as well when someone gets to this.

Mihiretukebede commented 1 year ago

Does that mean parallel coordinates only accepts numeric variables for coloring the lines?

jklen commented 1 year ago

Any progress on this, please?

Mihiretukebede commented 1 year ago

@jklen The parallel coordinates plot didn't work well for me. I got close with the parallel_categories plot, but it still didn't work well because it adds a numeric color bar and I found no way to remove it. go.Parcats, however, works well for me. You can check this reference: https://plotly.com/python/parallel-categories-diagram/

For example:

import pandas as pd
import numpy as np
import plotly.express as px

# Generate fake data
n = 1000
menopause_status = np.random.choice(['premenopausal', 'perimenopausal', 'postmenopausal'], n)
i = np.random.choice(['low', 'medium', 'high'], n)
adjustment = np.random.choice(['age', 'BMI', 'smoking'], n)
author_year = np.random.choice(['2010', '2015', '2020'], n)
direction = np.random.choice(['Null', 'Positive', 'Negative'], n)

ec = pd.DataFrame({'menopause_status': menopause_status,
                   'i': i,
                   'adjustment': adjustment,
                   'author_year': author_year,
                   'direction': direction})

direction_map = {'Null': 0, 'Positive': 1, 'Negative': 2}
ec['direction_num'] = ec['direction'].map(direction_map)

colorscale = ['lightgray', '#00868B', 'red']

fig = px.parallel_categories(ec, 
                             dimensions=['i','menopause_status', 'adjustment','direction','author_year'], 
                             color="direction_num",
                             color_continuous_scale=colorscale,
                             labels={"menopause_status": "Menopause status",
                                     "i": "Biomarker",
                                     "adjustment": "Adjustment",
                                     "author_year": "Author year",
                                     "direction": "Direction of association"})
fig.update_layout(coloraxis_colorbar=dict(tickvals=[0,1,2],
                                          ticktext=['Null', 'Positive', 'Negative'],
                                          tickmode='array'))

fig.update_layout(legend={})

fig.show()


go.Parcats() worked well instead. Have a look at the following, which is based on the same fake data.

import plotly.graph_objects as go

# Reuses the `ec` DataFrame (including `direction_num`) built in the previous example

# Create dimensions for each column
menopause_dim = go.parcats.Dimension(values=ec.menopause_status, label='Menopause Status')
i_dim = go.parcats.Dimension(values=ec.i, label='Biomarker')
adjustment_dim = go.parcats.Dimension(values=ec.adjustment, label='Adjustment')
author_year_dim = go.parcats.Dimension(values=ec.author_year, label='Author Year')
direction_dim = go.parcats.Dimension(values=ec.direction_num, label='Direction of association', categoryarray=[0, 1, 2], ticktext=['Null', 'Positive', 'Negative'])

# Create parcats trace
color = ec.direction_num
colorscale = [
    [0, 'lightgray'],
    [0.5, '#00868B'],
    [1, 'red'],
]
fig = go.Figure(data=[go.Parcats(dimensions=[ i_dim, menopause_dim, adjustment_dim, direction_dim, author_year_dim],
                                 line={'color': color, 'colorscale': colorscale},
                                 hoveron='color',
                                 #labelfont={'size': 18, 'family': 'Times'},
                                 #tickfont={'size': 16, 'family': 'Times'},
                                 arrangement='freeform')])
fig.show()


Good luck! Sorry if the code is dirty. I hope you can reproduce it.

gvwilson commented 2 months ago

Hi - we are trying to tidy up the stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if it is still a concern, please add a comment letting us know what recent version of our software you've checked it with so that I can reopen it and add it to our backlog. Alternatively, if it's a request for tech support, please post in our community forum. Thank you - @gvwilson

plotly / plotly.py

Allow categorical column for setting color of lines in parallel coordinates via plotly express #2494