vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

MNIST image (should be in docs) #2073

Open drozzy opened 4 years ago

drozzy commented 4 years ago

Hi, guys.

How can I plot an MNIST image in Jupyter? It's a (1, 28, 28) tensor or a dataframe with 784 columns.

I love the library, but I think it would be really helpful for the docs to include an example of how to plot an MNIST (1, 28, 28) image. There are a lot of matplotlib/Jupyter examples out there, and I think Altair could make this much more pleasant.

jakevdp commented 4 years ago

Does something like this answer help? https://stackoverflow.com/questions/60019006/can-we-plot-image-data-in-altair

ChiaLingWeng commented 1 year ago

Use flatten transforms and window transforms, as described in the linked answer:

import altair as alt
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_openml
# as_frame=True makes sure pixel_values is a DataFrame (needed for .iloc below)
pixel_values, targets = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=True)

Reshape each image to the proper dimensions:

# Take a slice of the data: rows 1 and 2, i.e. two images
image_data = pixel_values.iloc[1:3]
# Reshape each flat 784-value row into a 28x28 array
pic = [row.reshape(28, 28) for row in image_data.to_numpy()]

data = pd.DataFrame({
    'image': pic  # list of 2D arrays
})
alt.Chart(data).transform_window(
    index='count()'           # number each of the images
).transform_flatten(
    ['image']                 # extract rows from each image
).transform_window(
    row='count()',            # number the rows...
    groupby=['index']         # ...within each image
).transform_flatten(
    ['image']                 # extract the values from each row
).transform_window(
    column='count()',         # number the columns...
    groupby=['index', 'row']  # ...within each row & image
).mark_rect().encode(
    alt.X('column:O', axis=None),
    alt.Y('row:O', axis=None),
    alt.Color('image:Q',
        scale=alt.Scale(scheme=alt.SchemeParams('greys', extent=[1, 0])),
        legend=None
    ),
    alt.Facet('index:N', columns=4)
).properties(
    width=100,
    height=120
)

Result: (image)
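
As an alternative to the flatten/window transforms, the reshaping can be done in NumPy/pandas before the data reaches Altair. The following is only a sketch under that assumption: `image_to_long` and `image_chart` are hypothetical helper names, not part of Altair, and the sketch assumes a single square image given either as a (1, 28, 28) array or as one flat 784-value row.

```python
import math

import numpy as np
import pandas as pd


def image_to_long(image):
    """Turn one image into a long-form frame with row, column and value fields."""
    image = np.squeeze(np.asarray(image))    # e.g. (1, 28, 28) -> (28, 28)
    if image.ndim == 1:                      # a flat row, e.g. 784 values
        side = math.isqrt(image.size)        # assumes a square image
        image = image.reshape(side, side)
    rows, cols = np.indices(image.shape)     # pixel coordinates for every cell
    return pd.DataFrame({
        "row": rows.ravel(),
        "column": cols.ravel(),
        "value": image.ravel(),
    })


def image_chart(image, size=150):
    """Draw one image as a grid of rectangles coloured by pixel value."""
    # Imported here so image_to_long can be used without Altair installed.
    import altair as alt

    return alt.Chart(image_to_long(image)).mark_rect().encode(
        alt.X("column:O", axis=None),
        alt.Y("row:O", axis=None),
        alt.Color(
            "value:Q",
            scale=alt.Scale(scheme=alt.SchemeParams("greys", extent=[1, 0])),
            legend=None,
        ),
    ).properties(width=size, height=size)
```

In a notebook, `image_chart(pixel_values.iloc[1].to_numpy())` should display one digit. The trade-off is size: each image becomes 784 records, so a handful of images already approaches Altair's default 5000-row limit, whereas the nested-array version above keeps one record per image.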

mattijn commented 1 year ago

Thanks @ChiaLingWeng, it's still a great example of how you can make this work with multidimensional array data!

Regarding this Q&A with Jake on SO:

Can we expect new features in altair-viz that will allow us to visualize data straight from numpy arrays without having to convert it into pandas dataframe or are we going to have to rely on matplotlib for a long time?

No, Altair's grammar is tied very closely to structured, tabular data. I don't anticipate ever supporting data specified as unlabeled multidimensional arrays.

I really hope otherwise, so that we can use labeled (e.g. xarray) or unlabeled (e.g. numpy) multidimensional arrays as a native data source for Altair. Just as we have field=<column-name> for labeled data sources, we could have dim=<arr-dimension> to map unlabeled dimensions to encoding channels. This way we could introduce support for multidimensional arrays/tensors of up to 4 dimensions (using alt.X(), alt.Y(), alt.Row(), alt.Column()). Surely this requires discussion across multiple repositories, but it would be so nice. See also https://altair-viz.github.io/about/roadmap.html#gridded-data-support.
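
Until something like that exists, the dimension-to-channel mapping can be approximated by naming each array dimension and converting to long form first. This is only a sketch of the idea, not an existing or planned Altair API: `array_to_long`, its `dims` argument and `faceted_images` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd


def array_to_long(arr, dims, value="value"):
    """Hypothetical helper (not an Altair API): one record per array cell.

    Each dimension of `arr` becomes an integer column named after `dims`.
    """
    arr = np.asarray(arr)
    if len(dims) != arr.ndim:
        raise ValueError(f"expected {arr.ndim} dimension names, got {len(dims)}")
    grids = np.indices(arr.shape)            # one coordinate grid per dimension
    frame = pd.DataFrame({name: grid.ravel() for name, grid in zip(dims, grids)})
    frame[value] = arr.ravel()
    return frame


def faceted_images(stack):
    """Plot a stack of shape (n_images, height, width), one facet per image."""
    # Imported here so array_to_long can be used without Altair installed.
    import altair as alt

    long = array_to_long(stack, dims=["index", "row", "column"])
    return alt.Chart(long).mark_rect().encode(
        alt.X("column:O", axis=None),
        alt.Y("row:O", axis=None),
        alt.Color("value:Q", legend=None),
        alt.Facet("index:N", columns=4),
    )
```

Each named dimension then behaves like any other field, so a fourth dimension could be mapped to another channel in the same way. The cost is that the long-form table grows with the total number of array cells.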