Closed abhinavsingh closed 3 years ago
You cannot pass a selection directly as a chart property. Instead of
chart.properties(selection=occupation_filter)
try this:
chart.add_selection(occupation_filter)
Thank you for the prompt reply.
.properties(selection=occupation_filter)
seems to work well.
import os
import pandas as pd
import numpy as np
import altair as alt

DATA_DIR_PATH = '/Users/abhinavsingh/Downloads/ml-100k'


def mask(df, key, function):
    """Returns a filtered dataframe, by applying function to key"""
    return df[function(df[key])]


def flatten_cols(df):
    df.columns = [' '.join(col).strip() for col in df.columns.values]
    return df


pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.3f}'.format
pd.DataFrame.mask = mask
pd.DataFrame.flatten_cols = flatten_cols
alt.data_transformers.enable('default', max_rows=None)
alt.renderers.enable('altair_viewer')


# Since some movies can belong to more than one genre, we create different
# 'genre' columns as follows:
# - all_genres: all the active genres of the movie.
# - genre: randomly sampled from the active genres.
def mark_genres(movies, genres):
    def get_random_genre(gs):
        active = [genre for genre, g in zip(genres, gs) if g == 1]
        if len(active) == 0:
            return 'Other'
        return np.random.choice(active)

    def get_all_genres(gs):
        active = [genre for genre, g in zip(genres, gs) if g == 1]
        if len(active) == 0:
            return 'Other'
        return '-'.join(active)

    movies['genre'] = [
        get_random_genre(gs) for gs in zip(*[movies[genre] for genre in genres])]
    movies['all_genres'] = [
        get_all_genres(gs) for gs in zip(*[movies[genre] for genre in genres])]


# A function that generates a histogram of filtered data.
def filtered_hist(field, label, filter):
    """Creates a layered chart of histograms.
    The first layer (light gray) contains the histogram of the full data, and the
    second contains the histogram of the filtered data.
    Args:
        field: the field for which to generate the histogram.
        label: String label of the histogram.
        filter: an alt.Selection object to be used to filter the data.
    """
    base = alt.Chart().mark_bar().encode(
        x=alt.X(field, bin=alt.Bin(maxbins=10), title=label),
        y="count()",
    ).properties(
        width=300,
    )
    return alt.layer(
        base.transform_filter(filter),
        base.encode(color=alt.value('lightgray'), opacity=alt.value(.7)),
    ).resolve_scale(y='independent')


def main() -> None:
    # Load users
    users_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
    users = pd.read_csv(
        os.path.join(DATA_DIR_PATH, 'u.user'),
        sep='|',
        names=users_cols,
        encoding='latin-1')
    # Load ratings
    ratings_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
    ratings = pd.read_csv(
        os.path.join(DATA_DIR_PATH, 'u.data'),
        sep='\t',
        names=ratings_cols,
        encoding='latin-1')
    # The movies file contains a binary feature for each genre.
    genre_cols = [
        "genre_unknown", "Action", "Adventure", "Animation", "Children", "Comedy",
        "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
        "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"
    ]
    movies_cols = [
        'movie_id', 'title', 'release_date', "video_release_date", "imdb_url"
    ] + genre_cols
    movies = pd.read_csv(
        os.path.join(DATA_DIR_PATH, 'u.item'),
        sep='|',
        names=movies_cols,
        encoding='latin-1')
    # Since the ids start at 1, we shift them to start at 0.
    users["user_id"] = users["user_id"].apply(lambda x: str(x-1))
    ratings["movie_id"] = ratings["movie_id"].apply(lambda x: str(x-1))
    ratings["user_id"] = ratings["user_id"].apply(lambda x: str(x-1))
    ratings["rating"] = ratings["rating"].apply(lambda x: float(x))
    movies["movie_id"] = movies["movie_id"].apply(lambda x: str(x-1))
    movies["year"] = movies['release_date'].apply(
        lambda x: str(x).split('-')[-1])
    # Compute the number of movies to which a genre is assigned.
    genre_occurences = movies[genre_cols].sum().to_dict()
    # Create all_genres and genre columns
    mark_genres(movies, genre_cols)
    # Create one merged DataFrame containing all the movielens data.
    movielens = ratings.merge(movies, on='movie_id').merge(users, on='user_id')
    # The following functions are used to generate interactive Altair charts.
    # We will display histograms of the data, sliced by a given attribute.
    # Create filters to be used to slice the data.
    occupation_filter = alt.selection_multi(fields=["occupation"])
    occupation_chart = alt.Chart().mark_bar().encode(
        x="count()",
        y=alt.Y("occupation:N"),
        color=alt.condition(
            occupation_filter,
            alt.Color("occupation:N", scale=alt.Scale(scheme='category20')),
            alt.value("lightgray")),
    ).properties(width=300, height=300)
    occupation_chart.add_selection(occupation_filter)
    '''
    users_ratings = (
        ratings
        .groupby('user_id', as_index=False)
        .agg({'rating': ['count', 'mean']})
        .flatten_cols()
        .merge(users, on='user_id')
    )
    # Create a chart for the count, and one for the mean.
    alt.hconcat(
        filtered_hist('rating count', '# ratings / user', occupation_filter),
        filtered_hist('rating mean', 'mean user rating', occupation_filter),
        occupation_chart,
        data=users_ratings)
    '''


if __name__ == '__main__':
    main()
I can't run your code, because your data does not exist on my system. Can you create a minimal example that reproduces the error?
Oh, I see the issue. When you run
occupation_chart.add_selection(occupation_filter)
it does not modify the chart in place, but rather returns a modified chart (this is true of all Altair Chart methods). So you should write
occupation_chart = occupation_chart.add_selection(occupation_filter)
or, better, chain the add_selection call onto the existing chart specification, as you did with the encode(), properties(), and other method calls.
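To make the "returns a modified chart" behaviour concrete, here is a minimal sketch of the same pattern in plain Python. ToyChart is a hypothetical stand-in written only for illustration; it is not Altair's class and does not reproduce Altair's internals, it only mimics the copy-returning style described above.

```python
import copy


class ToyChart:
    """Hypothetical stand-in for a chart object; not part of Altair."""

    def __init__(self):
        self.selections = []

    def add_selection(self, name):
        # Work on a copy so the original object is left untouched.
        new = copy.deepcopy(self)
        new.selections.append(name)
        return new


chart = ToyChart()

# Discarding the return value leaves `chart` unchanged.
chart.add_selection("occupation_filter")
unchanged = list(chart.selections)

# Reassigning (or chaining) keeps the modified copy.
chart = chart.add_selection("occupation_filter")
updated = list(chart.selections)

print(unchanged)
print(updated)
```

The first call has no visible effect because its result is thrown away; only the reassigned call leaves the selection attached to the object that is later used.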
Thanks again, but I am still running into the same issue. I cannot attach the relevant data files to a GitHub issue, but you can download the sample set from http://files.grouplens.org/datasets/movielens/ml-100k.zip (4.9 MB) and update DATA_DIR_PATH to point to the extracted folder for a reproducible example.
I am also confused about why a different syntax works in Colab but fails via the terminal. Is there a difference in the renderer APIs? See the Google Colab screenshot below:
^^^^ Works in Colab environment
For reference, here is the output of pip freeze:
altair==4.1.0
altair-data-server==0.4.1
altair-viewer==0.3.0
appnope==0.1.0
attrs==19.3.0
autopep8==1.5.3
backcall==0.2.0
decorator==4.4.2
entrypoints==0.3
importlib-metadata==1.7.0
ipython==7.16.1
ipython-genutils==0.2.0
jedi==0.17.2
Jinja2==2.11.2
jsonschema==3.2.0
MarkupSafe==1.1.1
numpy==1.19.1
pandas==1.0.5
parso==0.7.1
pexpect==4.8.0
pickleshare==0.7.5
portpicker==1.3.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
pycodestyle==2.6.0
Pygments==2.6.1
pyrsistent==0.16.0
python-dateutil==2.8.1
pytz==2020.1
six==1.15.0
toml==0.10.1
toolz==0.10.0
tornado==6.0.4
traitlets==4.3.3
vega-datasets==0.8.0
wcwidth==0.2.5
zipp==3.1.0
The only reason you'd get a SchemaValidationError in one but not the other would be if different Altair versions are installed.
Try running python -m pip freeze instead of a simple pip freeze to make sure you're exporting the same environment you're using with your Python interpreter.
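The same check can also be done from inside the interpreter that runs the script. The following is a rough sketch using only the standard library (Python 3.8+); the helper name installed_version and the list of package names are assumptions chosen for illustration.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The interpreter that is actually running this script.
print("interpreter:", sys.executable)

# Versions visible to that interpreter; None means "not installed here".
for name in ("altair", "altair-viewer", "jsonschema"):
    print(name, installed_version(name))
```

If the versions printed here differ from what pip freeze reports, the pip on the PATH probably belongs to a different environment than the interpreter.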
Yep, the environment looks perfectly fine to me.
> python -m pip freeze
altair==4.1.0
altair-data-server==0.4.1
altair-viewer==0.3.0
appnope==0.1.0
attrs==19.3.0
autopep8==1.5.3
backcall==0.2.0
decorator==4.4.2
entrypoints==0.3
importlib-metadata==1.7.0
ipython==7.16.1
ipython-genutils==0.2.0
jedi==0.17.2
Jinja2==2.11.2
jsonschema==3.2.0
MarkupSafe==1.1.1
numpy==1.19.1
pandas==1.0.5
parso==0.7.1
pexpect==4.8.0
pickleshare==0.7.5
portpicker==1.3.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
pycodestyle==2.6.0
Pygments==2.6.1
pyrsistent==0.16.0
python-dateutil==2.8.1
pytz==2020.1
six==1.15.0
toml==0.10.1
toolz==0.10.0
tornado==6.0.4
traitlets==4.3.3
vega-datasets==0.8.0
wcwidth==0.2.5
zipp==3.1.0
FWIW, Google Colab installs altair via pip install git+git://github.com/altair-viz/altair.git. I tried the same, with the same result. Outside of that, the only difference is the renderer:
alt.renderers.enable('colab')
alt.renderers.enable('altair_viewer')
-- the one that I am using locally.
Below is a screenshot of the version which Google Colab installs, 4.2.0.dev0:
I nuked my virtual environment and followed these steps (this time installing directly from github repo) to reproduce the same result:
> python3 -m venv venv
> source venv/bin/activate
> pip install git+git://github.com/altair-viz/altair.git
> pip install git+git://github.com/altair-viz/altair_viewer.git
> .... run the script ...
raise SchemaValidationError(self, err)
altair.utils.schemapi.SchemaValidationError: Invalid specification
altair.vegalite.v4.schema.channels.Color, validating 'additionalProperties'
Additional properties are not allowed ('selection' was unexpected)
Result of freeze on the new environment:
> python -m pip freeze
altair==4.2.0.dev0
altair-data-server==0.4.1
altair-viewer==0.4.0.dev0
attrs==19.3.0
entrypoints==0.3
importlib-metadata==1.7.0
Jinja2==2.11.2
jsonschema==3.2.0
MarkupSafe==1.1.1
numpy==1.19.1
pandas==1.1.0
portpicker==1.3.1
pyrsistent==0.16.0
python-dateutil==2.8.1
pytz==2020.1
six==1.15.0
toolz==0.10.0
tornado==6.0.4
zipp==3.1.0
Ok - if you could create a minimal reproducible example (complete code) that demonstrates the issue, that would be helpful. I’ve tried to piece one together from what you provided, but I’m unable to reproduce the error.
Does it open up charts at your end? I can surely put the entire thing into a repo for you to take a look. Will share soon.
@jakevdp To reproduce, you probably should also add occupation_chart.show(). I skipped it in the code above.
So currently, at your end, the script must be finishing without any visible output. Adding show is when the error gets triggered. Sorry, I somehow missed this line in the above code.
Does it open up charts at your end?
Does what open up charts? None of your code includes the data, so I cannot run it. Try to create a short, complete snippet, with no reference to data files on your computer, that I can copy and paste into a terminal to see the error you're seeing.
@abhinavsingh I am going through Altair issues to find those that have been resolved and can be closed. Would you be able to close this issue, or add a comment with a short reproducible example if you are still encountering it?
I encountered this issue while going through a Colab; I am unsure whether it is still an issue. But it is good to close this now.
or, better, chain the add_selection call onto the existing chart specification, as you did with the encode(), properties(), and other method calls.
For the record, this worked for me earlier today. Thanks, Jake!
Hi,
I am following a Google Colab which uses altair for visualization. I don't plan to use an interactive notebook, so I am using altair-viewer for rendering via the Python terminal. Unfortunately, I run into the following exception:
Here is my sample code (same as the one found on colab):
I have tried both pip install altair and pip install git+git://github.com/altair-viz/altair.git
The following example from the altair doc seems to work fine from the terminal:
Please guide me on how to unblock myself. Apologies if this is a noob question; this is my first time trying altair.
Thank you!!!