vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Error encountered: dictionary changed size during iteration #3554

Closed · gaspardc-met closed 1 month ago

gaspardc-met commented 3 months ago

What happened?

When trying to create an altair chart within a streamlit application, I run into Uncaught Exception: dictionary changed size during iteration. I am running altair 5.4.0 here, and downgrading to 5.3.0 seemed to resolve this specific issue for the moment.

This is with streamlit caching removed. Error stack:

Traceback (most recent call last):
  File "/path/to/project/decorators.py", line 68, in wrapper
    result = main_func(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/file.py", line 81, in main
    display_tab(
  File "/path/to/project/file.py", line 224, in display_tab
    display_chronograms(
  File "/path/to/project/predictions/plots.py", line 513, in display_chronograms
    plot_chronogram(
  File "/path/to/project/predictions/plots.py", line 328, in plot_chronogram
    .encode(
        ^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/altair/vegalite/v5/schema/channels.py", line 31233, in encode
    kwargs = _infer_encoding_types(args, kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/altair/utils/core.py", line 964, in infer_encoding_types
    return cache.infer_encoding_types(kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/altair/utils/core.py", line 870, in infer_encoding_types
    return {
  File "/path/to/venv/lib/python3.11/site-packages/altair/utils/core.py", line 870, in <dictcomp>
    return {
RuntimeError: dictionary changed size during iteration

I have had this error intermittently in the past, only on cloud deployments and with altair 5.2.0. Since at the time it was intermittent and only in production, I attributed it to a streamlit cache issue (https://github.com/streamlit/streamlit/issues/8409). Now the error is happening locally, even when I remove any caching mechanism, and it is no longer intermittent: that specific plot never works. Downgrading to altair 5.3.0 seemed to fix the issue on this specific plot for now.
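For reference, the RuntimeError itself is Python's standard complaint whenever a dict gains or loses keys while it is being iterated, so something is mutating the kwargs dict mid-iteration. A minimal, generic sketch of the mechanism (not Altair's or streamlit's actual code; all names below are made up for illustration):

```py
# Generic illustration only: iterating over a dict while something else
# adds a key raises exactly this RuntimeError.
kwargs = {"x": 1, "y": 2, "color": 3}

try:
    encoded = {}
    for channel in kwargs:         # iterate over the live dict
        kwargs["tooltip"] = 4      # mutation while iteration is in progress
        encoded[channel] = kwargs[channel]
except RuntimeError as err:
    print(err)  # -> dictionary changed size during iteration
```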

I'm working on a minimal reproduction code example, but so far it fails to produce this error on my side with dummy data.

What would you like to happen instead?

No response

Which version of Altair are you using?

5.4.0

gaspardc-met commented 3 months ago

I still cannot manage to reproduce the error with dummy data, but these are basically the plot operations:

numpy==2.1.0, pandas==2.2.2, streamlit==1.37.0

```py
import pandas as pd
import altair as alt
import streamlit as st
import functools
import numpy as np
import random

# Dummy data for the chronogram
data = pd.DataFrame(
    {
        "start_time": pd.date_range("2023-01-01", periods=100, freq="H").tz_localize("Europe/Paris"),
        "end_time": pd.date_range("2023-01-01 01:00", periods=100, freq="H").tz_localize("Europe/Paris"),
        "asset": ["Asset " + str(i % 5) for i in range(100)],
        "load": [float(np.random.uniform(40, 100)) for _ in range(100)],
    }
)

# Set most 'load' values for 'Asset 0' to NaN or None
asset_0_indices = data[data["asset"] == "Asset 0"].index
indices_to_nullify = random.sample(list(asset_0_indices), k=len(asset_0_indices) - 2)  # Keep only 2 non-NaN

data.loc[indices_to_nullify, "load"] = np.nan  # Set the selected rows to NaN
data = data.reset_index()

# Placeholder functions for processing and legend (you would replace these with actual logic)
def chronogram_legend(target, pump_toggle):
    return "Legend", "Short Legend", "Other Info"

def chronogram_processing(chronogram, timedelta, filter_load):
    return chronogram  # Simply returns the input data in this dummy example

def custom_blues():
    return ["low", "medium", "high"], ["#dceefb", "#86c7f3", "#1f77b4"]

def get_assets_starts_and_stops(chronogram, timedelta, separator_dt):
    # Simple dummy start/stop markers within the range
    starts_and_stops = alt.Chart(chronogram).mark_rule(color="red").encode(x="start_time:T")
    starts_and_stops_texts = (
        alt.Chart(chronogram)
        .mark_text(align="left", dx=5, dy=-5, color="red")
        .encode(x="start_time:T", text=alt.value("Start/Stop"))
    )
    return starts_and_stops, starts_and_stops_texts

def get_vertical_separator(separator_dt, labels_y, y_field):
    return None, None  # Placeholder for the actual function output

# Main function with dummy data and simplified inputs
def plot_chronogram(
    data: pd.DataFrame,
    formatted=".0f",
    target="load",
    timedelta="60min",
    filter_load=True,
    expand: bool = False,
    pump_toggle: bool = False,
    display_starts_and_stops: bool = False,
    separator_dt: pd.Timestamp = None,
):
    # Get legend information
    legend, short_legend, _ = chronogram_legend(target=target, pump_toggle=pump_toggle)

    # Process the data (dummy in this case)
    st.write(data.dtypes)

    # Example of expanding the time (dummy logic here)
    if expand:
        data = data.set_index("start_time").sort_index().reset_index()
        data.loc[28:, "end_time"] = data.loc[28:, "end_time"] + pd.Timedelta("45T")

    # Set up the color scale (dummy logic here)
    if target == "pressure":
        bins, colors = custom_blues()
        scale = alt.Scale(domain=bins, range=colors, type="ordinal")
    elif target == "load":
        scale = alt.Scale(domain=[0, 50, 100], range=["#f7fbff", "#6baed6", "#08306b"], type="threshold")
    else:
        scale = alt.Scale(scheme="blues")

    # Define the sorting order for the y-axis
    sort_order = [""] + data["asset"].sort_values().unique().tolist()

    # Main bar chart
    chart = (
        alt.Chart(data)
        .mark_bar()
        .encode(
            x=alt.X("start_time:T", title="Horizon Temporel"),
            x2=alt.X2("end_time:T", title=""),
            y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=sort_order),
            color=alt.Color(
                "load:Q",
                title=short_legend,
                scale=scale,
                legend=alt.Legend(title=legend),
            ),
            stroke=alt.value("white"),
            strokeWidth=alt.value(2),
            tooltip=[
                alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                alt.Tooltip("load:Q", format=formatted, title=legend),
            ],
        )
    ).properties(
        title=f"Chronogramme d'opération: {legend}",
        width=1100,
        height=350,
    )

    # Text overlay layer
    text = (
        alt.Chart(data)
        .mark_text(dx=0, dy=0, color="white", fontSize=25)
        .encode(
            x=alt.X("mid_time:T", title=""),
            y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
            text=alt.Text("load:Q", format=formatted),
            tooltip=[
                alt.Tooltip("start_time:T", format="%Y-%m-%d", title="Date"),
                alt.Tooltip("start_time:T", format="%H:%M", title="Heure"),
                alt.Tooltip("load:Q", format=formatted, title=legend),
            ],
        )
    ).transform_calculate(mid_time="datum.start_time + (datum.end_time - datum.start_time)/2")

    # Additional text layer for a specific condition
    text_hot = (
        alt.Chart(data)
        .mark_text(dx=0, dy=0, color="white", fontSize=25)
        .encode(
            x=alt.X("mid_time:T", title=""),
            y=alt.Y("asset:N", title="Utilisation: Groupes ou AFC", sort=None),
            text=alt.value("Chaud"),
            tooltip=alt.value(None),
        )
    ).transform_calculate(mid_time="datum.start_time + (datum.end_time - datum.start_time)/2")

    # Store all charts to be layered
    all_charts = [chart, text, text_hot]

    # Example of including start and stop markers (dummy logic here)
    if display_starts_and_stops:
        starts_and_stops, starts_and_stops_texts = get_assets_starts_and_stops(
            chronogram=data,
            timedelta=timedelta,
            separator_dt=separator_dt,
        )
        all_charts += [starts_and_stops, starts_and_stops_texts]

    # Example of adding a vertical separator (dummy logic here)
    display_separator = separator_dt is not None and separator_dt > data.index.min()
    if display_separator:
        separator, separator_labels = get_vertical_separator(separator_dt=separator_dt, labels_y="", y_field="asset")
        all_charts += [separator, separator_labels]

    # Combine all chart layers
    composed = (
        functools.reduce(lambda a, b: a + b, all_charts)
        .configure_legend(orient="right", titleOrient="right")
        .configure_axis(labelFontSize=15, titleFontSize=15)
    )

    # Display the composed chart in Streamlit
    st.altair_chart(altair_chart=composed, use_container_width=True)

# Test the function with dummy data
plot_chronogram(data=data, display_starts_and_stops=True)
```

dangotbanned commented 3 months ago

Appreciate the detail here @gaspardc-met in https://github.com/vega/altair/issues/3554#issuecomment-2304319269, but a minimal repro would be helpful.

Uncaught Exception: dictionary changed size during iteration is being raised within a (LayerChart|Chart).encode(). No idea which one though, as there seem to be a few.

I copied your code directly, commented out the streamlit parts, and didn't get any errors. I added some additional checks at the end; all seem to be working as expected.

Attempted Repro without streamlit

```py
# Same data setup and plot_chronogram definition as in the snippet above,
# wrapped in a test function, with the st.* calls commented out and
# plot_chronogram returning the composed chart instead of rendering it.
# Only the additions at the end are shown here.
def test_infer_encoding_types_mod_iter() -> None:
    ...  # data setup and plot_chronogram() exactly as posted, streamlit calls commented out

    # Test the function with dummy data
    composed = plot_chronogram(data=data, display_starts_and_stops=True)

    # NOTE: Reaching here wouldn't be possible if the error raised
    validated = composed.to_dict(validate=True)
    # NOTE: Another error would have been raised if the spec we returned was not valid
    assert isinstance(validated, dict)

    # NOTE: These may require optional dependencies you don't have,
    # but provide more evidence of the spec produced being valid
    vega_editor_url = composed.to_url()
    assert isinstance(vega_editor_url, str)
    composed.open_editor()
```

If you do not have the required dependencies for Chart.open_editor, you can view the spec in the Vega Editor instead.
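For instance, assuming the composed chart from the repro above, something along these lines lets you grab the spec and open it in the Vega Editor yourself (to_url needs the vl-convert dependency; to_json does not):

```py
# Assuming `composed` is the layered chart returned by plot_chronogram above.
spec_json = composed.to_json()   # the Vega-Lite spec as a JSON string
print(spec_json)                 # paste into https://vega.github.io/editor/

# With vl-convert installed, Altair can also build the editor URL directly:
url = composed.to_url()
print(url)
```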

Screenshot of Vega Editor

![image](https://github.com/user-attachments/assets/52e4782d-2690-4f6b-a1ac-9286476b9986)

Edit

A possible issue here is that https://github.com/streamlit/streamlit/releases/tag/1.37.0 was released prior to https://github.com/vega/altair/releases/tag/v5.4.0. streamlit may be making assumptions about the internals of altair that no longer hold since https://github.com/vega/altair/pull/3444.

I'm not familiar with streamlit, but one of its modules may be altering the dictionary while it is being iterated.
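As a stop-gap until that is sorted out downstream, an affected app could guard against the combination seen in this thread. A rough sketch, where the version boundary (altair 5.4.0 with streamlit 1.37.0) is an assumption drawn from this discussion, not an official compatibility matrix:

```py
# Hypothetical guard based on the versions discussed in this thread; the
# boundary (altair >= 5.4.0 with streamlit <= 1.37.0) is an assumption.
import warnings
from importlib.metadata import version

from packaging.version import Version

if Version(version("altair")) >= Version("5.4.0") and Version(version("streamlit")) <= Version("1.37.0"):
    warnings.warn(
        "This altair/streamlit combination may raise 'dictionary changed size "
        "during iteration'; consider pinning altair==5.3.0 or upgrading streamlit.",
        stacklevel=2,
    )
```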

dangotbanned commented 2 months ago

Closing as it appears to be a downstream issue in streamlit.

@gaspardc-met please feel free to comment if you feel I've made a mistake in this assessment.