whitphx / stlite

In-browser Streamlit 🎈🚀
https://edit.share.stlite.net
Apache License 2.0

Dataframe columns of lists of strings render the strings as ASCII numbers instead of strings #884

Open shrianshChari opened 5 months ago

shrianshChari commented 5 months ago

When I render a dataframe with a column whose values are lists of strings (here the dataset is called spl and the column is team), the strings display correctly in Streamlit (run locally on my machine) but not in stlite (run using stlite sharing):

spl['team'].iloc[0:5]

Output in Streamlit:

image

Output in stlite:

image

When I take the first row of the stlite output and convert each number to its corresponding ASCII character, I get the original list back:

>>> s = '91,34,83,110,111,114,108,97,120,34,44,34,71,111,108,101,109,34,44,34,71,101,110,103,97,114,34,44,34,90,97,112,100,111,115,34,44,34,70,111,114,114,101,116,114,101,115,115,34,44,34,83,116,97,114,109,105,101,34,93'
>>> s = s.split(',')
>>> c = list(map(lambda x: chr(int(x)), s))
>>> ''.join(c)
'["Snorlax","Golem","Gengar","Zapdos","Forretress","Starmie"]'
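The same decoding can be wrapped in a small helper, which makes it easier to check other rows. This is only a sketch: `decode_cell` is a hypothetical name, and it assumes each cell holds the UTF-8 bytes of a JSON array, which matches the row above but has not been confirmed for every row. The sample string is a shortened version of the first row.

```python
import json


def decode_cell(cell: str) -> list:
    """Turn a comma-separated run of byte values back into a Python list.

    Assumption: the bytes are a UTF-8 encoded JSON array, as in the row above.
    """
    raw = bytes(int(n) for n in cell.split(","))
    return json.loads(raw.decode("utf-8"))


# Shortened version of the first row shown by stlite (first two entries only).
row = "91,34,83,110,111,114,108,97,120,34,44,34,71,111,108,101,109,34,93"
print(decode_cell(row))
```

Going through `bytes` rather than `chr()` means non-ASCII names would also decode correctly, if the data really is UTF-8.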

stlite does seem to recognize that spl['team'] is a column of lists of strings, because when I run:

spl['team'].iloc[0]

I get the same output for both Streamlit and stlite: image

whitphx commented 5 months ago

Thank you for reporting this!

whitphx commented 3 months ago

The df.to_parquet() call here completes without any error: https://github.com/whitphx/streamlit/blob/stlite-1.35.0/lib/streamlit/type_util.py#L1131

Maybe the problem comes from fastparquet and/or parquet-wasm? -> It looks like fastparquet sets the column metadata to "pandas_type": "mixed" in this case, whereas pyarrow sets "pandas_type": "list[unicode]". Is this the relevant code? https://github.com/dask/fastparquet/blob/1891a4a55fbe2ac23b29c064258c9c2eba480d28/fastparquet/util.py#L407-L408
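To illustrate how such a type mismatch could produce the numbers in the report, here is a standard-library simulation. It is not fastparquet's or the frontend's actual code. Both helper functions are hypothetical, and it assumes two things that are not confirmed here: that the list is stored as compact JSON bytes, and that a reader lacking the "list[unicode]" hint prints those bytes as integers.

```python
import json


def encode_as_json_bytes(values: list) -> bytes:
    # Assumption: the list is stored as compact JSON text, encoded as UTF-8.
    return json.dumps(values, separators=(",", ":")).encode("utf-8")


def render_without_type_info(raw: bytes) -> str:
    # Assumption: a reader that does not know the bytes are text
    # stringifies them as comma-separated integers.
    return ",".join(str(b) for b in raw)


cell = render_without_type_info(encode_as_json_bytes(["foo", "bar"]))
print(cell)
```

If this guess is right, the printed cell has the same shape as the stlite output above: it starts with 91,34 (the characters `["`) and ends with 34,93 (the characters `"]`).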


Sample code:

import streamlit as st
import pandas as pd

df = pd.DataFrame({
    "names": [["foo", "bar"], ["baz", "quz"]]
})

st.dataframe(df)