pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: style.map() not compatible with CSS string "url(data:..." #59623

Open invalidarg opened 2 weeks ago

invalidarg commented 2 weeks ago

Reproducible Example

import pandas as pd
print(pd.__version__) # 2.2.2

# Creating toy data
data = {
    "country": [ "Canada",  "Denmark"],
    "number": [ 200, 400]
}

def flag_background(country):
    if country == "Denmark":
        return """background-image: url('data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" id="flag-icons-dk" viewBox="0 0 640 480"><path fill="%23c8102e" d="M0 0h640.1v480H0z"/><path fill="%23fff" d="M205.7 0h68.6v480h-68.6z"/><path fill="%23fff" d="M0 205.7h640.1v68.6H0z"/></svg>');"""
    elif country == "Canada":
        return "background-color: red"

df = pd.DataFrame(data)
print(
    (
    df
    .style
    .map(flag_background)
    ).to_html()
)

### css in output is broken:
# #T_b3ead_row1_col0 {
#   background-image: url('data;
# }

#### But set_table_styles() works!

print(
    df
    .style
    .set_table_styles([
          {
              'selector': '.col0', 
                'props': 
                    [('background-image', '''url('data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" id="flag-icons-dk" viewBox="0 0 640 480"><path fill="%23c8102e" d="M0 0h640.1v480H0z"/><path fill="%23fff" d="M205.7 0h68.6v480h-68.6z"/><path fill="%23fff" d="M0 205.7h640.1v68.6H0z"/></svg>')'''), ('background-size', 'contain'),('background-repeat', 'no-repeat'),('background-position', 'center')]
         }
    ])
    .to_html()
)

### css in output is OK:
# #T_106ed .col0 {
#   background-image: url('data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" id="flag-icons-dk" viewBox="0 0 640 480"><path fill="%23c8102e" d="M0 0h640.1v480H0z"/><path fill="%23fff" d="M205.7 0h68.6v480h-68.6z"/><path fill="%23fff" d="M0 205.7h640.1v68.6H0z"/></svg>');
#   background-size: contain;
#   background-repeat: no-repeat;
#   background-position: center;
# }

Issue Description

I am trying to add SVG flags to each country, but Styler breaks CSS values that contain url(data:...

The CSS string returned by the function in style.map must be

property : value ; property2 : value ;

But there is valid CSS that does not follow this pattern. e.g. this is valid CSS: background-image: url('data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" id="flag-icons-dk" viewBox="0 0 640 480"><path fill="%23c8102e" d="M0 0h640.1v480H0z"/><path fill="%23fff" d="M205.7 0h68.6v480h-68.6z"/><path fill="%23fff" d="M0 205.7h640.1v68.6H0z"/></svg>');

The problem is that when the value itself contains a colon, pandas keeps only the text between the first and second colon and drops everything after it. The truncated value is then terminated with a semicolon ;, i.e. the resulting HTML will be

background-image: url('data;
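The truncation can be reproduced without pandas. The sketch below is a simplified stand-in for the splitting behaviour discussed in this thread, not the pandas source; the function name split_css_first_value_only and the shortened SVG payload are made up for illustration.

```python
def split_css_first_value_only(style: str) -> list[tuple[str, str]]:
    """Simplified stand-in for the reported behaviour (not the pandas source)."""
    pairs = []
    for declaration in style.split(";"):
        parts = declaration.split(":")
        # Only parts[1] is kept, so anything after a second colon is lost.
        if len(parts) > 1 and parts[1].strip() != "":
            pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs


# Shortened, hypothetical payload; the real flag SVG behaves the same way.
css = "background-image: url('data:image/svg+xml,<svg></svg>');"
print(split_css_first_value_only(css))
# [('background-image', "url('data")]
```

The value stops at the colon inside data:image/svg+xml, which matches the truncated rule shown above.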

Expected Behavior

Let me input any valid CSS string. Remove validation / truncation since it is not compatible with valid CSS strings.

Installed Versions

/databricks/python/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-1067-azure
Version : #76~20.04.1-Ubuntu SMP Thu Jun 13 18:00:23 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 1.23.5
pytz : 2022.7
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.10.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Lollitor commented 2 weeks ago

Hello, this is one of my first times contributing. I noticed that it was still necessary to check if the bug existed on the main branch, so I ran the code that reproduces the bug on the main branch.

While using .map() I get the following

<style type="text/css">
#T_7aa42_row0_col0 {
  background-color: red;
}
#T_7aa42_row1_col0 {
  background-image: url('data;
}
</style>

However, I get the correct output when using set_table_styles() as reported. Therefore, I can confirm that the bug also exists on the main branch.

\pandas> git branch --show-current
main

I hope this is helpful. I'll see if I can manage to do more!

Installed Versions:

commit : ef3368a8046f3c2e98c773be179f0a49a51d4bdc
python : 3.12.4
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22631
machine : AMD64
processor : AMD64 Family 25 Model 117 Stepping 2, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : Italian_Italy.1252

pandas : 0+untagged.35428.gef3368a
numpy : 1.26.4
dateutil : 2.9.0.post0
pip : 24.2
Cython : 3.0.11
sphinx : 8.0.2
IPython : 8.26.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : 1.4.0
fastparquet : 2024.5.0
fsspec : 2024.6.1
html5lib : 1.1
hypothesis : 6.111.2
gcsfs : 2024.6.1
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : 3.9.2
numba : 0.60.0
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
psycopg2 : 2.9.9
pymysql : 1.4.6
pyarrow : 17.0.0
pyreadstat : 1.2.7
pytest : 8.3.2
python-calamine : None
pytz : 2024.1
pyxlsb : 1.0.10
s3fs : 2024.6.1
scipy : 1.14.1
sqlalchemy : 2.0.32
tables : 3.10.1
tabulate : 0.9.0
xarray : 2024.7.0
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : 0.23.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

attack68 commented 2 weeks ago

Possible reason to revive: https://github.com/pandas-dev/pandas/pull/48869

attack68 commented 2 weeks ago

This behaviour is due to the function maybe_convert_css_to_tuples. Note the discussion in the link shared above.

invalidarg commented 2 weeks ago

Thanks attack68 for pointing me in the right direction. Would a patch to maybe_convert_css_to_tuples help until (or instead of) full-fledged CSS parsing is in place?

Basically taking all remaining elements of the x.split(":")-list instead of only the second.

def maybe_convert_css_to_tuples(style: str) -> str:
    """
    Convert css-string to sequence of tuples format if needed.
    'color:red; border:1px solid black;' -> [('color', 'red'),
                                             ('border','1px solid black')]
    """
    if isinstance(style, str):
        s = style.split(";")
        try:
            return [
                (x.split(":")[0].strip(), ":".join(x.split(":")[1:]).strip()) # updated to take [1:] elements
                for x in s
                if ":".join(x.split(":")[1:]).strip() != "" # updated to take [1:] elements
            ]
        except IndexError as err:
            raise ValueError(
                "Styles supplied as string must follow CSS rule formats, "
                f"for example 'attr: val;'. '{style}' was given."
            ) from err
    return style

maybe_convert_css_to_tuples("""background-image: url('data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" id="flag-icons-dk" viewBox="0 0 640 480"><path fill="%23c8102e" d="M0 0h640.1v480H0z"/><path fill="%23fff" d="M205.7 0h68.6v480h-68.6z"/><path fill="%23fff" d="M0 205.7h640.1v68.6H0z"/></svg>');""")

This seems to work for examples I have tried.
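The same idea, splitting each declaration at its first colon only, can also be written with str.partition. This is a minimal sketch of an equivalent formulation, not the code that ended up in pandas; the name css_to_tuples_partition and the shortened SVG payload are hypothetical.

```python
def css_to_tuples_partition(style: str) -> list[tuple[str, str]]:
    """Split 'prop: value; ...' into (prop, value) pairs (illustrative only)."""
    pairs = []
    for declaration in style.split(";"):
        # partition() cuts at the first colon only; later colons stay in the value.
        prop, _, value = declaration.partition(":")
        if value.strip() != "":
            pairs.append((prop.strip(), value.strip()))
    return pairs


print(css_to_tuples_partition("color:red; border:1px solid black;"))
# [('color', 'red'), ('border', '1px solid black')]

# Shortened, hypothetical payload standing in for the flag SVG.
print(css_to_tuples_partition("background-image: url('data:image/svg+xml,<svg></svg>');"))
```

Because declarations are still separated with split(";"), a semicolon inside the value would still cut it short.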

attack68 commented 1 week ago

I think this is a good patch. A PR is appreciated. Then we can check all existing tests and ensure compliance.

invalidarg commented 1 week ago

The patch still won't fix cases with semi-colons in the css value, e.g.

def maybe_convert_css_to_tuples(style: str) -> str:
    """
    Convert css-string to sequence of tuples format if needed.
    'color:red; border:1px solid black;' -> [('color', 'red'),
                                             ('border','1px solid black')]
    """
    if isinstance(style, str):
        s = style.split(";")
        try:
            return [
                (x.split(":")[0].strip(), ":".join(x.split(":")[1:]).strip())
                for x in s
                if ":".join(x.split(":")[1:]).strip() != ""
            ]
        except IndexError as err:
            raise ValueError(
                "Styles supplied as string must follow CSS rule formats, "
                f"for example 'attr: val;'. '{style}' was given."
            ) from err
    return style

css_str = 'background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA8AAAAPCAQAAACR313BAAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAAAmJLR0QA/4ePzL8AAAAHdElNRQfoBg8UEDHh089+AAABAUlEQVQY043POyiFAQDF8d93ySNi+HRTkrgmMimRnZJkVFaPMstCSpmIPEoGIpsMNslwM1yKIhkoeXYzWA0K12P4cFdnO+ffOXUCWRVoViWQlvIaRTk/KN+EFRXyhDpNyzjO9kJHFpX++UqHJrN42zjKjFnVLzQs7klNBNucypGQNqfbipQzzBuJ8KYB7JgFgWtL6LdMDE2S4totgC9fDsAnMYEKjxLe3INaCSeocxWNPwvV+1CCmC3v8hR5UB2NX2hy6dyGQXtSAj22rbmL2kOSYkKjZrSg17yu7Otc+9YV//lCfRp/ERkdptzYl1aiWoNdST8vf1WqVbmMW6de/E/fgBNApPnOcOEAAAAldEVYdGRhdGU6Y3JlYXRlADIwMjQtMDYtMTVUMjA6MTY6NDgrMDA6MDCqRcMDAAAAJXRFWHRkYXRlOm1vZGlmeQAyMDI0LTA2LTE1VDIwOjE2OjQ4KzAwOjAw2xh7vwAAAABJRU5ErkJggg==")'
tuple_list = maybe_convert_css_to_tuples(css_str)
print("tuple_list=",tuple_list)

The resulting value is truncated at the first semicolon:

tuple_list= [('background-image', 'url("data:image/png')]

Something like https://github.com/pandas-dev/pandas/pull/48869 would be needed to fix that.
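To illustrate what handling semicolons inside a value involves, here is a rough sketch of a splitter that ignores ; inside quotes or parentheses. It is not taken from the linked PR and is not a full CSS parser (backslash escapes, for instance, are not handled); the helper names and the shortened base64 payload are hypothetical.

```python
def split_declarations(style: str) -> list[str]:
    """Split on ';' only outside quotes and parentheses (hypothetical helper)."""
    parts, current = [], []
    quote = None  # quote character we are currently inside, if any
    depth = 0     # nesting depth of parentheses, e.g. url(...)
    for ch in style:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif ch == ";" and depth == 0:
            # A top-level semicolon ends the current declaration.
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def css_to_tuples_quote_aware(style: str) -> list[tuple[str, str]]:
    pairs = []
    for declaration in split_declarations(style):
        # Split at the first colon only, so colons in the value survive.
        prop, _, value = declaration.partition(":")
        if value.strip():
            pairs.append((prop.strip(), value.strip()))
    return pairs


# Shortened, hypothetical base64 payload.
css = 'background-image: url("data:image/png;base64,iVBORw0KGgo="); color: red'
print(css_to_tuples_quote_aware(css))
# [('background-image', 'url("data:image/png;base64,iVBORw0KGgo=")'), ('color', 'red')]
```

Tracking parentheses as well as quotes means an unquoted url(data:image/png;base64,...) is also kept in one piece.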

attack68 commented 1 week ago

Good point, and this may well be the issue to revive it.