mckinsey / causalnex

A Python library that helps data scientists to infer causation rather than observing correlation.
http://causalnex.readthedocs.io/

vis.show() UnicodeEncodeError #204

Closed hutaohutc closed 1 year ago

hutaohutc commented 1 year ago

I just ran the tutorial code:

import warnings

from causalnex.plots import plot_structure, NODE_STYLE, EDGE_STYLE
from causalnex.structure import StructureModel

warnings.filterwarnings("ignore")  # silence warnings

sm = StructureModel()
sm.add_edges_from([("health", "absences"), ("health", "G1")])

viz = plot_structure(
    sm,
    all_node_attributes=NODE_STYLE.WEAK,
    all_edge_attributes=EDGE_STYLE.WEAK,
)
viz.show("01_simple_plot.html")

but I get this error: UnicodeEncodeError: 'gbk' codec can't encode character '\xa9' in position 230710: illegal multibyte sequence. I have tried two different Windows PCs and get the same error on both.

kylelim-mckinsey commented 1 year ago

Hi! This can be solved by including the following code:

import os
os.environ["PYTHONIOENCODING"] = "utf-8"

The error message UnicodeEncodeError: 'gbk' codec can't encode character '\xa9' in position 230710: illegal multibyte sequence suggests that you're encountering an issue with character encoding. It appears that your Python environment is trying to use the GBK codec, which is typically used for Chinese characters, but it's encountering a character (in this case, '\xa9') that it cannot encode.
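
The encoding failure itself can be reproduced without CausalNex. A minimal sketch using only the Python standard library: '\xa9' is the copyright sign (U+00A9), which has no GBK representation but encodes without trouble as UTF-8. The helper name `can_encode` is made up for this illustration.

```python
import unicodedata

ch = "\xa9"  # the character named in the traceback

# Look up the Unicode name of the character.
char_name = unicodedata.name(ch)


def can_encode(text: str, codec: str) -> bool:
    """Return True if `text` can be encoded with `codec` (illustrative helper)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


gbk_ok = can_encode(ch, "gbk")
utf8_ok = can_encode(ch, "utf-8")
print(char_name, gbk_ok, utf8_ok)
```

Any file that is written with the 'gbk' codec and contains this character would fail in the same way, regardless of which library produced the text.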

However, your code only deals with graph visualization, which should not cause such an error by itself. So this issue is likely related to your environment rather than to your code.

One workaround is to make Python use UTF-8 by setting the environment variable from within Python, as in the snippet at the top of this comment.

hutaohutc commented 1 year ago

@kylelim-mckinsey Thank you for your answer, but I still get the error after adding this code.

UnicodeEncodeError: 'gbk' codec can't encode character '\xa9' in position 230710: illegal multibyte sequence

kylelim-mckinsey commented 1 year ago

Let's use this. (I forgot to put the hyphen. It is 'utf-8'.)

import os
os.environ['PYTHONIOENCODING'] = 'utf-8'

Based on the error message, you are likely experiencing an encoding issue while using the Graphviz library in Python. This is typically due to differences in default encodings across systems. As the error message suggests, the 'gbk' codec cannot encode a specific character. 'gbk' is an encoding primarily used for Simplified Chinese characters.

When Python reads or writes files without an explicit encoding, it uses the system's default encoding. On Windows with a Simplified Chinese locale, that is typically 'gbk' (code page 936). It seems the function you're using (which indirectly relies on Graphviz) is trying to write a character that 'gbk' cannot represent.
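
To see which encodings are actually in play, they can be inspected directly. The sketch below uses only the standard library; the helper name `describe_encodings` is made up for illustration. One caveat worth keeping in mind: according to the Python documentation, `PYTHONIOENCODING` only overrides the encoding of stdin/stdout/stderr and is read when the interpreter starts, so it does not change what `open()` uses when no `encoding` argument is given.

```python
import locale
import os
import sys
import tempfile


def describe_encodings() -> dict:
    """Collect the encodings this interpreter is using (illustrative helper)."""
    # Open a throwaway file without encoding= to see what open() falls back to.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "probe.txt"), "w") as fh:
            open_default = fh.encoding
    return {
        "locale_preferred": locale.getpreferredencoding(False),
        "open_default": open_default,
        "stdout": getattr(sys.stdout, "encoding", None),
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


info = describe_encodings()
for key, value in info.items():
    print(f"{key}: {value}")
```

On a Simplified Chinese Windows installation, `locale_preferred` and `open_default` would typically report `cp936` (an alias of GBK), which matches the codec named in the traceback.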

If that doesn't work, another workaround might be to use the pygraphviz library directly to generate the graph, since this library has a draw method.

hutaohutc commented 1 year ago

I am sorry, I still get the same error.

GabrielAzevedoFerreiraQB commented 1 year ago

I see. Thanks for the issue, Tau. It seems the setting of encoding is somehow not taking effect.

This is a pyvis issue, which we have reported, and it seems to be related to your specific Windows encoding. It is a common problem, and we raised a PR to fix it in an upcoming release (https://github.com/WestHealth/pyvis/pull/231).

In the meantime we suggest:

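from IPython.display import IFrame

# define width, height and name for file
name = "index.html"
height = "600px"
width = "100%"

# assumption: the thread leaves `notebook` undefined; False renders a standalone page
notebook = False

# generate html (viz is the object returned by plot_structure)
html = viz.generate_html(notebook=notebook)

# write html with an explicit UTF-8 encoding instead of the system default
with open(name, "w+", encoding="utf-8") as out:
    out.write(html)

# display as an IFrame (in a Jupyter notebook)
IFrame(name, width=width, height=height)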

I will close the issue, but let us know if you run into further problems. I am happy to reopen it.

wxq0309 commented 1 year ago

I just ran the simplest demo. How can I solve this problem?

GabrielAzevedoFerreiraQB commented 1 year ago

Could you try the instructions above, please?

As explained above, this is an issue with PyVis (a different package) in your specific setup (the message above has more details).

To get around the issue, you should not use viz.show() but instead use the longer command above.

This is because pyvis unfortunately expects a UTF-8 encoding, which your setup does not seem to use by default.

hutaohutc commented 1 year ago

@GabrielAzevedoFerreiraQB It worked! Thanks.

GabrielAzevedoFerreiraQB commented 1 year ago

Awesome news. thanks!