kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0

plotly.JSONDataset not saved as utf-8 #741

Open Madnex opened 3 months ago

Madnex commented 3 months ago

Description

A file saved through plotly.JSONDataset is not always encoded as utf-8. Setting the encoding explicitly through the file system args fixes the issue:

myfile:
  filepath: myfilepath
  fs_args:
    open_args_save:
      encoding: utf-8

However, utf-8 should be the default behaviour. It is unclear why the problem occurs when the encoding is not set explicitly here.

Context

I had an issue with the encoding of plotly plots saved as json via the kedro data catalog. After saving the plots, I could no longer read them back via the catalog. Loading failed with the error 'utf-8' codec can't decode byte 0xe8 in position 6570: invalid continuation byte. Investigating further, I managed to read those files with a different encoding (e.g. latin-1). I did not understand, though, why the files were not valid utf-8 in the first place. Adding the fs_args mentioned above solved the issue.
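The symptom described above can be reproduced without kedro or plotly. The following is a minimal sketch, assuming the file was written with a single-byte encoding such as latin-1 (when no encoding is passed, Python's open() falls back to a locale-dependent default, which is often not utf-8 on Windows). The payload and the roundtrip helper are hypothetical and only illustrate the mismatch; they are not taken from the dataset's code.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical figure-like payload containing non-ASCII characters.
payload = {"layout": {"title": {"text": "Crème brûlée"}}}
# ensure_ascii=False keeps "è" as a literal character instead of a \u escape.
text = json.dumps(payload, ensure_ascii=False)


def roundtrip(save_encoding, load_encoding="utf-8"):
    """Write the JSON text with one encoding and read it back with another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "plot.json"
        with open(path, "w", encoding=save_encoding) as f:
            f.write(text)
        with open(path, "r", encoding=load_encoding) as f:
            return json.load(f)


# Saving and loading with the same encoding works.
assert roundtrip("utf-8") == payload

# Saving as latin-1 writes "è" as the single byte 0xe8, which is not valid
# utf-8 when followed by an ASCII byte, so loading raises UnicodeDecodeError.
try:
    roundtrip("latin-1")
except UnicodeDecodeError as exc:
    print(exc)
```

Reading the same file back with load_encoding="latin-1" succeeds, which matches the observation that the files could be opened with a different encoding.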

Steps to Reproduce

  1. Save a plotly plot that contains special characters as a plotly.JSONDataset.
  2. Try to load that plot via the catalog again.
  3. The load should fail with the error mentioned above.
  4. Change the fs_args as indicated and repeat steps 1 and 2. Now it should work without issues.

Expected Result

There should be no encoding issues, because files are expected to be saved as utf-8.

Actual Result

The file was not saved in utf-8.

'utf-8' codec can't decode byte 0xe8 in position 6570: invalid continuation byte

merelcht commented 2 months ago

Hi @Madnex, thanks for flagging this! I can see that plotly.JSONDataset uses utf-8 when loading the dataset, but not when saving it. This does indeed seem strange. We'd be more than happy to accept a PR for this!
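One possible shape for such a change is to fill in utf-8 as the save encoding only when the user has not set one. The sketch below is an assumption about how that could look, not the actual kedro-datasets implementation; the helper name resolve_open_args_save is hypothetical, and only the fs_args / open_args_save / encoding keys come from the catalog entry in the issue.

```python
from copy import deepcopy


def resolve_open_args_save(fs_args=None):
    """Return the open() arguments for saving, defaulting to utf-8 text mode.

    Hypothetical helper: user-supplied values always win over the defaults.
    """
    fs_args = deepcopy(fs_args) if fs_args else {}
    open_args_save = fs_args.get("open_args_save", {})
    open_args_save.setdefault("mode", "w")
    # An encoding is only meaningful for text mode, so skip it for binary modes.
    if "b" not in open_args_save["mode"]:
        open_args_save.setdefault("encoding", "utf-8")
    return open_args_save


print(resolve_open_args_save())
print(resolve_open_args_save({"open_args_save": {"encoding": "latin-1"}}))
```

With this approach, a catalog entry that already sets encoding under open_args_save keeps its value, so existing configurations would not change behaviour.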

(cc @rashidakanchwala just double checking if adding utf-8 as the default save encoding would be okay for viz?)