marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0

Converting a Jupyter notebook to a Marimo notebook leads to a broken Marimo notebook (probably because of encoding) #897

Open scls19fr opened 7 months ago

scls19fr commented 7 months ago

Describe the bug

Hello,

I have a Jupyter notebook like https://gist.github.com/scls19fr/c41394c47cbc0250263ea3f5a1de33ea

After converting it to Marimo format, Marimo is unable to open it.

This notebook contains some accents (sorry, I'm French 😄 )

I'm using PowerShell under Windows 10.

I noticed that the .ipynb file is encoded as UTF-8, but the generated Marimo file is UTF-16 LE.

I don't think it's a good idea to rely on PowerShell's standard output redirection for this conversion.

Converting a notebook should preserve the original encoding.

A workaround might be https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8 ... but that's just a workaround.

Kind regards

Environment

marimo env
{
  "marimo": "0.2.13",
  "OS": "Windows",
  "OS Version": "10",
  "Processor": "Intel64 Family 6 Model 58 Stepping 9, GenuineIntel",
  "Python Version": "3.11.8",
  "Binaries": {
    "Chrome": "122.0.6261.95",
    "Node": "v21.1.0"
  },
  "Requirements": {
    "black": "24.2.0",
    "click": "8.1.7",
    "jedi": "0.19.1",
    "pymdown-extensions": "10.7",
    "starlette": "0.37.1",
    "tomlkit": "0.12.3",
    "typing_extensions": "4.10.0",
    "uvicorn": "0.27.1"
  }
}

Code to reproduce

marimo.exe convert .\1_correction_ascenseur.ipynb > .\1_correction_ascenseur.py
marimo.exe edit .\1_correction_ascenseur.py

raises

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Here is the full traceback:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\scell\anaconda3\Scripts\marimo.exe\__main__.py", line 7, in <module>
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\marimo\_cli\cli.py", line 208, in edit
    codegen.get_app(name)
  File "C:\Users\scell\anaconda3\Lib\site-packages\marimo\_ast\codegen.py", line 207, in get_app
    contents = f.read().strip()
               ^^^^^^^^
  File "<frozen codecs>", line 322, in decode

Moreover, accented letters are altered:

DonnÚes instead of Données
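
Both symptoms reported above are consistent with a mismatch between the bytes written to the file and the codec used to read them back. The following is a minimal sketch that reproduces them in isolation. The choice of cp1252 and cp850 is an assumption about the Windows ANSI and console code pages involved; it is not confirmed anywhere in this report.

```python
import codecs

text = "Données"

# A UTF-16 LE file with a byte order mark starts with the bytes FF FE,
# which matches the encoding reported for the redirected file.
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

try:
    utf16_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xff can never start a UTF-8 sequence, so decoding fails at byte 0.
    first_error = (exc.start, hex(utf16_bytes[exc.start]), exc.reason)

# Assumption: the text was written as cp1252 (a Windows ANSI code page)
# and then interpreted as cp850 (a Western European console code page).
garbled = text.encode("cp1252").decode("cp850")

print(first_error)
print(garbled)
```

If the assumption holds, the same byte 0xe9 that means "é" in cp1252 is displayed as "Ú" in cp850, which would explain the altered text.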

scls19fr commented 7 months ago

A similar problem occurs when converting the Jupyter notebook to a Marimo notebook under Cygwin.

scell@DESKTOP /cygdrive/c/Users/scell/Downloads
$ /cygdrive/c/Users/scell/anaconda3/Scripts/marimo.exe convert 1_correction_ascenseur.ipynb > 1_correction_ascenseur.py

scell@DESKTOP /cygdrive/c/Users/scell/Downloads
$ /cygdrive/c/Users/scell/anaconda3/Scripts/marimo.exe edit 1_correction_ascenseur.py
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\scell\anaconda3\Scripts\marimo.exe\__main__.py", line 7, in <module>
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\scell\anaconda3\Lib\site-packages\marimo\_cli\cli.py", line 208, in edit
    codegen.get_app(name)
  File "C:\Users\scell\anaconda3\Lib\site-packages\marimo\_ast\codegen.py", line 207, in get_app
    contents = f.read().strip()
               ^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 407: invalid continuation byte

but this time the generated Marimo notebook is reported as UTF-8, and the accented letters are also altered (but differently):

Donn�es instead of Données
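
The error message here points at byte 0xe9, which is how "é" is encoded in single-byte encodings such as cp1252 or Latin-1. A minimal sketch of that failure mode follows, assuming (this is not confirmed in the thread) that the output was written with cp1252 and then read as UTF-8.

```python
text = "Données"

# Assumption: the converter's output was encoded with cp1252, in which
# "é" is the single byte 0xe9.
legacy_bytes = text.encode("cp1252")

try:
    legacy_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    # In UTF-8, 0xe9 announces a three-byte sequence, but the next byte
    # ("e") is not a continuation byte, so decoding fails there.
    error = (exc.start, hex(legacy_bytes[exc.start]), exc.reason)

# A lenient reader substitutes U+FFFD (the replacement character) instead.
replaced = legacy_bytes.decode("utf-8", errors="replace")

print(error)
print(replaced)
```

In the real file the offending byte sits at position 407 rather than 4, but the reason string is the same as in the traceback above.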

akshayka commented 7 months ago

Thanks for reporting this issue. I designed marimo convert to work like a typical Unix-like command-line tool, which is why it prints to standard output.

I didn't anticipate the issues this would cause on Windows (sorry!).

I know it's just a workaround, but does this suggestion from the linked Stack Overflow post work?

It can be done on a case-by-case basis by replacing the >foo.txt syntax with | out-file foo.txt -encoding utf8

akshayka commented 7 months ago

In the meantime, if you're blocked on this not working on Windows you could try our web UI: https://marimo.io/convert

scls19fr commented 7 months ago

| out-file foo.txt -encoding utf8

produces UTF-8 with a BOM.

marimo is then able to edit the converted file, but accented characters in comments are still wrong:

I get DonnÚes instead of Données

So it's only half a solution.

Thanks for pointing to https://marimo.io/convert, which works fine but can't be automated the way a script can.
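
One scriptable alternative that avoids shell redirection entirely is to capture the converter's output from Python and write the bytes directly. The sketch below is untested against marimo itself. The helper name `convert_to_utf8` is hypothetical, and it assumes that `marimo convert` writes through Python's standard output so that the `PYTHONUTF8` environment variable (Python's UTF-8 mode) applies to it.

```python
import os
import subprocess
from pathlib import Path


def convert_to_utf8(command: list[str], destination: Path) -> None:
    """Run `command` and save its standard output without re-encoding it."""
    # PYTHONUTF8=1 enables UTF-8 mode in a child Python process, so its
    # stdout is encoded as UTF-8 regardless of the console code page.
    # Assumption: the command being run is a Python program.
    env = {**os.environ, "PYTHONUTF8": "1"}
    result = subprocess.run(command, capture_output=True, check=True, env=env)
    # Write the raw bytes so that no shell or locale setting touches them.
    destination.write_bytes(result.stdout)
```

It could be called as `convert_to_utf8(["marimo", "convert", "1_correction_ascenseur.ipynb"], Path("1_correction_ascenseur.py"))`. Because the output never passes through a shell redirect, the file should contain neither a BOM nor console code page bytes, provided the assumption about UTF-8 mode holds.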