Closed kpym closed 4 years ago
Hi, I'm unable to reproduce this bug. Could you share with me this source file?
Strange that you can't reproduce it. I just reduced my comment to a single-line UTF-8 Python file:
# Les deux courbes sont très proches
and after converting it with p2j I obtain:
{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["Les deux courbes sont tr\u00c3\u00a8s proches"]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4"}}, "nbformat": 4, "nbformat_minor": 2}
where you can see that très is converted to tr\u00c3\u00a8s.
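The escaped sequence above is what you would get if the file's UTF-8 bytes were decoded with a single-byte Windows code page and then serialized as ASCII-only JSON. A minimal sketch of that mechanism follows; the choice of cp1252 is an assumption (the thread never states which code page was active), and this is not p2j's actual code.

```python
import json

original = "Les deux courbes sont très proches"

# Assumption: the UTF-8 bytes of the file were decoded as cp1252,
# a common default for text files on Western-European Windows setups.
raw_bytes = original.encode("utf-8")   # "è" becomes the two bytes C3 A8
misread = raw_bytes.decode("cp1252")   # each byte is read as its own character

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
escaped = json.dumps(misread)
print(escaped)  # "Les deux courbes sont tr\u00c3\u00a8s proches"
```

The two bytes of "è" are read as two separate characters, which is why one letter turns into two escapes in the notebook.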
Could you please attach the file on this page so that I can try it?
The content of my file is already in my previous comment. But ok, here is the one-line .py file available on ghostbin.
I still can't reproduce the error. What version of Python are you on? And what version of p2j?
I'm using Windows 10, with Python 3.7.7. Knowing the version of p2j is not easy: there is no -v flag, and the -h flag does not print the version.
First, I installed the latest version with pip install p2j: the behaviour with the produced p2j.exe is what I described.
Second, I cloned your repo and ran python p2j.py test.py. Same result, with \u00c3\u00a8 in place of è.
You probably use the system encoding somewhere, and it is probably not the same on Windows as on your OS.
I see. Could you print out the following in a Python interpreter?
import sys
sys.getdefaultencoding()
and
print("è".encode("utf-8").decode())
The answer is (in GitBash and in PowerShell):
utf-8
è
I changed the default encoding in my terminals to utf-8.
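A note on why this check can come back clean while the bug persists: in Python 3, sys.getdefaultencoding() always reports utf-8, and the terminal encoding is a separate setting. When open() is called without an encoding argument, it normally falls back to the locale's preferred encoding instead. A small sketch for inspecting both values (the variable names are illustrative only):

```python
import locale
import sys

# Encoding used for implicit str/bytes conversions; always "utf-8" in Python 3.
internal_default = sys.getdefaultencoding()

# Encoding that open() normally falls back to when no encoding is given.
# On Windows this is typically an ANSI code page such as cp1252,
# unless Python's UTF-8 mode is enabled.
file_default = locale.getpreferredencoding(False)

print(internal_default)
print(file_default)
```

If the second value is not utf-8, any open() call without an explicit encoding may misread a UTF-8 source file.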
I'm unable to find a solution to this bug. We'll leave it open for now. In the meantime, if you find a fix, do submit a PR.
Thanks for considering this. If I have time, I'll take a look at it.
Hi @ktzanev that's a good point! @kpym I've merged these changes into master. Can you try again and see if the problem persists?
EDIT: I made a previous comment with a bad account, so I put it back here for historical reasons.
I've checked. You should open the source .py file as UTF-8 on line 32:
with open(source_filename, 'r', encoding='utf-8') as infile:
And you should dump the JSON as UTF-8 on lines 161-162:
with open(target_filename, 'w', encoding='utf-8') as outfile:
json.dump(final, outfile, indent=1, ensure_ascii=False)
The default encoding is platform dependent, which is why it worked on some systems and not on others. It is always good practice to open text files with an explicit encoding.
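For anyone who wants to check the two changes in isolation, here is a self-contained round-trip sketch. It reuses the source_filename, target_filename and final names from the snippets above, but the dictionary written here is a simplified stand-in, not the notebook structure p2j really builds.

```python
import json
import os
import tempfile

text = "# Les deux courbes sont très proches\n"

with tempfile.TemporaryDirectory() as tmp:
    source_filename = os.path.join(tmp, "test.py")
    target_filename = os.path.join(tmp, "test.ipynb")

    # Create a UTF-8 encoded one-line source file.
    with open(source_filename, "w", encoding="utf-8") as f:
        f.write(text)

    # Read it back with an explicit encoding instead of the platform default.
    with open(source_filename, "r", encoding="utf-8") as infile:
        line = infile.readline()

    # Simplified stand-in for the notebook dict (hypothetical structure).
    final = {"source": [line.strip().lstrip("#").strip()]}

    # Write the JSON as UTF-8 and keep non-ASCII characters unescaped.
    with open(target_filename, "w", encoding="utf-8") as outfile:
        json.dump(final, outfile, indent=1, ensure_ascii=False)

    with open(target_filename, "r", encoding="utf-8") as f:
        written = f.read()

reloaded = json.loads(written)
print(reloaded["source"][0])
```

Note that ensure_ascii=False only affects how the file looks on disk. Once the source is read with the right encoding, the default escaped form (\u00e8) would also load back as è.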
Great :) Closing this issue.
There is an encoding problem. This UTF-8 encoded source file:
produces the markdown cell
And the resulting .ipynb is in UTF-8 ... but somewhere the encoding was messed up.