Closed: PhiLhoSoft closed this issue 3 years ago
Hi Philippe Many thanks. It does help. I will try to address the error, most likely using your fix :-) Cheers Rafa
No problem using my fix, you are welcome.
Other remarks seen after import:
<en-note>{enex_content}</en-note>
otherwise the indentation at the start of the content causes the first line to be rendered in fixed width.
Update: I reported the last issue to Joplin: https://github.com/laurent22/joplin/issues/3578 I have a workaround. Here is the file I modified for my needs: simplenote2enex.py.txt
Not sure of the CR CR LF issue, but it is only cosmetic, so I will leave it as is.
Oh, and thank you for this useful tool! I would have hesitated to switch otherwise.
Hi Philippe
Can you send me a simple json file with some of the characters that caused the choking?
For starters, I have added to the repository a file, test.issue.001.json, with the Japanese and special characters you mentioned above. You may want to use that as a base.
Many thanks
Hi Philippe, I ran simplenote2enex.py (same as in the repository) against the file test.issue.001.json and it converted it OK. I paste the output below and attach the XML output as a file. My environment: Kubuntu 19.10, Python 3.7.5. I will try on a Windows system later.
python simplenote2enex.py --json-file ./test.issue.001.json --author Rafa --create-title --verbose-level 1 > /tmp/output.txt
Processing file: ./test.issue.001.json
Notes author: Rafa
Active notes: 3
Trashed notes: 0 -- will not be converted to ENEX
Converted 3 notes
Hi @rpgd60,
FYI, I have a considerable number of notes in Japanese. So far I haven't encountered an error when processing some of them through this tool (latest version as of writing this comment). In case an error pops up, I'll try to let you know.
My environment is macOS Catalina v10.15.6, Python 3.8.5.
Thanks.
Many thanks for the feedback. Glad to hear that.
Sorry for the late answer, I was on vacation lately. I will try to make a simplified JSON file with some problematic characters. Note that the emoji you pasted is a reference to an image, not the U+1F601 character. And maybe the issue is with the Windows command line terminal, which is not the best when it comes to Unicode character handling, since your script works fine on Linux and macOS. I believe the patched script I attached above (did you look at it?) should work everywhere, fixing the Windows issue (I hope).
Hi Philippe, thank you very much for the input. I'd really appreciate it if you could send me a representative JSON file for testing. I'd rather not implement the workaround until I can test it against a failing case. Cheers
Yes, sorry, I still have to do that… But I guess the problem is Windows specific. I hope you can test in this environment.
OK, I did it. I extracted some notes (removing content to make them shorter) with some significant samples: one with a bunch of emoticons, one with the special characters, one with a table (that was correctly imported in Joplin) and one with Japanese characters. Most text is in French, but it is not relevant anyway.
SimplenotesExportSample.json.txt
With current code:
> py simplenote2enex.py --json-file SimplenotesExportSample.json --author 'PhiLhoSoft' --create-title --verbose-level 1 > test1.enex
Traceback (most recent call last):
  File "simplenote2enex.py", line 367, in <module>
    main(args)
  File "simplenote2enex.py", line 340, in main
    enex_file = sne.process_file()
  File "simplenote2enex.py", line 262, in process_file
    simplenotes = json.load(jfp)
  File "C:\Languages\Python38\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Languages\Python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 212: character maps to <undefined>
With my fixes:
> py simplenote2enex-pl.py --json-file SimplenotesExportSample.json --author 'PhiLhoSoft' --create-title --verbose-level 1 > test1.enex
Processing file: SimplenotesExportSample.json
Notes author: 'PhiLhoSoft'
Active notes: 4
Trashed notes: 0 -- will not be converted to ENEX
Converted 4 notes
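For readers following along: the traceback above shows the file being decoded with cp1252. Python's open() falls back to the locale's preferred encoding when none is given, and json.load only reads from the text file it is handed, so the UTF-8 export gets decoded with the wrong codec on a Windows setup like this one. Below is a minimal sketch of the failure and of the explicit-encoding fix. SAMPLE, load_notes and demo are hypothetical names for illustration, not code from simplenote2enex.py.

```python
import json
import os
import tempfile

# Hypothetical sample data: an emoji and a mathematical bold italic letter.
SAMPLE = [{"content": "emoji \U0001F601 and math \U0001D46C"}]


def load_notes(path):
    """Read a JSON export, stating the encoding instead of relying on the locale."""
    with open(path, encoding="utf-8") as jfp:
        return json.load(jfp)


def demo():
    """Write SAMPLE as UTF-8, then read it back as cp1252 and as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            json.dump(SAMPLE, out, ensure_ascii=False)
        try:
            # Simulates what open(path) does when the locale encoding is cp1252.
            with open(path, encoding="cp1252") as jfp:
                json.load(jfp)
            legacy_failed = False
        except UnicodeDecodeError:
            legacy_failed = True
        return legacy_failed, load_notes(path)
    finally:
        os.remove(path)
```

Calling demo() returns a flag telling whether the cp1252 read failed, together with the notes as read through load_notes. Passing encoding="utf-8" to the built-in open() should have the same effect as the codecs.open call mentioned elsewhere in this thread.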
I tried to convert my medium-sized Simplenote backup file (96 entries, 430 KB), using Python 3.8.5 on Windows 10, but the decoder choked on some characters, probably from supplementary planes (beyond the BMP), with messages like:
I started to replace these characters: locating the position in a hex editor, finding the text again in a text editor that correctly displays emoji like 😁 (some others are OK), Japanese characters like 育美 強針 or special characters like 𝑬𝑵 or 𝑭𝑹, and changing them. But it was slow…
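The manual hunt described above could probably be automated. Here is a minimal sketch, assuming the failures come from UTF-8 bytes that cp1252 cannot decode. find_cp1252_offenders is a hypothetical helper, not part of simplenote2enex.py, and it expects text that was already read as UTF-8.

```python
def find_cp1252_offenders(text):
    """Return (index, char) pairs whose UTF-8 bytes cannot be decoded as cp1252."""
    offenders = []
    for index, char in enumerate(text):
        try:
            # Round-trip one character: UTF-8 bytes read back with the wrong codec.
            char.encode("utf-8").decode("cp1252")
        except UnicodeDecodeError:
            offenders.append((index, char))
    return offenders
```

Note that this flags characters by their bytes, not by their Unicode plane. For instance 針 (U+91DD) is inside the BMP, yet its UTF-8 encoding ends with byte 0x9D, which Python's cp1252 codec leaves undefined, so it is reported too.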
So I searched the json.load documentation, but it said it used UTF-8 by default, which is contradicted by the cp1252 information above… So I searched for issues about Python choking on these characters, and I found a trick, which I adapted as the following line:
with codecs.open(self.json_file, 'r', 'utf-8') as jfp:
(instead of with open(self.json) as jfp:)
(Need to add import codecs at the start.)
It worked, but I had a different error:
Fortunately, I had found another solution earlier, transcoding the standard output:
(just after the imports)
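The snippet itself is not preserved in this copy of the thread. Here is a minimal sketch of what transcoding the standard output can look like, assuming the second error was an encoding failure when printing to a redirected stdout. utf8_writer and use_utf8_stdout are hypothetical names, not necessarily what the patched script uses.

```python
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


def use_utf8_stdout():
    """Make print() emit UTF-8 whatever the console or locale encoding is."""
    sys.stdout = utf8_writer(sys.stdout.buffer)
```

Calling use_utf8_stdout() once, right after the imports, would make the later print calls write UTF-8 to a redirected file such as test1.enex. On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is an alternative that avoids replacing the stream object.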
And it worked! I don't know if these are the best ways to solve the issue (I am not a Python coder), but they worked for me.
HTH.