olivierkes / manuskript

An open-source tool for writers
http://www.theologeek.ch/manuskript
GNU General Public License v3.0
1.78k stars, 236 forks

Missing non-ASCII characters in file/folder name when saving in multiple files #455

Open lingsamuel opened 5 years ago

lingsamuel commented 5 years ago

image

Manuskript Version: Newest develop branch.

gedakc commented 5 years ago

Are you able to create file and folder names with Chinese characters directly in the operating system?

gagarinds commented 5 years ago

The same problem occurs with Russian. (screenshot: 2019-04-09_14-12-16)

TheJackiMonster commented 3 years ago

I think this is a Windows-specific issue. At least I can create directories with Kanji on my system.

lingsamuel commented 3 years ago

Nope. I tested in Arch Linux:

❯ uname -a
Linux sarasaarch 5.10.16-arch1-1 #1 SMP PREEMPT Sat, 13 Feb 2021 20:50:18 +0000 x86_64 GNU/Linux

image

lingsamuel commented 3 years ago

https://github.com/olivierkes/manuskript/blob/12defa8fa47b4faf589d44fbca0b0fc272185bb2/manuskript/load_save/version_1.py#L76-L91

lingsamuel commented 3 years ago

:thinking: How can you save with kanji? According to the code linked above, a saved file name can only contain ASCII letters, digits, and underscores.

lingsamuel commented 3 years ago

This function seems a little unreasonable. It should escape only the forbidden characters, such as / on Linux, and keep the others as is. On Linux at least, a file name is stored as bytes, so non-ASCII names work even if the filesystem has no notion of UTF-8. The only forbidden bytes are the slash and the NUL character.
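
A minimal sketch of that idea, not the project's actual code: keep every character and escape only `/` and NUL. The function name `escape_posix_name` and the choice of percent-escaping are assumptions made for illustration. `%` is escaped as well so the mapping stays reversible.

```python
def escape_posix_name(name: str) -> str:
    """Escape only what Linux forbids in a file name; keep everything else.

    Hypothetical helper: the name and the percent-escaping scheme are
    illustrative assumptions, not taken from manuskript.
    """
    # '/' and NUL are the only bytes a Linux file name cannot contain.
    # '%' is escaped too, so the escaped form can be decoded unambiguously.
    table = {"%": "%25", "/": "%2F", "\0": "%00"}
    return "".join(table.get(ch, ch) for ch in name)


if __name__ == "__main__":
    print(escape_posix_name("第一章/草稿"))
    print(escape_posix_name("Глава 1"))
```

Chinese and Cyrillic titles pass through unchanged; only the path separator is rewritten.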

TheJackiMonster commented 3 years ago

@lingsamuel I tried creating a directory with Kanji from the terminal, which works. I didn't check the code for a replacement first. I partially agree that it shouldn't escape every character except ASCII. But we first have to check whether that is still compatible with the entries in a .zip file, which manuskript uses for single-file mode. Besides, we should still escape more characters than just those forbidden by the operating system in use, to ensure that moving a project between operating systems doesn't lead to a crash or data loss.

lingsamuel commented 3 years ago

Yes, forbidden characters should be a union of all platforms.
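
A rough sketch of such a union, under stated assumptions: the character sets below cover the POSIX rule (`/` and NUL) and the Windows rule (`<>:"/\|?*` plus control characters 0 to 31), and `safe_name` is a hypothetical helper, not part of manuskript. Windows reserved device names such as `CON` and the .zip compatibility question raised above are not handled here.

```python
# Characters rejected by at least one of the platforms considered here.
POSIX_FORBIDDEN = {"/", "\0"}
WINDOWS_FORBIDDEN = set('<>:"/\\|?*') | {chr(c) for c in range(32)}
FORBIDDEN = POSIX_FORBIDDEN | WINDOWS_FORBIDDEN


def safe_name(name: str, replacement: str = "_") -> str:
    """Replace characters forbidden on any platform; keep all others.

    Hypothetical helper for illustration only.
    """
    cleaned = "".join(replacement if ch in FORBIDDEN else ch for ch in name)
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.rstrip(". ")
    # Never return an empty name.
    return cleaned or replacement


if __name__ == "__main__":
    print(safe_name("第一章: 草稿?"))
    print(safe_name("Глава 1."))
```

Replacing with a single character is lossy, so two different titles can map to the same name; a real implementation would need to deal with collisions or use a reversible escape.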

lingsamuel commented 3 years ago

What do you mean by "creating a directory with Kanji from terminal"? I think we are talking about saving a manuskript project that contains non-ASCII characters in file names in multi-file mode, not about the terminal.

TheJackiMonster commented 3 years ago

I was testing what gedakc mentioned.