orbitalquark / textadept

Textadept is a fast, minimalist, and remarkably extensible cross-platform text editor for programmers.
https://orbitalquark.github.io/textadept
MIT License

[WIN32] cannot read filenames with Greek chars #367

Closed: ousia closed this issue 1 year ago

ousia commented 1 year ago

@orbitalquark,

On Windows, I may have a file at a path such as the following:

%USERPROFILE%\12345\καλός\κόσμος.md

Textadept 11.4 cannot open it, but Notepad++ and Geany both open it fine.

I wonder whether there is something I can do to avoid renaming files and directories.

Many thanks for your help and your excellent work with TA.

mhwombat commented 1 year ago

I'm not on Windows, so I probably can't help much. But the first question I have is whether it's the %USERPROFILE% or the non-ASCII characters that trigger the problem.

Can you try invoking textadept with a path to the filename that does not include the %USERPROFILE% variable, and let us know if that works? For example,

 textadept c:\Users\MyUsername\12345\καλός\κόσμος.md

Can you try temporarily creating a file with an ASCII filename such as abc.md somewhere under the %USERPROFILE% directory, and see if you can open that? For example,

 textadept %USERPROFILE%\abc.md

snoopy commented 1 year ago

Opening files with %USERPROFILE% works fine. Opening a file named κόσμος is not working.

When opened through the Open dialog, the filename appears as ??s�??. When opened directly via the io.open_file function, the name appears correctly as κόσμος.

ousia commented 1 year ago

Sorry for not describing the problem more accurately.

%USERPROFILE% isn’t a problem at all.

The problem is the Greek characters in the path and/or filename.

When opening a file located at C:\crap\καλός\κόσμος.md, I get the following error message in the message buffer:

lua: C:\tools\textadept/core/ui.lua:156: conversion failed

The same version of Textadept in Linux has no problem at all with Greek chars in the file name and/or path.

orbitalquark commented 1 year ago

Thanks for the report. Textadept 11.4 on Windows uses the GTK GUI toolkit to determine the filesystem's character encoding. Textadept 12.0 is moving to the Qt toolkit on Windows and will use a different method. May I ask you to try one of the alpha releases or a nightly build just to see if it still fails? You might want to temporarily move your ~/.textadept/ directory when trying an alpha build to avoid any compatibility errors.

ousia commented 1 year ago

Many thanks for your reply, @orbitalquark.

Using the Textadept nightly (07 Mar 2023), I'm afraid I still get question marks, and letters such as π and σ are converted to p and s (and so on). Only μ is preserved in the file name (and directory name).

orbitalquark commented 1 year ago

Sorry for the delayed response. I am just getting around to looking into this. Would you please open the Command Entry (Tools > Command Entry), type _CHARSET, and press Enter? Textadept should print your filesystem's character encoding to the print buffer. I would like this value for a >= 12.0 alpha build (your nightly from 07 Mar 2023 would be fine). Thanks!

ousia commented 1 year ago

Sorry for the delay, @orbitalquark.

CP1252 is what I get with the current 12.0 alpha nightly from 20 Mar 2023.

orbitalquark commented 1 year ago

Thanks for following up. Textadept relies on Lua for its file I/O, and Lua (through Windows' C runtime) cannot operate on arbitrary UTF-8 filenames. It can only use filenames that contain characters in the user's current filesystem encoding (CP1252 in your case). If you changed your computer to a Greek locale, then you may be able to start using files with Greek characters in filenames. However, this is probably not a reasonable workaround.
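To illustrate the limitation described above, here is a minimal Python sketch (not Textadept or Lua code) that checks whether each component of the reported path can be encoded in a given Windows code page. It assumes that the narrow, `char`-based file functions of the C runtime can only carry characters that exist in the active code page; `representable` is a hypothetical helper, and Python's `cp1252` and `cp1253` codecs stand in for the Windows Western European and Greek code pages.

```python
# Sketch only: models the "active code page" restriction with Python codecs.
# Assumption: a narrow (char*) C runtime call can only carry characters that
# the active code page can encode; no Windows API is called here.


def representable(name: str, codepage: str) -> bool:
    """Return True if every character of `name` can be encoded in `codepage`."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


# Path components from the report: C:\crap\καλός\κόσμος.md
for part in ["crap", "καλός", "κόσμος.md"]:
    print(f"{part:10} cp1252={representable(part, 'cp1252')} "
          f"cp1253={representable(part, 'cp1253')}")
```

The Greek components fail under CP1252 (the `_CHARSET` value reported above) but succeed under CP1253, which is consistent with the suggestion that a Greek locale might make these filenames usable.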

Many other applications use Windows' UTF-16 API for operating on files, and thus do not exhibit this behavior. Unfortunately, I don't see an easy fix. I will need to update the manual and API documentation to reflect this limitation. Sorry about that :(
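The contrast with the UTF-16 ("wide") API can be sketched the same way: a UTF-16 encoding round-trips any path losslessly, whereas forcing the path through CP1252 destroys every Greek character. This is again a Python model, not Windows code, and `wide_roundtrip` and `narrow_roundtrip` are hypothetical helpers. Python substitutes `?` for unencodable characters; Windows' own conversion may instead apply a "best fit" mapping, which would be consistent with the π to p and σ to s substitutions reported earlier.

```python
# Sketch only: contrasts a lossless UTF-16 round trip with a lossy trip
# through a single-byte code page. No Windows API is called here.

PATH = "C:\\crap\\καλός\\κόσμος.md"  # path from the report


def wide_roundtrip(path: str) -> str:
    """Encode to UTF-16LE (the form wide Windows APIs take) and decode back."""
    return path.encode("utf-16-le").decode("utf-16-le")


def narrow_roundtrip(path: str, codepage: str = "cp1252") -> str:
    """Encode to a code page, replacing unencodable characters, and decode back."""
    return path.encode(codepage, errors="replace").decode(codepage)


print(wide_roundtrip(PATH))    # unchanged
print(narrow_roundtrip(PATH))  # every Greek letter becomes "?"
```

The narrow round trip turns the path into `C:\crap\?????\??????.md`, which no longer names the original file, while the wide round trip returns it unchanged.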

For more information, see http://lua-users.org/lists/lua-l/2019-01/msg00076.html

ousia commented 1 year ago

Many thanks for your reply, @orbitalquark.

Windows still has a long road ahead before it fully supports Unicode.

Many thanks for your help.