Open andry81 opened 2 years ago
Hi, thanks for the interesting issue.
The LinkInfo structure contains only one field with the path. It looks like it can be encoded as utf-8.
Can you check diacritic_characters branch with possible fix?
c:\Work\OpenSource\pylnk\diacritic_characters>c:\python\x64\310\python
Python 3.10.1 (heads/3.10.1-win7:830a41fd9d, Dec 12 2021, 11:29:02) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pylnk3
>>> lnk = pylnk3.Lnk('d:\\1.txt.lnk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Work\OpenSource\pylnk\diacritic_characters\pylnk3.py", line 1504, in __init__
    self._parse_lnk_file(f)
  File "c:\Work\OpenSource\pylnk\diacritic_characters\pylnk3.py", line 1555, in _parse_lnk_file
    self._link_info = LinkInfo(lnk, unicode=self.link_flags.IsUnicode)
  File "c:\Work\OpenSource\pylnk\diacritic_characters\pylnk3.py", line 994, in __init__
    self._parse_path_elements(lnk)
  File "c:\Work\OpenSource\pylnk\diacritic_characters\pylnk3.py", line 1026, in _parse_path_elements
    self.local_base_path = read_cstring(lnk, encoding=self.encoding)
  File "c:\Work\OpenSource\pylnk\diacritic_characters\pylnk3.py", line 186, in read_cstring
    return s.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 3: invalid start byte
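The same error can be reproduced without a .lnk file. This is a minimal sketch, assuming the path bytes in LinkInfo were written in a single-byte Windows code page (cp1252 is an assumption here, not something the traceback proves):

```python
# Bytes of D:\ööö\1.txt as a single-byte ANSI string (cp1252 is assumed).
raw = "D:\\\u00f6\u00f6\u00f6\\1.txt".encode("cp1252")

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The first ö follows "D:\", so it sits at byte offset 3.
print(error)
```

A lone 0xf6 byte can never start a valid UTF-8 sequence, which is why the decoder stops at the first ö.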
I've just created the d:\ööö directory and the d:\ööö\1.txt file. Then I used ctrl-c and the Windows Explorer context menu to paste it as a shortcut into d:\1.txt.lnk.
Can you share broken lnk file?
Can't reproduce myself.
Win10 En writes a utf-8 path with diacritics into LinkInfo.
Win7 Ru writes an ASCII path (it converts diacritic symbols to the closest Latin ones) into LinkInfo.
Both are read without errors with the new branch.
I suspect there is some other format than utf-8.
By the way, 0xf6 is the code of the ö character.
It's cp1252, but I don't know how to choose the correct encoding.
How did you find that? There are at least 3 other code pages that make no difference here: 1250, 1257, 1258.
You can create a --chcp <str> parameter or something similar for that. And add --ignore-decode-errors to call decode(..., errors='ignore') instead.
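A rough sketch of how those two options could look. The flag names come from the suggestion above; the `decode_path` helper and the parser itself are hypothetical and are not part of pylnk3:

```python
import argparse


def decode_path(raw: bytes, encoding: str, ignore_errors: bool = False) -> str:
    """Decode an ANSI path with a caller-chosen code page (hypothetical helper)."""
    return raw.decode(encoding, errors="ignore" if ignore_errors else "strict")


# Hypothetical command line; pylnk3 does not necessarily expose these flags.
parser = argparse.ArgumentParser(prog="lnk-decode-sketch")
parser.add_argument("--chcp", default="utf-8", metavar="<str>",
                    help="code page used for ANSI strings, e.g. cp1252")
parser.add_argument("--ignore-decode-errors", action="store_true",
                    help="call decode(..., errors='ignore')")

args = parser.parse_args(["--chcp", "cp1252"])
raw = b"D:\\\xf6\xf6\xf6\\1.txt"

# ascii() keeps the output printable on consoles that cannot show ö.
print(ascii(decode_path(raw, args.chcp, args.ignore_decode_errors)))
```

Note that errors='ignore' silently drops every undecodable byte, so with utf-8 the path above would come back without the ööö directory name. It hides the exception but does not yield a usable path.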
Just a guess. It's the default for English Windows, and it decodes the path correctly.
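The ambiguity between code pages is easy to check with the codecs that ship with Python. A small sketch, using only the byte from the traceback:

```python
import unicodedata

byte = b"\xf6"
code_pages = ("cp1250", "cp1251", "cp1252", "cp1257", "cp1258")

# Decode the same byte under each Windows code page.
decoded = {cp: byte.decode(cp) for cp in code_pages}

for cp, char in decoded.items():
    # Print the code point and its Unicode name instead of the raw character.
    print(cp, f"U+{ord(char):04X}", unicodedata.name(char))
```

1250, 1252, 1257 and 1258 all map 0xf6 to ö (U+00F6), so this byte alone cannot tell them apart. cp1251 maps it to the Cyrillic ц (U+0446), which may be where the \u0446 characters in the --json output in this thread come from if the Russian code page was used for decoding.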
You can try the master branch with DEFAULT_CHARSET changed to cp1252.
Are there instructions on how to build the executable in the Scripts?
Can you add the same fix as in another link parser repository?
Here is another possible solution. If I use --json, it prints:
{
  "relative_path": ".\\\u00f6\u00f6\u00f6\\1.txt",
  "work_dir": "D:\\\u00f6\u00f6\u00f6",
  "link_info": {
    "local_base_path": "D:\\\u0446\u0446\u0446\\1.txt"
  },
}
It prints the correct characters for the relative_path property. Maybe add an option to decode the TargetPath property as a composition of work_dir + relative_path as an alternative?
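A sketch of that composition, using the values from the JSON above and the shortcut location d:\1.txt.lnk mentioned earlier in the thread. One caveat, stated as an assumption: in this output relative_path looks like it is relative to the folder containing the shortcut, not to work_dir, so the two bases give different results:

```python
import ntpath  # Windows path rules, usable on any platform

lnk_path = "d:\\1.txt.lnk"
relative_path = ".\\\u00f6\u00f6\u00f6\\1.txt"
work_dir = "D:\\\u00f6\u00f6\u00f6"

# Resolve against the folder that contains the shortcut.
from_link_dir = ntpath.normpath(
    ntpath.join(ntpath.dirname(lnk_path), relative_path))

# Resolve against work_dir, as literally proposed.
from_work_dir = ntpath.normpath(ntpath.join(work_dir, relative_path))

# ascii() keeps the output printable on consoles that cannot show ö.
print(ascii(from_link_dir))
print(ascii(from_work_dir))
```

With this particular shortcut, joining against work_dir repeats the ööö directory, while joining against the shortcut's own folder gives d:\ööö\1.txt. A fallback like this would therefore need the path of the .lnk file itself.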
As noted here: https://stackoverflow.com/questions/39365489/how-do-you-keep-diacritics-in-shortcut-paths
The WScript.Shell implementation does not support diacritic characters in a Unicode string for the TargetPath shortcut property. But this module has the same issue: c:\1.txt.lnk -> c:\ööö\1.txt
But the WorkingDirectory property is not affected.
I've compared with the https://github.com/Matmaus/LnkParse3 implementation, and it returns more reliable results, whereas pylnk3 does not.
I've tried to change the code, but it still returns a truncated variant. It seems the app reads only one property field (Ansi) instead of 2 (Ansi+Unicode) as LnkParse3 does.
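To illustrate the Ansi+Unicode idea: as far as I understand the MS-SHLLINK layout, a LinkInfo header of 0x24 bytes or more carries an optional LocalBasePathOffsetUnicode field at offset 0x1C, next to the ANSI LocalBasePathOffset at 0x10. The sketch below is not pylnk3 or LnkParse3 code. `read_local_base_path` is a hypothetical helper, and the test block is synthetic, with an ANSI copy using the closest Latin letters as described for Win7 earlier in the thread:

```python
import struct


def read_local_base_path(link_info: bytes, ansi_encoding: str = "cp1252") -> str:
    """Return LocalBasePath, preferring the optional UTF-16 copy (hypothetical helper)."""
    # Fixed part of the header: seven little-endian 32-bit fields.
    (_size, header_size, _flags, _volume_id_off,
     ansi_off, _network_off, _suffix_off) = struct.unpack_from("<7I", link_info, 0)

    if header_size >= 0x24:
        # Optional field: LocalBasePathOffsetUnicode at offset 0x1C.
        (unicode_off,) = struct.unpack_from("<I", link_info, 0x1C)
        if unicode_off:
            end = unicode_off
            # Walk 16-bit code units until the NUL terminator or end of data.
            while end + 1 < len(link_info) and link_info[end:end + 2] != b"\x00\x00":
                end += 2
            return link_info[unicode_off:end].decode("utf-16-le")

    # Fall back to the NUL-terminated ANSI string.
    end = link_info.index(b"\x00", ansi_off)
    return link_info[ansi_off:end].decode(ansi_encoding)


# Synthetic block for illustration only, not taken from a real shortcut.
ansi = b"D:\\ooo\\1.txt\x00"
wide = "D:\\\u00f6\u00f6\u00f6\\1.txt".encode("utf-16-le") + b"\x00\x00"
header = struct.pack("<9I", 0x24 + len(ansi) + len(wide), 0x24, 1, 0,
                     0x24, 0, 0, 0x24 + len(ansi), 0)
block = header + ansi + wide

# ascii() keeps the output printable on consoles that cannot show ö.
print(ascii(read_local_base_path(block)))
```

Reading the UTF-16 copy first avoids guessing a code page at all. The ANSI branch still needs an encoding, so something like the cp1252 default would remain necessary for shortcuts that only carry the short header.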