Open Mi605 opened 1 year ago
I'm not able to reproduce this, but that doesn't mean there isn't a bug. Here's the test I ran:
# Run a terminal with a specific title for testing this
% xterm -title "Apresentação de diapositivos - /media/sda4/images/foto.png" sh
# Search for it by name and print the window class:
% xdotool search --name "Apresentação" getwindowclassname
XTerm
% xdotool search --name "Apresentação de diapositivos - /media/sda4/images/foto.png" getwindowclassname
XTerm
My locale is en_US.UTF-8 if that helps. Maybe it's different for yours? I wonder if there's something else going on.
One small note: xdotool search uses regular expressions for matching (it does not do exact text matching), so special regular expression characters like [] and . and others have special meanings (for example, . (period) means "match any single character"). I don't think this is the cause of the problem, but I wanted to point it out just in case it makes things harder.
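To illustrate the metacharacter point, here is a minimal Python sketch. It uses Python's re module as a stand-in for the regex engine behind xdotool search, which is an assumption, since the two dialects differ in details. The escape_ere helper is hypothetical and not part of xdotool.

```python
import re

# Characters treated as special in POSIX extended regular expressions.
ERE_SPECIAL = r".[]\()*+?{}|^$"


def escape_ere(text):
    """Backslash-escape regex metacharacters so text matches literally."""
    return "".join("\\" + ch if ch in ERE_SPECIAL else ch for ch in text)


# Made-up window title containing square brackets and a period.
title = "feh [1 of 1] - slides.jpg"

# Unescaped, "[1 of 1]" is a character class, so the title does not match itself.
print(re.search(title, title))

# Escaped, the same string matches literally.
print(escape_ere(title))
print(re.search(escape_ere(title), title) is not None)
```

A script could apply this kind of escaping to a title before passing it to --name, although that only addresses metacharacters, not the encoding problem discussed in this issue.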
Can you describe more about your environment?
ooooh I can reproduce this using feh if I name the file "Apresentação de diapositivos.jpg"
% feh Apresentação\ de\ diapositivos.jpg
...
% xdotool search --name 'Apresenta'
8388609
% xdotool search --name 'Apresentação'
<no output>
Comparing my xterm example vs the feh example, and using LANG=C to help me see the byte sequences involved:
# feh
% LANG=C xprop WM_NAME
WM_NAME(STRING) = "feh [1 of 1] - Apresenta\303\247\303\243o de diapositivos.jpg"
# xterm -title example
% LANG=C xprop WM_NAME
WM_NAME(STRING) = "Apresenta\347\343o de diapositivos - /media/sda4/images/foto.png"
I can see two different byte sequences used to present the same information.
I don't have any hypothesis about why this fails in xdotool, but I'm guessing it's that xdotool is given (or is using) a different byte sequence to represent ç or similar. This seems like a weird bug!
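The two xprop dumps are consistent with the same text stored in two different encodings. The following Python sketch is my own illustration, assuming the xterm title is Latin-1 and the feh title is UTF-8. It renders "çã" in both encodings using the octal notation xprop prints under LANG=C. The octal_dump helper is a made-up name.

```python
# "çã" written with escapes so the example does not depend on file encoding.
text = "\u00e7\u00e3"


def octal_dump(data):
    """Render each byte as a backslash followed by three octal digits."""
    return "".join("\\%03o" % b for b in data)


utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(octal_dump(utf8_bytes))    # two bytes per character
print(octal_dump(latin1_bytes))  # one byte per character
```

The first line printed is \303\247\303\243 (as in the feh title) and the second is \347\343 (as in the xterm title).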
Many thanks for looking into this that fast. And also many thanks for making me aware of this important fact:
xdotool search uses regular expressions for matching … wanted to point it out just in case it makes things harder.
Actually this makes things harder, since when a script searches for a window title automatically, you can rarely control whether the search string contains regex metacharacters. To avoid complicated automatic escaping, it would be great to have something like grep's -F switch for this --name search in xdotool as well. But as you already said, I also don't think this is the cause of the weird behaviour here, since the problem persists when the window title is shortened so that the search string contains no square brackets.
Since this is a regex search, why doesn't the following search string find a window whose title contains the string "Apresentação de diapositivos.jpg" that you also used?
$ xdotool search --name 'Apresenta..o'
(no result)
$ xdotool search --name 'Apresenta'
52428801
Actually the two dots should match the two characters in a regex, even when these are foreign or special characters. And for other characters it does work fine:
$ xdotool search --name 'Apre..nta'
52428801
For now I have written a dirty workaround:
$ xdotool windowclose $(($(wmctrl -l | grep -F "$windowname" | cut -d' ' -f1)))
This works fine with any type of foreign or special characters and might help other people until this issue is fixed in xdotool. But it doesn't help in cases where I need to search for something like
$ xdotool search --onlyvisible --name "$windowname" windowminimize
I know, it is possible to script a workaround for this also, but it'll take some additional lines.
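For anyone scripting such a workaround, here is a rough Python sketch of the same idea: match titles as literal substrings instead of regexes. It assumes the `wmctrl -l` column layout of window id, desktop number, host name and title. The find_window_ids helper and the sample listing are made up for illustration.

```python
def find_window_ids(listing, needle):
    """Return decimal ids of windows whose title contains needle literally."""
    ids = []
    for line in listing.splitlines():
        fields = line.split(None, 3)  # id, desktop, host, title
        if len(fields) < 4:
            continue  # window without a title
        window_id, _desktop, _host, title = fields
        if needle in title:
            # wmctrl prints hexadecimal ids; convert them to decimal.
            ids.append(int(window_id, 16))
    return ids


# Made-up sample in the assumed column layout of `wmctrl -l`.
sample = (
    "0x03200001  0 myhost feh [1 of 1] - Apresenta\u00e7\u00e3o de diapositivos.jpg\n"
    "0x02e00007  0 myhost xterm\n"
)
print(find_window_ids(sample, "[1 of 1] - Apresenta\u00e7\u00e3o"))
```

The returned ids could then be handed to a command such as xdotool windowminimize. Unlike the grep -F pipeline, this compares only the title column, and it does not reproduce the --onlyvisible filter.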
P.S.:
Can you describe more about your environment?
Since you have now reproduced the issue, do you still need more details about the environment I'm running? If not, I'll save the time it would take to collect them. Some basics: this is antiX 23 (based on Debian bookworm) on kernel 6.1.10, and my locale is de_DE.UTF-8. I have cross-checked on antiX 22 (based on Debian bullseye) running kernel 5.10.142, with the same results. xdotool is the same version on both installations (although the package manager shows 1:3.20160805.1-4 on the bullseye system instead of the -5 from above on bookworm). Please let me know if you need further information.
xdotool search --name 'Apresenta..o'
Actually the two dots should match the two characters in a regex
Indeed! I believe the problem is that xdotool and the regex library it uses assume "one byte is one character" where, in fact, under UTF-8, one character can be encoded as a sequence of one or more bytes.
So in this case, I would expect 'Apresenta....o' to match, because in 'Apresentação' the 'çã' part is two characters but is represented by four bytes in total. For example, ç is the two-byte sequence 0xC3 0xA7 in UTF-8.
Fixing this regex issue to match what a human would describe as 'characters' (including multibyte utf-8 sequences) may require changing to a different regex library. It is my understanding that libc's regexes/regcomp functions only understand single-byte characters and have no awareness of UTF-8 nor any kind of LANG or locale settings.
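The byte-versus-character difference can be sketched in Python, whose re module supports both byte patterns and text patterns. This only illustrates the general effect and makes no claim about how xdotool or libc behave internally.

```python
import re

# The title from this thread, with "çã" written as escapes.
title = "Apresenta\u00e7\u00e3o de diapositivos.jpg"
raw = title.encode("utf-8")

# Byte-oriented matching: each dot consumes exactly one byte.
print(re.search(rb"Apresenta..o", raw))                  # two dots: no match
print(re.search(rb"Apresenta....o", raw) is not None)    # four dots: match

# Character-oriented matching: each dot consumes one code point.
print(re.search(r"Apresenta..o", title) is not None)     # two dots: match
```

A matcher that decodes the title to text before matching behaves like the last case, which is what a user would expect from 'Apresenta..o'.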
Separately, I'm open to adding a 'exact match' or similar kind of matching feature to xdotool search so you could match an exact string (or a partial substring) rather than using regex. Thoughts?
do you still need more details about the environment I'm running
I believe we have enough information, so I don't need any extra work from you to gather details needed :)
So in this case, I would expect 'Apresenta....o' to match it because of 'Apresentação' the 'çã' part is two characters, but represented by four bytes total. ç is two bytes in sequence 0xC3 0xA7 in UTF-8
Yes, this is what I had expected too, and I had tried it, but it didn't match. Then I checked the regex with three dots, in case one of the characters was seen as a single byte for some strange reason, also without result. Just now I had the idea to add more dots, one by one, until I got a match, and in the end it turned out you actually need 8 dots to match the pattern:
$ xdotool search --name 'Apresenta........o de diapositivos'
32549734
matches a window title containing „Apresentação de diapositivos.png”. Maybe this helps in fixing it.
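One possible explanation for the 8 dots, which is only a guess and not confirmed anywhere in this thread, is that the UTF-8 bytes of the title get interpreted as Latin-1 and converted to UTF-8 a second time before matching. The Python sketch below shows that such a double encoding turns "çã" into exactly 8 bytes.

```python
import re

text = "\u00e7\u00e3"  # "çã"

once = text.encode("utf-8")                      # normal UTF-8
twice = once.decode("latin-1").encode("utf-8")   # hypothetical double encoding
print(len(once), len(twice))

# The whole title, double-encoded the same way (ASCII characters are unchanged).
title = "Apresenta\u00e7\u00e3o de diapositivos.png"
double_title = title.encode("utf-8").decode("latin-1").encode("utf-8")

# With byte-wise dots, only the 8-dot pattern from above matches.
print(re.search(rb"Apresenta....o", double_title))
print(re.search(rb"Apresenta........o de diapositivos", double_title) is not None)
```

If this guess is right, the mismatch would come from how the window name is converted rather than from the regex engine alone.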
Fixing this regex issue to match what a human would describe as 'characters' (including multibyte utf-8 sequences) may require changing to a different regex library. It is my understanding that libc's regexes/regcomp functions only understand single-byte characters and have no awareness of UTF-8 nor any kind of LANG or locale settings.
Unfortunately I don't understand much about libraries, since I'm not a programmer. But I guess it could be similar or connected to this issue concerning the leafpad text editor, which the devs already fixed last December. Maybe knowing about it can help you fix this one. It was also about foreign and special characters not being matched properly in a search function, and there, too, the count of these characters was calculated wrongly. leafpad, issue 11
Separately, I'm open to adding a 'exact match' or similar kind of matching feature to xdotool search so you could match an exact string (or a partial substring) rather than using regex. Thoughts?
Yes, that would be really great. That's exactly what is missing when you need a precise match among arbitrary window titles which may contain regex markers in arbitrary positions of the string, so it's difficult to mask these markers or build a regex pattern to catch the precise name (or precise parts of it) when you only have a regex search available. So yes, this would be a highly appreciated feature.
I believe we have enough information, so I don't need any extra work from you to gather details needed :)
No problem at all. I like to contribute and help where I can in supporting free software (not being a programmer). And let me thank you and the xdotool team very much for all the time and effort you put into this great tool.
it's difficult to mask these markers or build a regex pattern to catch the precise name
Agreed. No one deserves such a punishment 😂
As time permits, I'll work on an exact string match feature and will update this issue with any news.
I wondered why some scripts that use xdotool for window management (resize, move, minimize, restore etc.) arbitrarily fail on some windows, causing unforeseeable results. After some hours of testing I found out there is a bug in the search --name method provided by xdotool. It works properly only on plain English systems, when opening windows with plain English named media. It fails for many foreign languages, and also whenever the title bar contains special characters (e.g. set by an application in the window title). This happens regularly, since the user has full control of file names on his system, and when playing media in a window the media provider controls the window title in many cases, so it is quite common to see special characters there.
Examples:
or:
and:
while as long as no foreign or special character is present in the window title, it is found properly:
This renders all scripts relying on looking up a specific window for further handling using xdotool search unreliable.
Sometimes the expected action takes place and sometimes not, depending on whether the window title happens to contain a special character.
Please, urgently fix this.