jordansissel / xdotool

fake keyboard/mouse input, window management, and more

search --name fails on window titles containing special characters #420

Open Mi605 opened 1 year ago

Mi605 commented 1 year ago

I wondered why some scripts that use xdotool for window management (resize, move, minimize, restore etc.) fail on some windows, seemingly at random, with unforeseeable results. After some hours of testing I found that there is a bug in the search --name method provided by xdotool. It works properly only on plain English systems with plain English window titles, and fails for many foreign languages as well as whenever the title bar contains special characters. That happens regularly: applications put file names into the window title, and users have full control over the file names on their systems; when media is played in a window, the media provider often controls the window title. So it is quite common to see special characters there.

Examples:

$ xdotool getwindowname 37748737
feh [1 of 1] - /media/sda4/Imagens próprias/Foto de férias, São Paulo, Brasil 16. Março 2022 12:15:40.png

$ xdotool search --name 'feh [1 of 1] - /media/sda4/Imagens próprias/Foto de férias, São Paulo, Brasil 16. Março 2022 12:15:40.png'
(no result)

or:

$ xdotool getwindowname 52428801
Image Viewer — /media/sda4/images/foto.png

$ xdotool search --name 'Image Viewer — /media/sda4/images/foto.png'
(no result)

and:

$ xdotool getwindowname 44040195
Apresentação de diapositivos - /media/sda4/images/foto.png

$ xdotool search --name 'Apresentação de diapositivos - /media/sda4/images/foto.png'
(no result)

whereas as long as no foreign or special character is present in the window title, the window is found properly:

$ xdotool getwindowname 56623105
feh [1 of 1] - /media/sda4/Images/Foto 01.png

$ xdotool search --name 'feh [1 of 1] - /media/sda4/Images/Foto 01.png'
56623105

This renders all scripts relying on looking up a specific window for further handling using xdotool search unreliable.

xdotool search --name "$wintitle" windowsize $value1 $value2 windowmove $value3 $value4

Sometimes the expected action takes place and sometimes it does not, depending on whether the window title happens to contain a special character.

Please, urgently fix this.

$ xdotool --version
xdotool version 3.20160805.1

$ LANG=C apt-cache policy xdotool
xdotool:
  Installed: 1:3.20160805.1-5
  Candidate: 1:3.20160805.1-5
  Version table:
 *** 1:3.20160805.1-5 500
        500 http://ftp.de.debian.org/debian bookworm/main amd64 Packages
        100 /var/lib/dpkg/status

jordansissel commented 1 year ago

I'm not able to reproduce this, but that doesn't mean there isn't a bug.

Here's the test I ran:

# Run a terminal with a specific title for testing this
% xterm -title "Apresentação de diapositivos - /media/sda4/images/foto.png" sh

# Search for it by name and print the window class:
% xdotool search --name "Apresentação"  getwindowclassname
XTerm

% xdotool search --name "Apresentação de diapositivos - /media/sda4/images/foto.png" getwindowclassname
XTerm

My locale is en_US.UTF-8 if that helps. Maybe it's different for yours? I wonder if there's something else going on.

One small note: xdotool search uses regular expressions for matching (does not do exact text match), and so special regular expression characters like [] and . and others have special meanings (for example, . (period) means "match any single character"). I don't think this is the cause of the problem at this time, but I wanted to point it out just in case it makes things harder.

Can you describe more about your environment?

jordansissel commented 1 year ago

ooooh I can reproduce this using feh if I name the file "Apresentação de diapositivos.jpg"

% feh Apresentação\ de\ diapositivos.jpg 
...

% xdotool search --name 'Apresenta'   
8388609

% xdotool search --name 'Apresentação'                                    
<no output>

jordansissel commented 1 year ago

Comparing my xterm example vs the feh example, and using LANG=C to help me see the byte sequences involved:

# feh
% LANG=C xprop WM_NAME
WM_NAME(STRING) = "feh [1 of 1] - Apresenta\303\247\303\243o de diapositivos.jpg"

# xterm -title example
% LANG=C xprop WM_NAME
WM_NAME(STRING) = "Apresenta\347\343o de diapositivos - /media/sda4/images/foto.png"

I can see two different byte sequences used to present the same information.

I don't have any hypothesis about why this fails in xdotool, but I'm guessing it's that xdotool is given (or is using) a different byte sequence to represent ç or similar. This seems like a weird bug!
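The two byte sequences shown above are consistent with "çã" being encoded once as UTF-8 (the feh title) and once as ISO-8859-1 (the xterm title). A minimal sketch to check that reading, assuming `iconv` and `od` are available; the helper names are made up for illustration and are not part of xdotool:

```shell
# The characters "çã" as explicit UTF-8 bytes (octal escapes), so the result
# does not depend on how this script itself is encoded.
utf8_sample() { printf '\303\247\303\243'; }

# Print the bytes read from stdin as octal numbers, similar to what
# xprop shows under LANG=C.
octal_dump() { od -An -to1; }

# Hypothetical helper: the same two characters re-encoded as ISO-8859-1.
latin1_sample() { utf8_sample | iconv -f UTF-8 -t ISO-8859-1; }

utf8_sample | octal_dump     # 303 247 303 243, as in the feh title
latin1_sample | octal_dump   # 347 343, as in the xterm title
```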

Mi605 commented 1 year ago

Many thanks for looking into this so quickly. And many thanks also for making me aware of this important fact:

xdotool search uses regular expressions for matching … wanted to point it out just in case it makes things harder.

Actually this makes things harder, since when a script searches for a window title automatically, you can rarely control whether the search string contains regex metacharacters. To avoid cumbersome automated escaping, it would be great to have something like grep's -F switch for the --name search in xdotool as well. But as you already said, I also don't think this is the cause of the odd behaviour here, since the problem persists when the search string is shortened so that it no longer contains the square brackets.

Since this is a regex search, why doesn't the following search string find a window whose title contains the string "Apresentação de diapositivos.jpg" that you also used?

$ xdotool search --name 'Apresenta..o'
(no result)

$ xdotool search --name 'Apresenta'
52428801

Actually the two dots should match the two characters in a regex, even when these are foreign or special characters. And for other characters it does work fine:

$ xdotool search --name 'Apre..nta'
52428801

For now I have written a dirty workaround:

$ xdotool windowclose $(($(wmctrl -l | grep -F "$windowname" | cut -d' ' -f1)))

This works fine with any kind of foreign or special characters and might help other people until this issue is fixed in xdotool. But it doesn't help in cases where I need to search for something like

$ xdotool search --onlyvisible --name "$windowname" windowminimize

I know, it is possible to script a workaround for this also, but it'll take some additional lines.
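One way such a scripted workaround might look is sketched below. It assumes the `wmctrl -l` line format (hex window id in the first space-separated column); `ids_for_title` is a hypothetical helper name. It reads the listing from stdin so it can be tried on canned input without a running X session.

```shell
# Hypothetical helper: read `wmctrl -l` output on stdin, keep the lines that
# contain the literal string $1, and print each window id in decimal.
ids_for_title() {
    grep -F -- "$1" | cut -d' ' -f1 | while read -r hex; do
        printf '%d\n' "$hex"   # convert the 0x... id to decimal
    done
}

# Canned sample line in the format printed by `wmctrl -l`:
printf '0x03200001  0 host Image Viewer - foto.png\n' | ids_for_title 'Image Viewer'
# prints: 52428801
```

In a real session this could be used as `wmctrl -l | ids_for_title "$windowname" | while read -r id; do xdotool windowminimize "$id"; done`. Note that `grep -F` matches against the whole line (including the host name column), and that this sketch does not reproduce the `--onlyvisible` filter.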

P.S.:

Can you describe more about your environment?

Since you have now reproduced the issue, do you still need more details about the environment I'm running? If not, I'll save the time of collecting them. Some basics: this is antiX 23 (based on Debian bookworm) on kernel 6.1.10, and my locale is de_DE.UTF-8. I have cross-checked on antiX 22 (based on Debian bullseye) running kernel 5.10.142, with the same results. xdotool is the same version on both installations (although the package manager shows 1:3.20160805.1-4 on the bullseye system instead of the -5 shown above for bookworm). Please let me know if you need further information.

jordansissel commented 1 year ago

xdotool search --name 'Apresenta..o'

Actually the two dots should match the two characters in a regex

Indeed! I believe the problem is that xdotool and the regex library it uses assume "one byte is one character" whereas, under UTF-8, one character can be encoded as a sequence of one or more bytes.

So in this case, I would expect 'Apresenta....o' to match, because in 'Apresentação' the 'çã' part is two characters but is represented by four bytes in total. ç is the two-byte sequence 0xC3 0xA7 in UTF-8.
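The byte-versus-character distinction can be illustrated with GNU grep in the C locale as a stand-in for a byte-oriented regex matcher. This is only a sketch: it does not exercise xdotool itself, and the helper names are hypothetical.

```shell
# A window title containing "çã" as UTF-8 bytes (octal escapes).
title_bytes() { printf 'Apresenta\303\247\303\243o de diapositivos.jpg\n'; }

# Hypothetical helper: count lines matching the extended regex $1 when the
# matcher works byte by byte (C locale). Prints 0 when nothing matches.
count_matches() { title_bytes | LC_ALL=C grep -c -E -- "$1" || true; }

count_matches 'Apresenta..o'     # two dots cannot cover four bytes
count_matches 'Apresenta....o'   # four dots cover the four bytes
```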

Fixing this regex issue to match what a human would describe as 'characters' (including multibyte utf-8 sequences) may require changing to a different regex library. It is my understanding that libc's regexes/regcomp functions only understand single-byte characters and have no awareness of UTF-8 nor any kind of LANG or locale settings.

Separately, I'm open to adding an 'exact match' or similar kind of matching feature to xdotool search so you could match an exact string (or a partial substring) rather than using regex. Thoughts?

do you still need more details about the environment I'm running

I believe we have enough information, so I don't need any extra work from you to gather details needed :)

Mi605 commented 1 year ago

So in this case, I would expect 'Apresenta....o' to match, because in 'Apresentação' the 'çã' part is two characters but is represented by four bytes in total. ç is the two-byte sequence 0xC3 0xA7 in UTF-8.

Yes, that is what I had expected too, but it didn't match; I had tried it. I then tried the regex with three dots, in case one of the characters was treated as a single byte for some reason, again without result. Just now I had the idea of adding dots one by one until I got a match, and it turned out that you actually need 8 dots to match the pattern:

$ xdotool search --name 'Apresenta........o de diapositivos'
32549734

This matches "Apresentação de diapositivos.png" in a window title. Maybe this helps with fixing it.
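One possible explanation for the count of 8, which is a hypothesis and not something confirmed in this thread: if the four UTF-8 bytes of "çã" were each treated as an ISO-8859-1 character and converted to UTF-8 a second time, the result would be eight bytes. The arithmetic can be checked as follows, assuming `iconv` is available (helper names are made up for illustration):

```shell
# "çã" as UTF-8 is four bytes (octal escapes).
utf8_sample() { printf '\303\247\303\243'; }

# Hypothetical: treat each of those bytes as an ISO-8859-1 character and
# convert to UTF-8 again ("double encoding").
double_encoded() { utf8_sample | iconv -f ISO-8859-1 -t UTF-8; }

utf8_sample | wc -c      # 4
double_encoded | wc -c   # 8
```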

Fixing this regex issue to match what a human would describe as 'characters' (including multibyte utf-8 sequences) may require changing to a different regex library. It is my understanding that libc's regexes/regcomp functions only understand single-byte characters and have no awareness of UTF-8 nor any kind of LANG or locale settings.

Unfortunately I don't understand much about libraries, since I'm not a programmer. But I guess this could be similar or related to an issue in the leafpad text editor (leafpad, issue 11), which the developers fixed last December. Maybe knowing about it can help you fix this one. It was also about a search function not matching foreign and special characters properly, and there too the count of these characters was calculated wrongly.

Separately, I'm open to adding an 'exact match' or similar kind of matching feature to xdotool search so you could match an exact string (or a partial substring) rather than using regex. Thoughts?

Yes, that would be really great. That's exactly what is missing when you need a precise match among arbitrary window titles which may contain regex markers in arbitrary positions of the string, so it's difficult to mask these markers or build a regex pattern to catch the precise name (or precise parts of it) when you only have a regex search available. So yes, this would be a highly appreciated feature.
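Until such a feature exists, the escaping could be scripted roughly as follows. This is a minimal sketch that assumes the search pattern is interpreted as a POSIX extended regular expression; `escape_ere` is a hypothetical helper, not an xdotool command.

```shell
# Hypothetical helper: backslash-escape POSIX extended-regex metacharacters
# so that a window title is matched literally.
escape_ere() {
    printf '%s\n' "$1" | sed 's/[][\.|$(){}?+*^]/\\&/g'
}

escape_ere 'feh [1 of 1] - /media/sda4/Images/Foto 01.png'
# prints: feh \[1 of 1\] - /media/sda4/Images/Foto 01\.png
```

It could then be used as `xdotool search --name "$(escape_ere "$wintitle")"`. This only neutralizes regex metacharacters; it does not address the mismatch on non-ASCII characters discussed in this issue.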

I believe we have enough information, so I don't need any extra work from you to gather details needed :)

No problem at all. I like to contribute and help where I can in supporting free software (not being a programmer). And let me thank you and the xdotool team very much for all the time and effort you put into this great tool.

jordansissel commented 1 year ago

it's difficult to mask these markers or build a regex pattern to catch the precise name

Agreed. No one deserves such a punishment 😂

As time permits, I'll work on an exact string match feature and will update this issue with any news.