Describe the bug
(This is a copy of CMUCL issue #112, using its issue reporting template. I've edited it slightly for specificity to this project. CMUCL includes a copy of sharplispers' CLX that's a couple of years old, and I'm able to reproduce this defect against the source in today's master.)
On CMUCL, CLX is unable to connect to an X server over inet sockets.
This prevents running CLX programs over networks, including through ssh's X11 forwarding.
The root cause is that XLIB::HOST-ADDRESS cannot resolve any hostname; it always signals an error.
To Reproduce
There are multiple ways to reproduce the problem. (These reproduction steps use CMUCL's distribution of sharplispers' CLX, for simplicity of demonstration. The same error occurs when compiling and loading CLX from sharplispers' upstream.)
[SSH X11 forwarding]
Assume all of the following:
- an X server running on host-A,
- CMUCL with CLX installed on host-B,
- sshd running on host-B, configured to enable X11 forwarding,
- [Optional] some non-CLX X clients installed on host-B, for verifying that SSH's X11 forwarding works.
[Note that host-B might be the same host as host-A.]
1. On host-A, ssh to host-B, e.g., with
   ssh -X name-for-host-B
2. In the ssh session on host-B, inspect the value of DISPLAY. It ought to be something like localhost:NN.0, where NN is some decimal number. Take note of NN.
3. [Optional] Verify that SSH's X11 forwarding works for some non-CLX X client. Anything will do; we'll use xdpyinfo below.
4. Start CMUCL, e.g., with
   ./bin/lisp -nositeinit -noinit
   in some directory containing a snapshot from 2018-02 or later.
5. At the REPL, load CLX and try to connect to the X server specified by the DISPLAY environment variable, using the NN discovered above:
   (require :clx)
   (xlib:open-display "localhost" :display NN :protocol :internet)
   (Note that when DISPLAY=localhost:NN.0, XLIB:OPEN-DEFAULT-DISPLAY calls XLIB:OPEN-DISPLAY with these arguments. This reproduction calls OPEN-DISPLAY explicitly for clarity.)
This will produce an error. Note that the function signaling the error is XLIB::HOST-ADDRESS.
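A DISPLAY value such as localhost:10.0 is made up of a host name, a display number and a screen number. The portable Common Lisp sketch below splits such a string into those three parts, which is useful when mapping your own DISPLAY value onto the OPEN-DISPLAY call above. PARSE-DISPLAY-STRING is a hypothetical helper for illustration and is not a CLX function. It handles only the host:NN.S form used in this reproduction, not DECnet's double-colon syntax or other variants. CLX's own DISPLAY parsing may differ in details.

```lisp
;;; Hypothetical helper (not part of CLX): split a DISPLAY string of the
;;; form "host:NN.S" into host, display number and screen number.
(defun parse-display-string (display)
  "Return (values HOST DISPLAY-NUMBER SCREEN) for a string like \"localhost:10.0\"."
  ;; The last colon separates the host from the display number.
  (let ((colon (position #\: display :from-end t)))
    (unless colon
      (error "No display number in ~S" display))
    ;; An optional dot after the display number introduces the screen.
    (let* ((dot (position #\. display :start (1+ colon)))
           (host (subseq display 0 colon))
           (number (parse-integer display :start (1+ colon) :end dot))
           (screen (if dot (parse-integer display :start (1+ dot)) 0)))
      (values host number screen))))

;; The results correspond to the arguments used in the reproduction:
;; (xlib:open-display host :display number :protocol :internet)
```

With DISPLAY=localhost:10.0, this yields "localhost", 10 and 0. Those are the values passed to XLIB:OPEN-DISPLAY in the transcript below.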
[Connecting to an X server without SSH's X11 forwarding]
(Note that this is an insecure way for an X client to communicate with an X server in general. I describe it only because this reproduction doesn't involve configuring ssh, and so might be easier to test.)
Assume all of the following:
- a reasonably complete X11 installation on some host,
- CMUCL with CLX installed on that same host.
1. Identify a display number that's available for use on the host. Let's say that 5 is available.
2. Create a fresh Xauthority file to use solely for this testing:
   touch ~/Xauthority.cmucl-clx && mcookie | sed -e 's/^/add :5 . /' | xauth -q -f ~/Xauthority.cmucl-clx
3. Start a fresh X server that uses the selected display number and the Xauthority file just created, ensuring that the X server is listening for inet connections.
   - If you're already in an X session of some kind, start a nested X server such as Xephyr:
     Xephyr :5 -listen inet -auth ~/Xauthority.cmucl-clx &
     or Xnest:
     Xnest :5 -listen inet -auth ~/Xauthority.cmucl-clx &
   - [Untested] If you're at the OS console (i.e., not inside an X session), something like this might do:
     xinit -- :5 -listen inet -auth ~/Xauthority.cmucl-clx
4. From some OS prompt (it doesn't need to be a descendant of xinit), start CMUCL, ensuring that the XAUTHORITY environment variable specifies the Xauthority file created for testing, e.g.,
   XAUTHORITY=~/Xauthority.cmucl-clx ./bin/lisp -nositeinit -noinit
   in some directory containing a snapshot from 2018-02 or later.
5. At the REPL, load CLX and try to connect to the X server you've just started:
   (require :clx)
   (xlib:open-display "localhost" :display 5 :protocol :internet)
This will produce an error. Note that the function signaling the error is XLIB::HOST-ADDRESS.
Here's a transcript of the failure using the ssh X11 forwarding approach. This example starts on a Darwin host running Xquartz (that's host-A for the above), and "debian10" is a host running Debian 10 (that's host-B).
$ ssh -X debian10
Linux debian 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Aug 14 09:53:43 2021 from 10.0.2.2
$ echo $DISPLAY
localhost:10.0
$ xdpyinfo | sed -n '/^$/q; p' # we don't care about screen geometries
name of display: localhost:10.0
version number: 11.0
vendor string: The X.Org Foundation
vendor release number: 11907000
X.Org version: 1.19.7
maximum request size: 16777212 bytes
motion buffer size: 256
bitmap unit, bit order, padding: 32, LSBFirst, 32
image byte order: LSBFirst
number of supported pixmap formats: 7
supported pixmap formats:
depth 1, bits_per_pixel 1, scanline_pad 32
depth 4, bits_per_pixel 8, scanline_pad 32
depth 8, bits_per_pixel 8, scanline_pad 32
depth 15, bits_per_pixel 16, scanline_pad 32
depth 16, bits_per_pixel 16, scanline_pad 32
depth 24, bits_per_pixel 32, scanline_pad 32
depth 32, bits_per_pixel 32, scanline_pad 32
keycode range: minimum 8, maximum 255
focus: window 0xe00024, revert to None
number of extensions: 23
Apple-DRI
Apple-WM
BIG-REQUESTS
DAMAGE
DOUBLE-BUFFER
GLX
Generic Event Extension
MIT-SCREEN-SAVER
MIT-SHM
Present
RANDR
RENDER
SECURITY
SGI-GLX
SHAPE
SYNC
X-Resource
XC-MISC
XFIXES
XINERAMA
XInputExtension
XKEYBOARD
XVideo
default screen number: 0
number of screens: 1
$ cd ~/cmucl-snapshots/cmucl-2018-02
$ ./bin/lisp -nositeinit -noinit
CMU Common Lisp snapshot-2018-02 (21C Unicode), running on debian
With core: /net/_gateway/Users/kreuter/cmucl-snapshots/cmucl-2018-02/lib/cmucl/lib/lisp-sse2.core
Dumped on: Sat, 2018-02-10 18:54:52-05:00 on lorien3
See <http://www.cmucl.org/> for support information.
Loaded subsystems:
Unicode 1.29 with Unicode version 6.2.0
Python 1.1, target Intel x86/sse2
CLOS based on Gerd's PCL 2010/03/19 15:19:03
* (require :clx)
; Loading #P"/net/_gateway/Users/kreuter/cmucl-snapshots/cmucl-2018-02/lib/cmucl/lib/subsystems/clx-library.sse2f".
; [GC threshold exceeded with 12,004,720 bytes in use. Commencing GC.]
; [GC completed with 3,404,048 bytes retained and 8,600,672 bytes freed.]
; [GC will next occur when at least 15,404,048 bytes are in use.]
("CLX")
* (xlib:open-display "localhost" :display 10 :protocol :internet)
Error in function XLIB::HOST-ADDRESS: Unknown host "localhost"
[Condition of type SIMPLE-ERROR]
Restarts:
0: [ABORT] Return to Top-Level.
Debug (type H for help)
(XLIB::HOST-ADDRESS "localhost" #<unused-arg>)
Source: Error finding source:
Error in function DEBUG::GET-FILE-TOP-LEVEL-FORM: Source file no longer exists:
target:clx/dependent.lisp.
0] :back
0: (XLIB::HOST-ADDRESS "localhost" #<unused-arg>)
1: (XLIB::GET-BEST-AUTHORIZATION "localhost" 10 :INTERNET)
2: (XLIB:OPEN-DISPLAY "localhost" :DISPLAY 10 :PROTOCOL ...)
3: (INTERACTIVE-EVAL (XLIB:OPEN-DISPLAY "localhost" :DISPLAY 10 :PROTOCOL ...))
4: (LISP::%TOP-LEVEL)
5: ((LABELS LISP::RESTART-LISP SAVE-LISP))
Expected behavior
Here's a transcript of the behavior under the 2018-01 snapshot, which predated the incorporation of sharplispers' CLX.
$ ssh -X debian10
Linux debian 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Aug 14 09:53:58 2021 from 10.0.2.2
$ cd ~/cmucl-snapshots/cmucl-2018-01
$ ./bin/lisp -nositeinit -noinit
CMU Common Lisp snapshot-2018-01 (21C Unicode), running on debian
With core: /net/_gateway/Users/kreuter/cmucl-snapshots/cmucl-2018-01/lib/cmucl/lib/lisp-sse2.core
Dumped on: Sun, 2018-01-14 22:06:41-05:00 on lorien3
See <http://www.cmucl.org/> for support information.
Loaded subsystems:
Unicode 1.29 with Unicode version 6.2.0
Python 1.1, target Intel x86/sse2
CLOS based on Gerd's PCL 2010/03/19 15:19:03
* (require :clx)
; Loading #P"/net/_gateway/Users/kreuter/cmucl-snapshots/cmucl-2018-01/lib/cmucl/lib/subsystems/clx-library.sse2f".
; [GC threshold exceeded with 12,009,288 bytes in use. Commencing GC.]
; [GC completed with 3,458,904 bytes retained and 8,550,384 bytes freed.]
; [GC will next occur when at least 15,458,904 bytes are in use.]
("CLX")
* (xlib:open-display "localhost" :display 10 :protocol :internet)
#<XLIB:DISPLAY localhost:10 (The X.Org Foundation R11907000)>
Fix for this defect
The following definition of XLIB::HOST-ADDRESS fixes this problem. (This is the definition that was present in CMUCL's distribution of CLX between Aug 2007 and Jan 2018; more on the history below.) When evaluating this DEFUN interactively, you must also re-evaluate (DEFUN GET-BEST-AUTHORIZATION ...) in display.lisp. CMUCL's compiler will have compiled GET-BEST-AUTHORIZATION knowing that the faulty HOST-ADDRESS definition never returns.
#+CMU
(defun host-address (host &optional (family :internet))
  ;; Return a list whose car is the family keyword (:internet :DECnet :Chaos)
  ;; and cdr is a list of network address bytes.
  (declare (type stringable host)
           (type (or null (member :internet :decnet :chaos) card8) family))
  (declare (clx-values list))
  (labels ((no-host-error ()
             (error "Unknown host ~S" host))
           (no-address-error ()
             (error "Host ~S has no ~S address" host family)))
    (let ((hostent (ext:lookup-host-entry (string host))))
      (when (not hostent)
        (no-host-error))
      (ecase family
        ((:internet nil 0)
         ;; 2 is AF_INET, i.e. the host entry holds IPv4 addresses.
         (unless (= (ext::host-entry-addr-type hostent) 2)
           (no-address-error))
         (append (list :internet)
                 (let ((addr (first (ext::host-entry-addr-list hostent))))
                   (list (ldb (byte 8 24) addr)
                         (ldb (byte 8 16) addr)
                         (ldb (byte 8 8) addr)
                         (ldb (byte 8 0) addr)))))))))
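The four LDB forms at the end of this DEFUN split the first IPv4 address of the host entry into octets, most significant first. That produces the address part of the list HOST-ADDRESS returns. To check this step in isolation on any Common Lisp, here is a hypothetical helper, IPV4-OCTETS, that mirrors those forms. Like the DEFUN, it assumes the address is an integer whose high byte is the first octet.

```lisp
;;; Hypothetical portable helper mirroring the LDB forms in the CMUCL
;;; HOST-ADDRESS above: split a 32-bit IPv4 address held in an integer
;;; into its four octets, most significant octet first.
(defun ipv4-octets (addr)
  (check-type addr (unsigned-byte 32))
  (list (ldb (byte 8 24) addr)
        (ldb (byte 8 16) addr)
        (ldb (byte 8 8) addr)
        (ldb (byte 8 0) addr)))

;; 127.0.0.1 is #x7F000001.
(ipv4-octets #x7F000001) ; => (127 0 0 1)
```

So for a "localhost" that resolves to 127.0.0.1, as it typically does, the working HOST-ADDRESS returns (:INTERNET 127 0 0 1).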
Desktop (please complete the following information):
OS: All of the above is captured on Debian 10 running as a qemu guest of a Darwin qemu host. The X server I'm connecting to is Xquartz running on the Darwin qemu host.
Additional context
XLIB:OPEN-DISPLAY searches an Xauthority file for a key that matches the desired host and display number before opening any socket. Hosts are matched by network address, not by name. When OPEN-DISPLAY's PROTOCOL argument is :LOCAL, the network address of "127.0.0.1" is used unconditionally, so no host name resolution is involved. But when the protocol is :INTERNET, a host name argument to OPEN-DISPLAY must be resolved to a network address.
XLIB::HOST-ADDRESS is CLX's internal interface for hostname resolution, so it needs to work correctly in order for OPEN-DISPLAY to work with PROTOCOL :INTERNET.
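The asymmetry between :LOCAL and :INTERNET can be sketched as follows. AUTHORIZATION-ADDRESS and BROKEN-RESOLVER are hypothetical. They simplify the behavior described above and are not CLX's actual GET-BEST-AUTHORIZATION. BROKEN-RESOLVER stands in for the faulty HOST-ADDRESS. Writing the :LOCAL address in the same (:INTERNET 127 0 0 1) shape that HOST-ADDRESS returns is an assumption about representation.

```lisp
;;; Hypothetical simplification (not CLX's GET-BEST-AUTHORIZATION):
;;; pick the network address used to match Xauthority entries.
;;; RESOLVER stands in for XLIB::HOST-ADDRESS.
(defun authorization-address (host protocol resolver)
  (ecase protocol
    ;; :LOCAL uses the loopback address unconditionally; no lookup.
    (:local (list :internet 127 0 0 1))
    ;; :INTERNET depends entirely on hostname resolution.
    (:internet (funcall resolver host))))

(defun broken-resolver (host)
  ;; Behaves like the faulty HOST-ADDRESS: it always signals an error.
  (error "Unknown host ~S" host))
```

With the broken resolver, the :LOCAL path still yields an address, while the :INTERNET path signals an error. This matches the backtrace above, where GET-BEST-AUTHORIZATION calls HOST-ADDRESS.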
The bug in HOST-ADDRESS is easy to identify. The telent-clx (and now sharplispers' clx) version of this function for CMUCL names 3 different interfaces for resolving a hostname, but guards each one behind a different feature check. It has no read-time conditional for the case where none of the 3 features is present. I don't see anywhere that CLX or CMUCL arranges for any of the 3 features to be present when building CLX, so HOST-ADDRESS always gets compiled to signal an error.
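This failure mode is a general property of read-time conditionals. When none of the guarded features is present, the reader discards every guarded form, and only what remains gets compiled. The sketch below shows that shape with invented feature names (:HYPOTHETICAL-RESOLVER-A and so on) and invented resolver functions. It is not the actual code in dependent.lisp, whose feature names and fallthrough may differ.

```lisp
;;; Hypothetical illustration; feature names and RESOLVER-A/B/C are
;;; invented. Feature expressions are read in the KEYWORD package, so
;;; #+hypothetical-resolver-a tests :HYPOTHETICAL-RESOLVER-A in *FEATURES*.
;;; When none of them is present, the reader skips all three guarded
;;; forms (so the undefined resolvers are never even referenced), and
;;; the compiled body consists solely of the ERROR call.
(defun guarded-host-address (host)
  #+hypothetical-resolver-a (return-from guarded-host-address (resolver-a host))
  #+hypothetical-resolver-b (return-from guarded-host-address (resolver-b host))
  #+hypothetical-resolver-c (return-from guarded-host-address (resolver-c host))
  (error "Unknown host ~S" host))
```

A branch such as #-(or hypothetical-resolver-a hypothetical-resolver-b hypothetical-resolver-c) before the ERROR call is the kind of catch-all conditional that the paragraph above notes is missing. The CMUCL-specific definition given earlier sidesteps the problem by calling EXT:LOOKUP-HOST-ENTRY unconditionally under #+CMU.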
And now some archaeology:
telent-clx started about 20 years ago, and says it was derived from CMUCL's CLX. Here's what looks to be the latest version of CMUCL's XLIB::HOST-ADDRESS before telent-clx's first commit:
https://gitlab.common-lisp.net/cmucl/cmucl/-/blob/c026fc30bd5719cc7e6ba3d80d8c7470bb99760f/clx/dependent.lisp#L2477
As you can see, there's no read-time conditionalization inside the DEFUN.
So it looks like telent-clx introduced those feature checks; see line 1470 and following here:
https://github.com/sharplispers/clx/blame/master/dependent.lisp
It seems possible that unmodified copies of telent-clx and its descendants have never been able to connect to X servers over inet sockets on CMUCL.
Note that although CMUCL did import telent-clx about 13 years ago, it did not include telent-clx's feature checks inside HOST-ADDRESS:
https://gitlab.common-lisp.net/cmucl/cmucl/-/blame/631990102d817923c0fadcdad841695570d6e1cb/src/clx/dependent.lisp#L2741
So CMUCL's modified telent-clx did not contain this bug.
However, upon incorporating sharplispers' clx in late Jan 2018, CMUCL got those telent-clx feature checks, and so acquired this bug.
https://gitlab.common-lisp.net/cmucl/cmucl/-/blame/f5c564ec702c869cebccf4de91e339e3e2fb2f02/clx/dependent.lisp#L1034