sharplispers / clx

a fork of crhodes' fork of danb's fork of the CLX library, an X11 client for Common Lisp

CLX on CMUCL can't connect to X server via an inet socket #187

Closed: kreuter closed this issue 3 years ago

kreuter commented 3 years ago

Describe the bug

(This is a copy of CMUCL issue #112, using its issue reporting template, lightly edited to be specific to this project. CMUCL includes a copy of sharplispers' CLX that's a couple of years old, and I'm able to reproduce this defect against the source in today's master.)

On CMUCL, CLX is unable to connect to an X server over inet sockets.

This prevents running CLX programs over networks, including through ssh's X11 forwarding.

The root cause is that XLIB::HOST-ADDRESS cannot resolve hostnames; it always signals an error.

To Reproduce

There are multiple ways to reproduce this. (These reproduction steps use CMUCL's distribution of sharplispers' CLX, for simplicity of demonstration. The same error occurs when compiling and loading CLX from sharplispers upstream.)

Here's a transcript of the failure using the ssh X11 forwarding approach. This example starts on a Darwin host running XQuartz (host-A), and "debian10" is a host running Debian 10 (host-B).

$ ssh -X debian10
Linux debian 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Aug 14 09:53:43 2021 from 10.0.2.2
$ echo $DISPLAY
localhost:10.0
$ xdpyinfo | sed -n '/^$/q; p' # we don't care about screen geometries
name of display:    localhost:10.0
version number:    11.0
vendor string:    The X.Org Foundation
vendor release number:    11907000
X.Org version: 1.19.7
maximum request size:  16777212 bytes
motion buffer size:  256
bitmap unit, bit order, padding:    32, LSBFirst, 32
image byte order:    LSBFirst
number of supported pixmap formats:    7
supported pixmap formats:
    depth 1, bits_per_pixel 1, scanline_pad 32
    depth 4, bits_per_pixel 8, scanline_pad 32
    depth 8, bits_per_pixel 8, scanline_pad 32
    depth 15, bits_per_pixel 16, scanline_pad 32
    depth 16, bits_per_pixel 16, scanline_pad 32
    depth 24, bits_per_pixel 32, scanline_pad 32
    depth 32, bits_per_pixel 32, scanline_pad 32
keycode range:    minimum 8, maximum 255
focus:  window 0xe00024, revert to None
number of extensions:    23
    Apple-DRI
    Apple-WM
    BIG-REQUESTS
    DAMAGE
    DOUBLE-BUFFER
    GLX
    Generic Event Extension
    MIT-SCREEN-SAVER
    MIT-SHM
    Present
    RANDR
    RENDER
    SECURITY
    SGI-GLX
    SHAPE
    SYNC
    X-Resource
    XC-MISC
    XFIXES
    XINERAMA
    XInputExtension
    XKEYBOARD
    XVideo
default screen number:    0
number of screens:    1
$ cd ~/cmucl-snapshots/cmucl-2018-02
$ ./bin/lisp -nositeinit -noinit
CMU Common Lisp snapshot-2018-02 (21C Unicode), running on debian
With core: /net/_gateway/Users/kreuter/cmucl-snapshots/cmucl-2018-02/lib/cmucl/lib/lisp-sse2.core
Dumped on: Sat, 2018-02-10 18:54:52-05:00 on lorien3
See <http://www.cmucl.org/> for support information.
Loaded subsystems:
    Unicode 1.29 with Unicode version 6.2.0
    Python 1.1, target Intel x86/sse2
    CLOS based on Gerd's PCL 2010/03/19 15:19:03
* (require :clx)

; Loading #P"/net/_gateway/Users/kreuter/cmucl-snapshots/cmucl-2018-02/lib/cmucl/lib/subsystems/clx-library.sse2f".

; [GC threshold exceeded with 12,004,720 bytes in use.  Commencing GC.]
; [GC completed with 3,404,048 bytes retained and 8,600,672 bytes freed.]
; [GC will next occur when at least 15,404,048 bytes are in use.]
("CLX")
* (xlib:open-display "localhost" :display 10 :protocol :internet)

Error in function XLIB::HOST-ADDRESS:  Unknown host "localhost"
   [Condition of type SIMPLE-ERROR]

Restarts:
  0: [ABORT] Return to Top-Level.

Debug  (type H for help)

(XLIB::HOST-ADDRESS "localhost" #<unused-arg>)
Source: Error finding source: 
Error in function DEBUG::GET-FILE-TOP-LEVEL-FORM:  Source file no longer exists:
  target:clx/dependent.lisp.
0] :back

0: (XLIB::HOST-ADDRESS "localhost" #<unused-arg>)
1: (XLIB::GET-BEST-AUTHORIZATION "localhost" 10 :INTERNET)
2: (XLIB:OPEN-DISPLAY "localhost" :DISPLAY 10 :PROTOCOL ...)
3: (INTERACTIVE-EVAL (XLIB:OPEN-DISPLAY "localhost" :DISPLAY 10 :PROTOCOL ...))
4: (LISP::%TOP-LEVEL)
5: ((LABELS LISP::RESTART-LISP SAVE-LISP))

Expected behavior

Here's a transcript of the behavior under the 2018-01 snapshot, which predated the incorporation of sharplispers' CLX.

$ ssh -X debian10
Linux debian 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Aug 14 09:53:58 2021 from 10.0.2.2
$ cd ~/cmucl-snapshots/cmucl-2018-01
$ ./bin/lisp -nositeinit -noinit
CMU Common Lisp snapshot-2018-01 (21C Unicode), running on debian
With core: /net/_gateway/Users/kreuter/cmucl-snapshots/cmucl-2018-01/lib/cmucl/lib/lisp-sse2.core
Dumped on: Sun, 2018-01-14 22:06:41-05:00 on lorien3
See <http://www.cmucl.org/> for support information.
Loaded subsystems:
    Unicode 1.29 with Unicode version 6.2.0
    Python 1.1, target Intel x86/sse2
    CLOS based on Gerd's PCL 2010/03/19 15:19:03
* (require :clx)

; Loading #P"/net/_gateway/Users/kreuter/cmucl-snapshots/cmucl-2018-01/lib/cmucl/lib/subsystems/clx-library.sse2f".
; [GC threshold exceeded with 12,009,288 bytes in use.  Commencing GC.]
; [GC completed with 3,458,904 bytes retained and 8,550,384 bytes freed.]
; [GC will next occur when at least 15,458,904 bytes are in use.]
("CLX")
* (xlib:open-display "localhost" :display 10 :protocol :internet)

#<XLIB:DISPLAY localhost:10 (The X.Org Foundation R11907000)>

Fix for this defect

The following definition of XLIB::HOST-ADDRESS fixes this problem. (This is the definition that was present in CMUCL's distribution of CLX between Aug 2007 and Jan 2018; more on the history below.) Note that when evaluating this DEFUN interactively, it's also necessary to re-evaluate (DEFUN GET-BEST-AUTHORIZATION ...) in display.lisp, because CMUCL's compiler will have compiled GET-BEST-AUTHORIZATION on the assumption that the faulty HOST-ADDRESS definition never returns.

#+CMU
(defun host-address (host &optional (family :internet))
  ;; Return a list whose car is the family keyword (:internet :DECnet :Chaos)
  ;; and cdr is a list of network address bytes.
  (declare (type stringable host)
           (type (or null (member :internet :decnet :chaos) card8) family))
  (declare (clx-values list))
  (labels ((no-host-error ()
             (error "Unknown host ~S" host))
           (no-address-error ()
             (error "Host ~S has no ~S address" host family)))
    (let ((hostent (ext:lookup-host-entry (string host))))
      (when (not hostent)
        (no-host-error))
      (ecase family
        ((:internet nil 0)
         ;; Address type 2 is AF_INET (IPv4).
         (unless (= (ext::host-entry-addr-type hostent) 2)
           (no-address-error))
         ;; Split the first 32-bit address into its four bytes, first octet first.
         (append (list :internet)
                 (let ((addr (first (ext::host-entry-addr-list hostent))))
                   (list (ldb (byte 8 24) addr)
                         (ldb (byte 8 16) addr)
                         (ldb (byte 8  8) addr)
                         (ldb (byte 8  0) addr)))))))))
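
The LDB calls at the end of the fix unpack a single integer into the four address bytes that OPEN-DISPLAY later matches against Xauthority entries. The standalone sketch below isolates that step so it can be tried in any Common Lisp. The helper name IPV4-INTEGER-TO-OCTETS is hypothetical (not part of CLX or CMUCL), and, like the fix itself, it assumes the address in EXT::HOST-ENTRY-ADDR-LIST is a 32-bit integer with the first octet in the most significant byte.

```lisp
;; Hypothetical helper, not part of CLX or CMUCL. Assumes the resolver hands
;; back an IPv4 address as a 32-bit integer whose most significant byte is
;; the first octet, which is what the LDB calls in HOST-ADDRESS rely on.
(defun ipv4-integer-to-octets (addr)
  "Split a 32-bit IPv4 address into a list of four octets, first octet first."
  (declare (type (unsigned-byte 32) addr))
  (list (ldb (byte 8 24) addr)    ; bits 24 to 31: first octet
        (ldb (byte 8 16) addr)    ; bits 16 to 23: second octet
        (ldb (byte 8  8) addr)    ; bits 8 to 15: third octet
        (ldb (byte 8  0) addr)))  ; bits 0 to 7: fourth octet

;; 127.0.0.1 is #x7F000001, so this returns (127 0 0 1).
;; HOST-ADDRESS prepends :INTERNET to get (:INTERNET 127 0 0 1).
(ipv4-integer-to-octets #x7F000001)
```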


Additional context

XLIB:OPEN-DISPLAY searches an Xauthority file for a key that matches the desired host and display number before opening any socket. Hosts are matched by network address, not by name. When OPEN-DISPLAY's PROTOCOL argument is :LOCAL, the network address of "127.0.0.1" is used unconditionally, so no host name resolution is involved. But when the protocol is :INTERNET, a host name argument to OPEN-DISPLAY must be resolved to a network address.

XLIB::HOST-ADDRESS is CLX's internal interface for hostname resolution, so it needs to work correctly in order for OPEN-DISPLAY to work with PROTOCOL :INTERNET.

The bug in HOST-ADDRESS is easy to identify: the telent-clx (and now sharplispers' clx) version of this function for CMUCL names 3 different interfaces for resolving a hostname, but hides each one behind a different feature check, and includes no read-time conditional for the case where none of the 3 features is present. I don't see anywhere that CLX or CMUCL might push any of the 3 features onto *FEATURES* when building CLX, so HOST-ADDRESS always gets compiled to signal an error.
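
The failure mode can be reproduced in isolation. The function below is a minimal sketch, not the actual CLX source: the feature names and the functions RESOLVER-A and RESOLVER-B are hypothetical placeholders, and it assumes the feature-guarded forms supply the initial value of the host-entry variable, which is one plausible shape consistent with the "Unknown host" error in the transcript. When none of the features is present, the reader discards every guarded form, the variable is bound to NIL, and the function can only signal an error.

```lisp
;; Sketch of the failure mode; NOT the real CLX code. The feature names and
;; RESOLVER-A / RESOLVER-B are hypothetical. Forms after #+feature are skipped
;; by the reader (with *READ-SUPPRESS* bound), so undefined names are harmless.
(defun resolve-host-broken (host)
  (let ((hostent
          #+hypothetical-resolver-a (resolver-a host)
          #+hypothetical-resolver-b (resolver-b host)))
    ;; With neither feature present the binding reads as (HOSTENT),
    ;; which binds HOSTENT to NIL, so this branch is always taken.
    (unless hostent
      (error "Unknown host ~S" host))
    hostent))

;; In an image without those features this signals:
;;   Unknown host "localhost"
;; (resolve-host-broken "localhost")
```

A #-(OR ...) branch covering the case where no feature is present (on CMUCL, a call to EXT:LOOKUP-HOST-ENTRY as in the fix above) would presumably close the gap.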

And now some archaeology:

telent-clx was started about 20 years ago and says it was derived from CMUCL's CLX. Here's what appears to be the latest version of CMUCL's XLIB::HOST-ADDRESS before telent-clx's first commit:

https://gitlab.common-lisp.net/cmucl/cmucl/-/blob/c026fc30bd5719cc7e6ba3d80d8c7470bb99760f/clx/dependent.lisp#L2477

As you can see, there's no read-time conditionalization inside the DEFUN.

So it looks like telent-clx introduced those feature checks; see line 1470 and following here:

https://github.com/sharplispers/clx/blame/master/dependent.lisp

It seems possible that unmodified copies of telent-clx and its descendants have never been able to connect to X servers over inet sockets on CMUCL.

Note that although CMUCL did import telent-clx about 13 years ago, it did not include telent-clx's feature checks inside HOST-ADDRESS:

https://gitlab.common-lisp.net/cmucl/cmucl/-/blame/631990102d817923c0fadcdad841695570d6e1cb/src/clx/dependent.lisp#L2741

So CMUCL's modified telent-clx did not contain this bug.

However, upon incorporating sharplispers' clx in late Jan 2018, CMUCL got those telent-clx feature checks, and so acquired this bug.

https://gitlab.common-lisp.net/cmucl/cmucl/-/blame/f5c564ec702c869cebccf4de91e339e3e2fb2f02/clx/dependent.lisp#L1034