Problem with lexical-binding and (invalid-read-syntax "#")

skeeto commented 11 years ago

The following code works as expected with the default lexical-binding value of nil.

(mapcar #'async-get
        (cl-loop repeat 10 collect
                 (async-start (lambda () t))))

But when lexical-binding is set to t the subprocess fails with a reader error.

Debugger entered--Lisp error: (invalid-read-syntax "#")
  signal(invalid-read-syntax ("#"))
  (if (and (listp result) (eq (quote async-signal) (nth 0 result))) (signal (car (nth 1 result)) (cdr (nth 1 result))) (funcall func result))
  (unwind-protect (if (and (listp result) (eq (quote async-signal) (nth 0 result))) (signal (car (nth 1 result)) (cdr (nth 1 result))) (funcall func result)) (if async-debug nil (kill-buffer buf)))
  (if (null func) (progn (set (make-local-variable (quote async-callback-value)) result) (set (make-local-variable (quote async-callback-value-set)) t)) (unwind-protect (if (and (listp result) (eq (quote async-signal) (nth 0 result))) (signal (car (nth 1 result)) (cdr (nth 1 result))) (funcall func result)) (if async-debug nil (kill-buffer buf))))
  async-handle-result(identity (async-signal (invalid-read-syntax "#")) #<buffer *emacs*<159>>)
  (save-current-buffer (set-buffer (process-buffer future)) (async-handle-result (function identity) async-callback-value (current-buffer)))
  async-get(#<process emacs<1>>)
  mapcar(async-get (#<process emacs> #<process emacs<1>> #<process emacs<2>> #<process emacs<3>> #<process emacs<4>> #<process emacs<5>> #<process emacs<6>> #<process emacs<7>> #<process emacs<8>> #<process emacs<9>>))
  (progn (mapcar (function async-get) (cl-loop repeat 10 collect (async-start (lambda nil t)))))
  eval((progn (mapcar (function async-get) (cl-loop repeat 10 collect (async-start (lambda nil t))))) t)
  eval-last-sexp-1(nil)
  eval-last-sexp(nil)
  ad-Orig-call-interactively(eval-last-sexp nil nil)
  call-interactively(eval-last-sexp nil nil)

DarwinAwardWinner commented 11 years ago

Try setting async-debug to t, and you'll see messages like the following:

Transmitting sexp {{{'(closure
  ((--cl-var-- #<process emacs<8>> #<process emacs<7>> #<process emacs<6>> #<process emacs<5>> #<process emacs<4>> #<process emacs<3>> #<process emacs<2>> #<process emacs<1>> #<process emacs>)
   (--cl-var-- . 0)
   t)
  nil t)
}}}

The syntax used, (closure LIST nil t) is the lexical equivalent of (lambda nil t), which is the function that you are passing to async-start. Any lambda expression evaluated while lexical-binding is true is converted to such a closure. the LIST part describes the lexical environment over which the closure has closed. In this case since cl-macs, the file that defines cl-loop, also uses lexical-binding, the cl-loop macro expands to a form that let-binds a few lexical variables:

;; Input:
(macroexpand '(cl-loop repeat 10 collect
                       (async-start (lambda () t))))
;; Result:
(cl--block-wrapper
 (catch (quote --cl-block-nil--) 
   (let* ((--cl-var-- 10) (--cl-var-- nil)) 
     (while (>= (setq --cl-var-- (1- --cl-var--)) 0)
       (push (async-start (lambda nil t)) --cl-var--))
     (nreverse --cl-var--))))

The second --cl-var-- is the variable used to accumulate the collected results. Unfortunately, async-start returns a process object, which has a printed representation that cannot be read back in by read. Since this variable and its unreadable value are included in the resulting closure, when async-start attempts to pass this closure to the subprocess, it fails because the call to read in the subprocess cannot read it.

The workaround in this case is to quote the lambda so that it is not expanded into a lexical closure but rather passed literally to the subprocess:

(mapcar #'async-get
        (cl-loop repeat 10 collect
                 (async-start '(lambda () t))))

We can confirm the fix by noting that with async-debug set, we see the following messages:

Transmitting sexp {{{'(lambda nil t)
}}}

indicating that our lambda is being passed as a lambda, not a closure.

In terms of fixing this in async.el, I think it would be reasonable to force the START-FUNC argument to async-start to always be evaluated with lexical-binding set to nil, since the lexical environment of the parent process is unlikely to be of use in the child process. Or else first evaluate START-FUNC using eval, which, in a lexical context, evaluates an expression in a null lexical environment (source), thus guaranteeing that no problematic variables will be closed over. Doing either of these would, of course, require making async-start a macro. If you're willing to accept a patch, I'd be happy to write it.

DarwinAwardWinner commented 11 years ago

Oops, minor correction; async-start is already a macro.

skeeto commented 11 years ago

@DarwinAwardWinner I guess the real problem is Emacs closures capturing the entire lexical environment regardless of whether or not all parts of the environment are actually being used. One feature that makes Emacs closure more powerful than other language's closures is that they're printable -- except when the environment holds an unprintable value -- which I think is really cool. It would be a shame to throw that away.

Fortunately closures created by compiled code minimize their environment capture and avoid this problem. If I wrap my broken code above in a function and compile it, it works fine.

(defun foo ()
  (mapcar #'async-get
          (cl-loop repeat 10 collect
                   (async-start (lambda () t)))))

(byte-compile 'foo)
(foo)
;; => (t t t t t t t t t t)

What do you think about allowing compiled closures through instead of simply forcing dynamic binding?

DarwinAwardWinner commented 11 years ago

One hackish solution would be to filter the variable list of a closure to remove any unprintable values. Or filter the variable list to remove anything not referred to in the body. Does either of those solutions sound good?

DarwinAwardWinner commented 11 years ago

A related problem is that I don't think the auto-conversion of lambdas to closures is well documented, so I don't actually know when the conversion happens. This is why my "fix" is to force that conversion to happen at a known time under known conditions for all lambdas, rather than to try and keep the form as a lambda.

I tried running all lambdas and closures through byte-compile, but that made some tests fail.

jwiegley / emacs-async

Problem with lexical-binding and (invalid-read-syntax "#") #17