Open meedstrom opened 2 days ago
Martin Edström writes:
Hi and thank you for this svelte library!
I recently discovered some perf hotspots by profiling, and I have some suggestions to deal with them, but they are suggestions only! Up to you as developer :)
First, I'll say that my attempts to run the profiler hit some roadblocks, because this form sometimes signals errors. I don't know if that is a bug or my mistake; backtrace below.
(let ((done-ctr 0)
      (max 20))
  ;; Pretend 20 cores -> 20 processes
  (profiler-start 'cpu+mem)
  (dotimes (_ max)
    (async-start
     (lambda ()
       (sleep-for 0.5)
       ;; Simulate a real-world "hairy" dataset
       (make-list 30 (cl-loop repeat (random 5)
                              collect (make-list (random 5)
                                                 (number-to-string (random))))))
     (lambda (result)
       (ignore result)
       (when (= (cl-incf done-ctr) max)
         (profiler-stop)
         (profiler-report))))))
Thanks to look into this, but I wonder why you run
(dotimes (_ 20) (async-start (lambda () (do-something))) ...)
instead of
(async-start (lambda () (dotimes (_ 20) (do-something))) ...)
?
In the first place, your application is not really async, because you come back to the parent Emacs after each iteration, while in the second sexp you are completely async, running only one child Emacs.
-- Thierry
It is still async, just with 20 children instead of one working in parallel (if your computer has 20 cores). Nice for performance-intensive applications :)
But yeah, then you have 20 sentinels waiting to run, instead of 1. So it's good to ensure that they are optimized, also to eliminate overhead in things like async-when-done.
Martin Edström writes:
It is still async, just with 20 children instead of one working in parallel (if your computer has 20 cores).
I hardly see how it could be async, as the loop runs on the parent side and blocks Emacs even if an async process is launched at each iteration. Did I miss something?
-- Thierry
OK, first, I fixed the test snippet. For some reason, the error (wrong-type-argument number-or-marker-p nil) did not appear once I removed cl-loop from the START-FUNC.
(defvar done-ctr 0)

(let ((max 20))
  (setq done-ctr 0)
  (profiler-start 'cpu+mem)
  ;; Supposing device has 20 cores, launch 20 processes
  (dotimes (_ max)
    (async-start (lambda ()
                   (sleep-for 0.5)
                   ;; Simulate a real-world "hairy" dataset
                   (thread-last (make-list 50 nil)
                                (mapcar #'random)
                                (mapcar #'number-to-string)
                                (make-list (random 15))
                                (make-list (random 15))))
                 `(lambda (result)
                    (ignore result)
                    (cl-incf done-ctr)
                    (message "Receiving results from... process %d" done-ctr)
                    (when (= done-ctr ,max)
                      (profiler-stop)
                      (profiler-report))))))
Second, if you eval that, you'll see that Emacs stays responsive until the results start coming in from the 20 different subprocesses.
In my mental model, async-start is just make-process, and FINISH-FUNC is the process sentinel. Right? The sentinels are not called during the dotimes loop, but at an undefined point in the future, after the processes have finished.

That's how this is async: the dotimes loop just launches 20 system processes, so the loop itself finishes in milliseconds, long before any one of the processes has finished.
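That mental model can be sketched in a few lines. This is my own illustration, not async.el's actual internals; the child command and sentinel body are assumptions:

```elisp
;; Sketch: `make-process' returns immediately, and the sentinel fires
;; later, when the child exits.  Meanwhile Emacs is not blocked.
(make-process
 :name "demo-child"
 :buffer (generate-new-buffer " *demo-child*")
 ;; Assumption: a child Emacs in batch mode printing one result sexp.
 :command (list (expand-file-name invocation-name invocation-directory)
                "--batch" "--eval" "(prin1 (+ 1 2))")
 :sentinel (lambda (proc _event)
             (when (eq (process-status proc) 'exit)
               (with-current-buffer (process-buffer proc)
                 (goto-char (point-min))
                 (message "Child returned: %S" (read (current-buffer)))))))
;; Control returns here in milliseconds, long before the sentinel runs.
```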
EDIT: Interestingly, I'm getting different profiler results. Now it is string-match standing for 65% of CPU time. It may be affected by the exact shape of the simulated "hairy" dataset.
1325 67% - #<byte-code-function D58>
1325 67% - async-read-from-client
1284 65% string-match
24 1% read
12 0% search-forward
384 19% - command-execute
384 19% - funcall-interactively
384 19% - eval-last-sexp
384 19% - #<byte-code-function D3C>
384 19% - elisp--eval-last-sexp
384 19% - eval
384 19% - progn
384 19% - let
384 19% - let
384 19% - while
384 19% - let
384 19% - async-start
354 18% - async--emacs-program-args
354 18% + locate-library
19 0% + file-truename
10 0% + async-start-process
1 0% + async--transmit-sexp
Speaking of string-match, I have not understood all the reasons that went into async.el's design, like why it uses a custom process filter. If it is in fact possible to refactor, the Emacs 30 NEWS file makes an argument for using the built-in process filter:
---
** The default process filter was rewritten in native code.
The round-trip through the Lisp function
'internal-default-process-filter' is skipped when the process filter is
the default one. It is reimplemented in native code, reducing GC churn.
To undo this change, set 'fast-read-process-output' to nil.
+++
I saw the issues on here about problems decoding hash (#) characters, like #145, but are they still current? Perhaps they are only caused by using a custom process filter?

I've tested with vanilla make-process and the default process filter: I made the subprocesses call prin1 to return literal records looking like #s(data data data), and there was no issue calling read in the sentinel. That's what org-node does now (at these lines: https://github.com/meedstrom/org-node/blob/3241743c4b3d0c69968b301e62cb0602932297da/org-node.el#L1215-L1255).
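The round-trip can be demonstrated in isolation. This is my own illustration (the record type is hypothetical, not from async.el or org-node): prin1 prints records in #s(...) syntax, and read restores them, "#" characters and all.

```elisp
(require 'cl-lib)

(cl-defstruct my-node title pos)  ;; hypothetical record type

(let* ((rec (make-my-node :title "title with a # char" :pos 42))
       ;; Prints as #s(my-node "title with a # char" 42)
       (printed (prin1-to-string rec))
       (restored (read printed)))
  (equal rec restored))  ;; => t
```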
Martin Edström writes:
The dotimes loop just launches 20 system processes, so the loop itself finishes in milliseconds, long before any one of the processes have finished.
Ah yes, of course, so I will look into this as soon as I am back at home in November.
Thanks.
-- Thierry
Take your time :)
Hi and thank you for this svelte library!
I recently discovered some perf hotspots by profiling, and I have some suggestions to deal with them, but they are suggestions only! Up to you as developer :)
First, I'll say that my attempts to run the profiler hit some roadblocks, because this form sometimes signals errors. I don't know if that is a bug or my mistake; backtrace below. (EDIT: See a working version at https://github.com/jwiegley/emacs-async/issues/193#issuecomment-2444430620)
Backtrace:
Fortunately, I was able to get results anyway: when it hits the error, the profiler has not been stopped, so I can manually stop it, produce a report, and see results from the processes that did not hit the error.
Findings follow.
1. backward-sexp

In the case of a large amount of data, async-when-done spends half its CPU just to call backward-sexp once. The rest is spent on the FINISH-FUNC, so it's pretty sleek aside from this one call.

Suggestion: Run something other than backward-sexp. This substitute works in my application (org-node, which I'm refactoring to depend on async.el):
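The actual substitute snippet did not survive here, but the general idea can be sketched. The names and details below are hypothetical, not the real org-node code. Assumption: the process filter records where the final result sexp begins, so the reader can jump straight there and read forward, instead of scanning backward across the whole (potentially huge) printed sexp with backward-sexp.

```elisp
(defvar my-result-start nil
  "Buffer position where the result sexp begins, set by the filter.")

(defun my-read-result (buf)
  "Read the result sexp out of process buffer BUF.
Jump to the recorded start position and `read' forward from there,
avoiding a `backward-sexp' over the entire printed result."
  (with-current-buffer buf
    (goto-char (or my-result-start (point-min)))
    (read (current-buffer))))
```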
2. locate-library
Having solved the case of a large amount of data, over 60% of CPU time is spent on locate-library, which is repeated for every subprocess spawned.

Suggestion: Memoize the result of locate-library.

To expire this memoization, I see two options:
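The memoization itself is straightforward. A hedged sketch, with names of my own invention:

```elisp
(defvar my-locate-library-cache (make-hash-table :test #'equal)
  "Map of library name -> path, so `locate-library' runs once per name.")

(defun my-locate-library-cached (library)
  "Like `locate-library', but compute each LIBRARY's path only once.
Note: a nil result (library not found) is recomputed on every call,
which keeps the sketch simple."
  (or (gethash library my-locate-library-cache)
      (puthash library (locate-library library)
               my-locate-library-cache)))
```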
Bonus: I happen to use in production something that's faster than locate-library, and it ensures using the .eln if available. I don't have FSF copyright assignment yet, but if you want this verbatim, I'll get off my butt and submit the paperwork.

In any case, using .eln somehow would promise some all-around perf boosts.
(EDIT Oct 29: Fixed some issues in that code snippet)
3. file-truename

While locate-library stood for 60%+, file-truename stood for about 8%.

Suggestions: Skip file-truename if not needed. Perhaps file-chase-links could suffice?
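For illustration, my own comparison of the two: both resolve symlinks on the final path component, but file-truename also canonicalizes every parent directory, which is the more expensive part.

```elisp
(file-truename "~/.emacs.d/init.el")    ;; full canonical path, expands "~" too
(file-chase-links "~/.emacs.d/init.el") ;; only chases links on the name itself
```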