Open meedstrom opened 2 days ago
Martin Edström writes:
Hi and thank you for this svelte library!
I recently discovered some perf hotspots by profiling, and I have some suggestions to deal with them, but they are suggestions only! Up to you as developer :)
First, I'll say that my attempts to run the profiler hit some roadblocks, because this form sometimes signals errors. I don't know if that is a bug or my mistake; backtrace below.
(let ((done-ctr 0)
      (max 20))
  ;; Pretend 20 cores -> 20 processes
  (profiler-start 'cpu+mem)
  (dotimes (_ max)
    (async-start
     (lambda ()
       (sleep-for 0.5)
       ;; Simulate a real-world "hairy" dataset
       (make-list 30 (cl-loop repeat (random 5)
                              collect (make-list (random 5)
                                                 (number-to-string (random))))))
     (lambda (result)
       (ignore result)
       (when (= (cl-incf done-ctr) max)
         (profiler-stop)
         (profiler-report))))))
Thanks to look into this, but I wonder why you run
(dotimes (_ 20) (async-start (lambda () (do-something))) ...)
instead of
(async-start (lambda () (dotimes (_ 20) (do-something))) ...)
?
In the first place, your application is not really async, because you come back to the parent Emacs after each iteration, while in the second sexp you are completely async, running only one child Emacs.
-- Thierry
It is still async, just with 20 children instead of one working in parallel (if your computer has 20 cores). Nice for performance-intensive applications :)
But yeah, then you have 20 sentinels waiting to run, instead of 1. So it's good to ensure that they are optimized, also to eliminate overhead in things like async-when-done.
Martin Edström writes:
It is still async, just with 20 children instead of one working in parallel (if your computer has 20 cores).
I hardly see how it could be async, as the loop runs on the parent side and blocks Emacs even if an async process is launched at each iteration. Did I miss something?
-- Thierry
OK, first, I fixed the test snippet. For some reason, the error (wrong-type-argument number-or-marker-p nil) did not appear once I removed cl-loop from the START-FUNC.
(defvar done-ctr 0)

(let ((max 20))
  (setq done-ctr 0)
  (profiler-start 'cpu+mem)
  ;; Supposing device has 20 cores, launch 20 processes
  (dotimes (_ max)
    (async-start (lambda ()
                   (sleep-for 0.5)
                   ;; Simulate a real-world "hairy" dataset
                   (thread-last (make-list 50 nil)
                                (mapcar #'random)
                                (mapcar #'number-to-string)
                                (make-list (random 15))
                                (make-list (random 15))))
                 `(lambda (result)
                    (ignore result)
                    (cl-incf done-ctr)
                    (message "Receiving results from... process %d" done-ctr)
                    (when (= done-ctr ,max)
                      (profiler-stop)
                      (profiler-report))))))
Second, if you eval that, you'll see that Emacs stays responsive until the results start coming in from the 20 different subprocesses.
In my mental model, async-start is just make-process, and FINISH-FUNC is the process sentinel. Right? The sentinels are not called during the dotimes loop, but at an undefined point in the future, after the processes have finished.

That's how this is async: the dotimes loop just launches 20 system processes, so the loop itself finishes in milliseconds, long before any one of the processes has finished.
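That mental model can be sketched in a few lines. This is my own illustration, not async.el's actual internals; the child command and sentinel body are assumptions:

```elisp
;; Sketch: `make-process' returns immediately, and the sentinel fires
;; later, when the child exits.  Meanwhile Emacs is not blocked.
(make-process
 :name "demo-child"
 :buffer (generate-new-buffer " *demo-child*")
 ;; Assumption: a child Emacs in batch mode printing one result sexp.
 :command (list (expand-file-name invocation-name invocation-directory)
                "--batch" "--eval" "(prin1 (+ 1 2))")
 :sentinel (lambda (proc _event)
             (when (eq (process-status proc) 'exit)
               (with-current-buffer (process-buffer proc)
                 (goto-char (point-min))
                 (message "Child returned: %S" (read (current-buffer)))))))
;; Control returns here in milliseconds, long before the sentinel runs.
```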
EDIT: Interestingly, I'm getting different profiler results. Now it is string-match standing for 65% of CPU time. It may be affected by the exact shape of the simulated "hairy" dataset.
1325 67% - #<byte-code-function D58>
1325 67% - async-read-from-client
1284 65% string-match
24 1% read
12 0% search-forward
384 19% - command-execute
384 19% - funcall-interactively
384 19% - eval-last-sexp
384 19% - #<byte-code-function D3C>
384 19% - elisp--eval-last-sexp
384 19% - eval
384 19% - progn
384 19% - let
384 19% - let
384 19% - while
384 19% - let
384 19% - async-start
354 18% - async--emacs-program-args
354 18% + locate-library
19 0% + file-truename
10 0% + async-start-process
1 0% + async--transmit-sexp
Speaking of string-match, I have not understood all the reasons that went into async.el's design, like why it uses a custom process filter. If it is in fact possible to refactor, the Emacs 30 NEWS file makes an argument for using the built-in process filter:
---
** The default process filter was rewritten in native code.
The round-trip through the Lisp function
'internal-default-process-filter' is skipped when the process filter is
the default one. It is reimplemented in native code, reducing GC churn.
To undo this change, set 'fast-read-process-output' to nil.
+++
I saw the issues on here about problems decoding hash (#) characters, like #145, but are they still current? Perhaps they are only caused by using a custom process filter?

I've tested with vanilla make-process and the default process filter: I made the subprocesses call prin1 to return literal records looking like #s(data data data), and there was no issue calling read in the sentinel. That's what org-node does now (at these lines: https://github.com/meedstrom/org-node/blob/3241743c4b3d0c69968b301e62cb0602932297da/org-node.el#L1215-L1255).
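The round-trip can be demonstrated in isolation. This is my own illustration (the record type is hypothetical, not from async.el or org-node): prin1 prints records in #s(...) syntax, and read restores them, "#" characters and all.

```elisp
(require 'cl-lib)

(cl-defstruct my-node title pos)  ;; hypothetical record type

(let* ((rec (make-my-node :title "title with a # char" :pos 42))
       ;; Prints as #s(my-node "title with a # char" 42)
       (printed (prin1-to-string rec))
       (restored (read printed)))
  (equal rec restored))  ;; => t
```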
Martin Edström writes:
The dotimes loop just launches 20 system processes, so the loop itself finishes in milliseconds, long before any one of the processes have finished.
Ah yes, of course, so I will look into this as soon as I am back at home in November.
Thanks.
-- Thierry
Take your time :)
Hi and thank you for this svelte library!
I recently discovered some perf hotspots by profiling, and I have some suggestions to deal with them, but they are suggestions only! Up to you as developer :)
First, I'll say that my attempts to run the profiler hit some roadblocks, because this form sometimes signals errors. I don't know if that is a bug or my mistake; backtrace below. (EDIT: See a working version at https://github.com/jwiegley/emacs-async/issues/193#issuecomment-2444430620)
Backtrace:
Fortunately, I was able to get results anyway: when it hits the error, the profiler has not been stopped, so I can manually stop it, produce a report, and see results from the processes that did not hit the error.
Findings follow.
1. backward-sexp

In the case of a large amount of data, async-when-done spends half its CPU just to call backward-sexp once. The rest is spent on the FINISH-FUNC, so it's pretty sleek aside from this one call.

Suggestion: Run something other than backward-sexp. This substitute works in my application (org-node, which I'm refactoring to depend on async.el):
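The actual substitute snippet did not survive here, but the general idea can be sketched. The names and details below are hypothetical, not the real org-node code. Assumption: the process filter records where the final result sexp begins, so the reader can jump straight there and read forward, instead of scanning backward across the whole (potentially huge) printed sexp with backward-sexp.

```elisp
(defvar my-result-start nil
  "Buffer position where the result sexp begins, set by the filter.")

(defun my-read-result (buf)
  "Read the result sexp out of process buffer BUF.
Jump to the recorded start position and `read' forward from there,
avoiding a `backward-sexp' over the entire printed result."
  (with-current-buffer buf
    (goto-char (or my-result-start (point-min)))
    (read (current-buffer))))
```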
2. locate-library
Having solved the case of a large amount of data, over 60% of CPU time is spent on locate-library, which is repeated for every subprocess spawned.

Suggestion: Memoize the result of locate-library.

To expire this memoization, I see two options:
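The memoization itself is straightforward. A hedged sketch, with names of my own invention:

```elisp
(defvar my-locate-library-cache (make-hash-table :test #'equal)
  "Map of library name -> path, so `locate-library' runs once per name.")

(defun my-locate-library-cached (library)
  "Like `locate-library', but compute each LIBRARY's path only once.
Note: a nil result (library not found) is recomputed on every call,
which keeps the sketch simple."
  (or (gethash library my-locate-library-cache)
      (puthash library (locate-library library)
               my-locate-library-cache)))
```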
Bonus: I happen to use in production something that's faster than locate-library, and it ensures using the .eln if available. I don't have FSF copyright assignment yet, but if you want this verbatim, I'll get off my butt and submit the paperwork.

In any case, using .eln somehow would promise some all-around perf boosts.
(EDIT Oct 29: Fixed some issues in that code snippet)
3. file-truename

While locate-library stood for 60%+, file-truename stood for about 8%.

Suggestions: Skip file-truename if not needed. Perhaps file-chase-links could suffice?
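For illustration, my own comparison of the two: both resolve symlinks on the final path component, but file-truename also canonicalizes every parent directory, which is the more expensive part.

```elisp
(file-truename "~/.emacs.d/init.el")    ;; full canonical path, expands "~" too
(file-chase-links "~/.emacs.d/init.el") ;; only chases links on the name itself
```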