ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License
1.62k stars 401 forks source link

Performance: Trace file shows `Source tree scan` stage takes 38+ seconds #10880

Closed jchavarri closed 1 month ago

jchavarri commented 1 month ago

We continue trying to figure out in which ways we could optimize our Dune build setup at Ahrefs.

One of the things that quickly stands out is that when enabling --trace-file, we can see that the "Source tree scan" step of folder . takes more than 38s.

One of the instances of the Source tree scan events shows the folder backend/api/src, which takes ~37s. But after that event, there's only very short lived events which don't add much information.

Is there any way in which we could modify Dune to get more granular information about where the time is being spent? For example, if would be interesting to know if this time is spent parsing rules (of which dune files), executing commands that generate sexp files, etc. If those things is something that can happen during the Source tree scan stage, of course.

Please find below the trace file (obtained with dune build @fe.all @aa --trace-file fe.all-aa.txt) and a screenshot loading the trace in Brave (Chrome-like browser):

dune-build-fe.all-aa-trace.txt

image

Specifications

ejgallego commented 1 month ago

@jchavarri , are your sources in this case stored on some kind of networked file system?

jchavarri commented 1 month ago

No, they are local. The only non-conventional thing of this setup is that the build is running in a remote machine that I access through ssh.

ejgallego commented 1 month ago

I see, thanks. stracing dune could also provide some interesting information on where the time is being spent.

jchavarri commented 1 month ago

I can share strace report, but looking at how time is being spent:

$ time dune build @fe.all @aa

real    0m40.724s
user    0m35.698s
sys     0m4.171s

I doubt it will give us a lot of insights. It seems time is mostly spent on user CPU- or memory-bound tasks.

nojb commented 1 month ago

From memory I remember that the "Source tree scan" event actually englobes the whole build; ie the actual build commands are sub-events of "Source tree scan", so the 38+ s that you are observing may just be the build doing its thing.

Does it make sense?

jchavarri commented 1 month ago

@ejgallego Here's the output of strace, shows very small times as expected from previous time command:

$ strace -c dune build @fe.all @aa
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 27.59    0.239930           1    224380       411 newfstatat
 16.07    0.139802         198       705           write
 13.46    0.117093           0    131842    131841 mkdir
 13.17    0.114519           1     94422     40419 openat
  9.05    0.078707           0    135781         4 lseek
  8.64    0.075177           0     84601           read
  5.81    0.050530           3     16614           getdents64
  5.31    0.046152           0     54011           close
  0.48    0.004218           1      3188       284 futex
  0.21    0.001869         233         8           munmap
  0.16    0.001396           0      2864           rt_sigprocmask
  0.02    0.000152           2        57           mmap
  0.01    0.000110           1        77           brk
  0.00    0.000029          29         1           readlink
  0.00    0.000019           1        12           mprotect
  0.00    0.000014           3         4           vfork
  0.00    0.000011           1         6           unlink
  0.00    0.000002           0         5           rt_sigaction
  0.00    0.000002           0         3           ioctl
  0.00    0.000002           1         2           sigaltstack
  0.00    0.000002           2         1           prlimit64
  0.00    0.000002           2         1           getrandom
  0.00    0.000001           0         6           pipe2
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         2           getpid
  0.00    0.000000           0         2           socket
  0.00    0.000000           0         2         2 connect
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           wait4
  0.00    0.000000           0         2           fcntl
  0.00    0.000000           0         1           flock
  0.00    0.000000           0         5           getcwd
  0.00    0.000000           0         1           chdir
  0.00    0.000000           0         2         1 rmdir
  0.00    0.000000           0         1           chmod
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           rseq
  0.00    0.000000           0         8           clone3
------ ----------- ----------- --------- --------- ----------------
100.00    0.869739           1    748626    172963 total

ie the actual build commands are sub-events of "Source tree scan"

@nojb This is a "noop" build (all build commands already happened in previous runs of dune build, and Dune doesn't have any rules that need executing). I suspect that Source tree scan includes both the actual source tree scan and the rules generation phase. Is that what you mean?

(thanks both for your input!)

nojb commented 1 month ago

I suspect that Source tree scan includes both the actual source tree scan and the rules generation phase. Is that what you mean?

Yes, in fact Source tree scan includes pretty much everything under it (if memory serves).

rgrinberg commented 1 month ago

It does not generate the rules, but it does initialize all the projects, constructs the scopes, and parses every dune file.

rgrinberg commented 1 month ago

Some perf output would be useful here

jchavarri commented 1 month ago

Here it is:

dune_build_feall_aa.perf.zip

jchavarri commented 1 month ago

it does initialize all the projects, constructs the scopes, and parses every dune file

Maybe the result of this issue could be to rename this stage to "Project initialization, scope construction, dune parsing" instead of "Source tree scan". Does that make sense?

rgrinberg commented 1 month ago

It would be more accurate. I can't read your perf file unfortunately

jchavarri commented 1 month ago

Try loading it in https://profiler.firefox.com/

rgrinberg commented 1 month ago

Thanks, that works. I do see a lot of rule generation in that trace. Are you using dynamic includes perhaps?

jchavarri commented 1 month ago

We use atd a lot, so we have a bunch of .sexp files with Dune rules, I can see a few of them with 900+ rules, a couple of them with 1100 rules. In total, there are almost 5000 rules in these files:

$ find backend -type f -name "*.sexp" -exec grep -o "(rule" {} + | wc -l
4887

Do you think this could be causing trouble? I can look into optimizing these to try to reduce the number of rules.

jchavarri commented 1 month ago

I do see a lot of rule generation in that trace.

Side note: curious what kind of things you look at on the Firefox profiler to reach this kind of conclusion. What kind of views do you use? And what led you realize rule generation could be the issue? Thanks!

jchavarri commented 1 month ago

Edit: ignore this comment, I was measuring wrongly 😕

~I created some synthetic dune files to compare times between different configs at scale.~

~One interesting realization is that using write-file is about 2x slower than using with-stdout-to + echo. In my case, a zero-ops builds using a dune file with 50k rules of this kind takes 10s for the former approach vs 5s with the latter.~

~Here are python scripts to generate these files:~

with open("dune", "w") as dune_file:
    for i in range(1, 50001):
        dune_file.write("""
(rule
 (targets {0}.txt)
 (action (write-file {0}.txt "This is file {0}")))
""".format(i))

print("dune file generated with 50000 rules.")
with open("dune", "w") as dune_file:
    for i in range(1, 50001):
        dune_file.write("""
(rule
 (targets {0}.txt)
 (action (with-stdout-to {0}.txt (echo "This is file {0}"))))
""".format(i))

print("dune file generated with 50000 rules.")
rgrinberg commented 1 month ago

I saw the load_rules function in the trace which invokes the various rule generation callbacks.

Even though your measurement is incorrect, we have in fact sped up reading and writing of files and that let to a noticeable speed up in dune internally. Don't get your hopes too high though, I'd imagine this will shave off a few seconds - not much more.

jchavarri commented 1 month ago

In case it helps, I managed to get a profile including src lines for the function calls to the actual Dune code. For backtracking these, the commit used is a054882.

Unfortunately Firefox Profiler doesn't support srclines, it has to be opened with https://www.speedscope.app/. Make sure to choose Left Heavy view and thread id (tid) 245288.

dune_build_feall_aa_srclines.perf.zip

image
rgrinberg commented 1 month ago

Thanks for the updated trace. It mostly points to things that we've already known to be problematic. We've fixed them in our fork, but the truth is that that they might make your build at most 2x faster. So in total, we're looking at 19 seconds instead of 38. I doubt that will make you happy. Of course more optimizations are possible, and you're welcome to try to implement them. There's not much low hanging fruit though, so it will take real engineering work to shave off more seconds beyond that.

Feel free to open tickets with traces for more concrete bottlenecks that you've detected. This issue is too general to be actionable. Source scanning isn't slow, it's just doing a bunch of work in your specific build. I don't have any concrete ideas on how to make it faster or delay some of the work. Suggestions are very much welcome.

nojb commented 1 month ago

we have in fact sped up reading and writing of files and that let to a noticeable speed up in dune internally.

We've fixed them in our fork,

@rgrinberg out of curiosity, whose fork is this? Jane Street? Tarides? Are these improvements going to be merged back upstream eventually? Thanks!

rgrinberg commented 1 month ago

JS. We're planning to merge them bit by bit starting this year.

jchavarri commented 1 month ago

So in total, we're looking at 19 seconds instead of 38. I doubt that will make you happy.

It definitely does make me happy :) That would be a great improvement. I think if we combine that with some specific internal actions as well (mostly simplify the amount of explicit rule stanzas we generate) it'd be great if we can reach times ~10s eventually.

Looking forward to test any of the improvements as they get merged upstream. Thanks!