thheller / shadow-cljs

ClojureScript compilation made easy
https://github.com/thheller/shadow-cljs
Eclipse Public License 1.0
2.23k stars 173 forks source link

How to debug the par-compiler for :cache-level :all #1179

Closed sirmspencer closed 1 month ago

sirmspencer commented 2 months ago

I am getting build errors when trying to use :cache-level :all. How do I debug this? This example isn't our code, so maybe an issue with devtools? It does appear to work ok if I disable devtool in the preloads.

[:browser-test] Build failure:
aborted par-compile, [:shadow.build.classpath/resource "devtools/defaults.cljs"] still waiting for #{cljs.core}
{:aborted [:shadow.build.classpath/resource "devtools/defaults.cljs"], :pending #{cljs.core}}
ExceptionInfo: aborted par-compile, [:shadow.build.classpath/resource "devtools/defaults.cljs"] still waiting for #{cljs.core}
    shadow.build.compiler/par-compile-one (compiler.clj:1223)
    shadow.build.compiler/par-compile-one (compiler.clj:1188)
    shadow.build.compiler/par-compile-cljs-sources/fn--6360/iter--6382--6386/fn--6387/fn--6388/fn--6389 (compiler.clj:1306)
    clojure.core/apply (core.clj:667)
    clojure.core/with-bindings* (core.clj:1990)
    clojure.core/with-bindings* (core.clj:1990)
    clojure.core/apply (core.clj:671)
    clojure.core/bound-fn*/fn--5818 (core.clj:2020)
    java.util.concurrent.FutureTask.run (FutureTask.java:317)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1144)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:642)
    java.lang.Thread.run (Thread.java:1583)
thheller commented 2 months ago

Why is :cache-level :all in this question? It is the default. Do you set it specifically or mess with other caching options in some way?

par-compile is the parallel compilation that tries to spread tasks accross multiple threads. Since in CLJS compilation of namespaces can only be done once all its dependencies have been compiled it has to wait. So the error message is saying that it tried to compile "devtools/defaults.cljs" but was still waiting for cljs.core to finish. Seems like you gathered as much.

So, strictly speaking it was cljs.core that had the problem and I don't have a guess for why that might be. Progress is tracked per form, so it seems unlikely that something would block for 30 sec (the default timeout) in that namespace without making progress.

Are you messing with your clojurescript dependency on some way? local or git dep maybe? Anything else out of the ordinary?

Can you post the full log when setting :log {:level :debug} in shadow-cljs.edn top level (not build config, so same level as :builds). Then npx shadow-cljs compile browser-test --verbose? Maybe there is a clue in there somewhere.

sirmspencer commented 2 months ago

Why is :cache-level :all in this question? It is the default. Do you set it specifically or mess with other caching options in some way?

Our project set it to :jars. I wasn't there for that decision at the time. I'm looking to go back to it, and the devtools preloads seems to be hanging.

The project is pretty large + npm modules and that makes the verbose output too long to capture. All I see is cache reads and the same error above.

This might all be random tough. I see a random different set of our app cljs files failing when I built today. Then as soon as I added :log {:level :debug} there were no errors. Even after taking it back out and switching between :jars and :all back and forth I can't get the error again. Maybe there is something that gets cached during hmr with :jars

Is there a way to force a clean build?

Also side note, :jars looks like might even be faster in some cases. All the time saved with a cache read is taken by a cache write. IDK if maybe any of that can be done async to help with build time.

thheller commented 2 months ago

If it is failing do you see it getting stuck at Cache read?

The verbose log is basically showing every "progress event" that is happening. So, say it is failing for cljs/core.cljs, is the last log entry what you see -> Cache read: ... or -> Compile CLJS? -> it started, <- means it finished (and logs how long it took in addition). Important to know wheter it gets stuck at the cache read or compile.

Only guess I have that maybe you have a weird/outdated/custom transit version that somehow messes with caches? That would only apply if it gets stuck at Cache read though.

In either cache I do not see how :cache-level is at all relevant for any of this. As it was failing for cljs.core which is in a jar, so would be cached the same as :jars or :all.

A clean build you can force by rm -rf .shadow-cljs/builds.

Is there maybe something wrong with your disk? Cache read/write is usually faster than compiling from scratch, usually by a lot.

For example if I run a compile with a proper cache I get

-> Cache read: cljs/core.cljs
<- Cache read: cljs/core.cljs (25 ms)

Without

<- Compile CLJS: cljs/core.cljs (1644 ms)
-> Cache write: cljs/core.cljs
<- Cache write: cljs/core.cljs (253 ms)

So in total that is ~275ms for caching (and only ~25ms for subsequent reads), where it would be ~1600ms if skipping was skipped entirely. And all of this is already done async/parallel wherever possible, i.e. compilation doesn't wait for cache to be written.

That is of course on my machine. Totally possible that it looks different for you? All depends on your actual hardware? Where all the output is written to? Maybe running in a Container? Maybe writing to some network drive or so?

thheller commented 2 months ago

Oh and I forgot to ask ... which shadow-cljs version do you use? Thats probably the most important bit. If it is some ancient history thing maybe try upgrading first. It has been a while since the cache stuff was changed in any way, but there definitely were issues if you are on something super old.

sirmspencer commented 1 month ago

2.28.3

sirmspencer commented 1 month ago

I can't get the error to replicate anymore so it must have just been a odd state it got in.

For build time this is with :cache :all. shadow seems pretty good at only compiling the files that it needs to on hot reload. Its never reading from the cache on hot reload. But then it also adds a write that makes the build time longer.

[:app] Compiling ...
-> Resolving Module: :app
<- Resolving Module: :app (79 ms)
-> Resolving Module: :app-views
<- Resolving Module: :app-views (60 ms)
-> Resolving Module: :authorize
<- Resolving Module: :authorize (53 ms)
-> Resolving Module: :page-container
<- Resolving Module: :page-container (57 ms)
-> Resolving Module: :task-detail-view
<- Resolving Module: :task-detail-view (73 ms)
-> build target: :browser stage: :compile-prepare
<- build target: :browser stage: :compile-prepare (35 ms)
-> Compile CLJS: //network_connections/card.cljs
<- Compile CLJS: //network_connections/card.cljs (10 ms)
-> Cache write: //network_connections/card.cljs
-> Compile CLJS: //views/core.cljs
-> Compile CLJS: //xx/views.cljs
<- Compile CLJS: //views/core.cljs (18 ms)
-> Cache write: //views/core.cljs
<- Compile CLJS: //xx/views.cljs (35 ms)
-> Cache write: //xx/views.cljs
<- Cache write: //network_connections/card.cljs (625 ms)
<- Cache write: //views/core.cljs (658 ms)
<- Cache write: //xx/views.cljs (654 ms)
-> build target: :browser stage: :compile-finish
-> build target: :browser stage: :flush
-> Flushing unoptimized modules
-> Flush: shadow$empty.js
-> Flush: //xx/views.cljs
-> Flush: shadow/module/page-container/append.js
-> Flush: shadow/module/task-detail-view/prepend.js
<- Flush: shadow/module/app-views/prepend.js (4 ms)
<- Flush: shadow/module/app-views/append.js (5 ms)
-> Flush: //views/core.cljs
<- Flush: shadow/module/authorize/prepend.js (5 ms)
<- Flush: shadow/module/authorize/append.js (8 ms)
<- Flush: shadow/module/app/append.js (8 ms)
<- Flush: shadow/module/page-container/append.js (12 ms)
<- Flush: shadow$empty.js (18 ms)
<- Flush: //network_connections/card.cljs (12 ms)
<- Flush: shadow/module/task-detail-view/append.js (7 ms)
<- Flush: shadow.js.shim.module$shepherd_js$default.js (12 ms)
<- Flush: //xx/views.cljs (13 ms)
<- Flush: //views/core.cljs (9 ms)
<- Flushing unoptimized modules (343 ms)
<- build target: :browser stage: :flush (1136 ms)
<- build hook: [1 shadow.html/copy-file] stage: :flush (0 ms)
[:app] Build completed. (4247 files, 3 compiled, 0 warnings, 2.72s)

and with :cache :jar, it looks the same but no cache writes

-> Resolving Module: :app
<- Resolving Module: :app (127 ms)
-> Resolving Module: :app-views
<- Resolving Module: :app-views (62 ms)
-> Resolving Module: :authorize
<- Resolving Module: :authorize (57 ms)
-> Resolving Module: :page-container
<- Resolving Module: :page-container (56 ms)
-> Resolving Module: :task-detail-view
<- Resolving Module: :task-detail-view (58 ms)
-> build target: :browser stage: :compile-prepare
<- build target: :browser stage: :compile-prepare (42 ms)
-> Compile CLJS: //cards/network_connections/card.cljs
-> Compile CLJS: //xx/views.cljs
-> Compile CLJS: //views/core.cljs
<- Compile CLJS: //cards/network_connections/card.cljs (28 ms)
<- Compile CLJS: //views/core.cljs (22 ms)
<- Compile CLJS: //xx/views.cljs (48 ms)
-> build target: :browser stage: :compile-finish
<- build target: :browser stage: :compile-finish (0 ms)
-> build target: :browser stage: :flush
-> Flush: shadow$empty.js
-> Flushing unoptimized modules
-> Flush: //xx/views.cljs
-> Flush: shadow.js.shim.module$shepherd_js$default.js
-> Flush: shadow/module/authorize/append.js
-> Flush: shadow/module/page-container/append.js
-> Flush: shadow/module/task-detail-view/prepend.js
<- Flush: shadow/module/app-views/prepend.js (2 ms)
-> Flush: //views/core.cljs
<- Flush: shadow/module/page-container/append.js (3 ms)
-> Flush: shadow/module/task-detail-view/append.js
<- Flush: shadow/module/task-detail-view/prepend.js (3 ms)
<- Flush: shadow/module/page-container/prepend.js (3 ms)
<- Flush: shadow/module/app/append.js (5 ms)
<- Flush: shadow/module/authorize/prepend.js (5 ms)
<- Flush: shadow$empty.js (12 ms)
<- Flush: shadow.js.shim.module$shepherd_js$default.js (6 ms)
<- Flush: shadow/module/task-detail-view/append.js (2 ms)
<- Flush: //xx/views.cljs (6 ms)
<- Flush: //views/core.cljs (4 ms)
<- Flushing unoptimized modules (696 ms)
<- build target: :browser stage: :flush (1380 ms)
-> build hook: [1 shadow.html/copy-file] stage: :flush
<- build hook: [1 shadow.html/copy-file] stage: :flush (0 ms)
[:app] Build completed. (4247 files, 3 compiled, 0 warnings, 2.57s)
thheller commented 1 month ago

Those are some excessively slow write times overall? Its not just the cache, this is also much slower than normal.

<- Flushing unoptimized modules (696 ms)
<- build target: :browser stage: :flush (1380 ms)

Given the compile times of your namespaces it seems they aren't very big, so I don't know why writing the cache would take that long.

You never answered my question about which environment this is running in, so I'm going to assume this is either running on some kind of Container (e.g. Docker, VM) with constrained resources or just a slow disk/computer in general. Or your OS messing around with things. Could be some anti-virus stuff trying to scan the files as they are written. In general there is just too many factors that are way outside shadow-cljs control that can affect this.

Also, cache writes do not make the build any meaningfully slower. As I said they happen async and in parallel. hot-reload also doesn't wait for them.

[:app] Build completed. (4247 files, 3 compiled, 0 warnings, 2.72s)
[:app] Build completed. (4247 files, 3 compiled, 0 warnings, 2.57s)

As you can see the first one would be much slower if the 3 extra cache writes actually held anything back.

If you prefer you can just run with jar-only caching, with those slow write times it will probably give you disk (or whatever else) more of break,

thheller commented 1 month ago

Well, now I see why the flush I mentioned takes so long. 4247 files is a lot and that will take some time to write I guess. So, my guess is that the disk is just overloaded from all the writing.

sirmspencer commented 1 month ago

Sorry, missed the question about environment. I have the 16gb macbook pro 2019.

sirmspencer commented 1 month ago

The cache writes look async parallel during the compile stage but still blocking before flush.

thheller commented 1 month ago

Well yeah, it is all happening in a fixed size threadpool. So, at some point it the resources are all busy there is going to be some queuing. I assume the macbook has a ssd, so this seems rather slow for that. Did you check if its maybe dying? Or maybe spotlight/icloud doing something with the cache/output files? There have been reports of others that project folders that get synced to icloud cause a lot of unnecessary work.

sirmspencer commented 1 month ago

I have all that turned off. Could be IT crap. Its all good though. Thanks for looking at it with me.