Open joker-777 opened 2 years ago
Hello @joker-777, thanks for trying Propshaft and raising the issue.
I'm not familiar with LiveReload, but on Propshaft's side, it adds an extra `before_action` to the application controller of your app that checks for changes and clears the cache when you make a new request. I think it's possible that with a large enough number of assets, a second reload will hit while the cache from the first reload is being refreshed, and this might cause the race condition you are seeing. I'll see what I can do.
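To make the suspected race concrete, here is a minimal, self-contained sketch (illustrative only, not Propshaft's code): one thread clears and reseeds a shared cache without a lock, while a concurrent reader can catch the empty window, which is what a 410 for an asset that actually exists would look like.

```ruby
# Minimal illustration of the suspected race (not Propshaft source code):
# a "reload" thread clears and reseeds a shared cache without a lock, while
# a "request" thread can observe the window where the cache is empty and
# would treat the asset as missing.
cache = { "loader.js" => "loader-abc.digested.js" }

reloader = Thread.new do
  200.times do
    cache = nil                                         # clear...
    sleep 0.001                                         # widen the race window
    cache = { "loader.js" => "loader-abc.digested.js" } # ...then reseed
  end
end

misses = 0
reader = Thread.new do
  200_000.times { misses += 1 if cache.nil? } # a request looking up an asset
end

[reloader, reader].each(&:join)
puts "observed #{misses} empty-cache lookups" # usually non-zero, timing-dependent
```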
As for production, `assets:clean` will keep at least three versions of each asset, plus all that were created in the last hour. Unless you are deploying changes that touch all assets multiple times per hour, it should not be a problem. Also: production uses a manifest file, so it does not read files and calculate their digests dynamically.
Hi @brenogazzola! Thank you! I work with @joker-777 and we investigated this issue together.
We would really appreciate it if you could look into it, and we're also happy to try out any ideas. As we mentioned, reproducing it is quite tricky.
As for production, yes, the manifest keeps track of versions of assets and clears them on `assets:clean`. However, assets that have `.digested` in the filename aren't tracked or cleared by Propshaft. We currently run a manual script to try to clean those up, but it's a bit brittle, because there's no easy way to know what the last 3 versions were, and we have to rely on created or updated timestamps of the files on the filesystem. This is a crude tool (e.g. whatever threshold we set for "old", if we don't deploy for longer than the threshold, we might risk pruning digested files that are still in use).
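Roughly, such a script amounts to the following (a sketch; the glob pattern and the one-week threshold are illustrative assumptions, not our exact code):

```ruby
# Sketch of a crude mtime-based cleanup for pre-digested assets (paths and
# threshold are illustrative). The foot-gun: if nothing is deployed within
# THRESHOLD, files that are still referenced get pruned too.
require "fileutils"

THRESHOLD = 7 * 24 * 60 * 60 # "old" means older than a week (a guess)

Dir.glob("public/assets/**/*.digested.*").each do |file|
  FileUtils.rm(file) if Time.now - File.mtime(file) > THRESHOLD
end
```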
> we're also happy to try out any ideas
My two working ideas are using a semaphore while the cache is reloading (which is going to have an impact on performance), or figuring out how to cache-bust only the modified files (which makes the code more complex). I need to find time to test each of them.
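For the second idea, a very rough sketch of what "cache-bust only the modified files" could mean (hypothetical names and structure, not Propshaft internals):

```ruby
# Hypothetical sketch: instead of clearing the whole cache, re-stat only the
# files already in it and refresh entries whose mtime changed.
require "digest"

def refresh_modified!(cached_assets_by_path)
  cached_assets_by_path.each do |path, asset|
    mtime = File.mtime(path)
    next if asset[:mtime] == mtime
    asset[:mtime]  = mtime
    asset[:digest] = Digest::SHA1.file(path).hexdigest # re-digest only this file
  end
end
```

New or deleted files would still need a full rescan, which is part of why this route makes the code more complex.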
> However, assets that have `.digested` in the filename aren't tracked or cleared by Propshaft.
You mean from `builds` or `public/assets`? The `builds` folder needs to be committed, but its contents should be in `.gitignore`. As for `public/assets`, that's definitely a bug. There was a recent change related to `.digested` and we must have missed it. I'll check.
> This is a crude tool

It is, however, how `assets:clean` works. It will keep at least 3 versions of every asset, but as many as there were in the past hour, all based on the `mtime`.
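As a sketch, that retention rule amounts to something like this (illustrative, not the actual `assets:clean` implementation):

```ruby
# For one logical asset, pick which digested versions to delete: keep the
# 3 newest, plus anything modified within the last hour (based on mtime).
def versions_to_remove(files, count: 3, age: 3600)
  sorted = files.sort_by { |file| -File.mtime(file).to_i } # newest first
  sorted.drop(count).select { |file| Time.now - File.mtime(file) > age }
end
```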
> My two working ideas are using a semaphore while the cache is reloading (which is going to have an impact on performance), or figuring out how to cache-bust only the modified files (which makes the code more complex). I need to find time to test each of them.
I'm not sure about the more complex cache-busting, but I'd give the semaphore a try, at least to see if it solves the issue we're seeing. I'll keep you posted if I find anything. That's a great idea! Thank you @brenogazzola.
About the `.digested` files, sorry, I explained the problem all wrong. Let me try to explain again :)
With "normal" assets, let's say `init.js`, Propshaft automatically creates a digest and stores the file with the digest. So in production, you might end up with `init-aaaaa....js` and `init-bbbbb....js`, etc. The cleanup process will then map `init.js` to the two digested files, which allows Propshaft to clean up older versions from the filesystem.
With `.digested` assets, Propshaft receives an already-digested file and doesn't apply another digest to it, so we might produce, e.g., `loader-abcdef.digested.js` and then, during the next deploy, `loader-fedcba.digested.js`. Propshaft does not extract the digest and the logical path out of those files:
```
pry(main)> extract_path_and_digest("/app/loader-49b7fe393bf868bc53e952ba59a1120b037488edf92d60c70a9148ede3cdb326.digested.js")
=> ["/app/loader-49b7fe393bf868bc53e952ba59a1120b037488edf92d60c70a9148ede3cdb326.digested.js", nil]

pry(main)> extract_path_and_digest("/app/init-7324c15680c90f1782e2e2df9038c0c39b271a5d047745c84972e6f8635a4ddf.js")
=> ["/app/init.js", "7324c15680c90f1782e2e2df9038c0c39b271a5d047745c84972e6f8635a4ddf"]
```
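For context, extraction logic consistent with that output might look roughly like this (a sketch, not necessarily Propshaft's actual source): a digest is only recognized when it is not followed by `.digested`, so pre-digested files yield a nil digest and keep their full path as the logical path.

```ruby
# Sketch of digest extraction consistent with the pry output above. The
# negative lookahead skips hashes that are followed by ".digested", so
# pre-digested files come back with a nil digest.
def extract_path_and_digest(digested_path)
  digest = digested_path[/-([0-9a-f]{7,128})\.(?!digested)/, 1]
  path   = digest ? digested_path.sub("-#{digest}", "") : digested_path
  [path, digest]
end
```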
This means that each `.digested` file is a snowflake, and therefore never gets cleaned up, because there's only one *current* version of it. Our own crude tool just looks for older `.digested` files and deletes them, but that's a foot-gun in many situations. I don't have a solution for this; it's just something we spotted as a limitation of the cleanup process when using pre-digested assets. It also causes the cached assets to balloon over time in development.
Hope this makes sense? I'm not that familiar with propshaft yet, so I hope I didn't mess this up again :)
Yep! Adding a mutex around the cache cleanup seems to solve the issue (fingers crossed). I have very limited experience with concurrency in general, and basically zero with Ruby concurrency, but I simply wrapped the cleanup code in a mutex and it seems to do the trick:
```ruby
class Propshaft::LoadPath
  # ...

  def initialize(paths = [], version: nil)
    @paths   = dedup(paths)
    @version = version
    @mutex   = Mutex.new # one lock per load path; ||= is redundant in initialize
  end

  # ...

  def clear_cache
    # Serialize cache rebuilds so a concurrent request can't observe the
    # cache in its cleared, half-seeded state.
    @mutex.synchronize do
      @cached_assets_by_path = nil
      assets_by_path # seeds the cache
    end
  end
end
```
> This is a crude tool (e.g. whatever threshold we set for "old", if we don't deploy for longer than the threshold, we might risk pruning digested files that are still in use).
@gingerlime wouldn't it be safer to parse the `.manifest.json` file, keep the files that are in there, and delete all the rest?
@tsrivishnu That is an option, but then it would only keep one version. Shouldn't it keep at least the versions from the previous deployment?
@joker-777 I think it's doable with these two options:

1. If you use Capistrano, it keeps (or can keep) backups of the asset manifest in `assets_manifest_backup` in each release folder. I suppose we could go through all the kept releases and use their backed-up manifests. I see the following every time I deploy:

```
00:28 deploy:assets:backup_manifest
      01 mkdir -p /home/xxx/apps/xxx/releases/20230530111226/assets_manifest_backup
    ✔ 01 xx@app.stage.xxxx.xx 0.084s
      02 cp /home/xxx/apps/xxx/releases/20230530111226/public/assets/.manifest.json /home/xxx/apps/xxx/releases/20230530111226/assets_manifest_backup
    ✔ 02 xx@app.stage.xxxx.xx 0.071s
```

2. Use the versioned manifest files that are kept in `public/assets`, for example:

```
manifest-dcca4ae2c9da08d55acf8dfdc7524cc89f328b72.json
manifest-f1378d3c96438a5dee65f98c16c9233536e3f602.json
```
I think it's worth noting that in option 2, the manifest JSON that is versioned and kept is different from `.manifest.json`. I suppose those are generated by Webpack, while `.manifest.json` is generated by Propshaft. I haven't fully tested the options, but it looks like option 2 can be used for pre-digested assets.
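Combining these ideas, a pruning script could keep the union of everything referenced by the current and backed-up manifests and delete the rest. A hedged sketch (the paths, glob patterns, and the assumed `{ logical_path => digested_path }` manifest layout are illustrative, not tested code):

```ruby
# Sketch: prune public/assets down to the files referenced by the current
# .manifest.json plus any backed-up manifests from kept releases.
require "json"
require "set"
require "fileutils"

manifests = Dir.glob("releases/*/assets_manifest_backup/.manifest.json") +
            ["public/assets/.manifest.json"]

# Every digested filename referenced by any kept manifest survives.
keep = manifests.flat_map { |m| JSON.parse(File.read(m)).values }.to_set

Dir.glob("public/assets/**/*").each do |file|
  next if File.directory?(file)
  next if File.basename(file).start_with?("manifest") # keep manifest backups
  FileUtils.rm(file) unless keep.include?(file.delete_prefix("public/assets/"))
end
```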
We don't use Capistrano. Versioning the manifest file could be a solution, though.
@gingerlime Could you try https://github.com/rails/propshaft/issues/110 and ensure that still works for you?
Re: production accumulation of digest chunks, is this an issue because you're using Heroku or something? With a containerized deployment, like Kamal, each container should be starting with a fresh build.
Thanks for following up on this. I’m no longer working on this codebase but @joker-777 should be able to comment on this issue.
@dhh Thanks, I will try to test this as soon as possible.
@dhh The link in your comment doesn't work, but I guess you meant version 0.9.0. We haven't seen this error for a long time. It may be because we don't use LiveReload anymore. But we will start using 0.9.0 from now on and let you know if we find any problems.
Did you also find a solution for the `*.digested.js` files?
I did mean 0.9, yes. If you're talking about cleanup of digested files, I don't know that there is a straight shot there. It's also not a problem that affects modern containerized deployments, since they'll have a clean build every time.
Hi, first of all, thanks a lot for developing this new solution. We are using it in production and are quite happy. One issue we bump into quite frequently, though, is in development.

We use webpack and depend on dynamic chunk loading. That means webpack needs to compile these chunks already with a hash and a `.digested` suffix. We also use LiveReload from webpack to refresh the page automatically after each compilation.

What we observed is that when we apply multiple changes in succession, a dynamic chunk request might return a 410. We debugged the code and could see that the asset wasn't in the `@cached_assets_by_path`. We also saw that Propshaft clears the cache when any of the files in `app/assets/builds` changes. We suspect that there is some kind of race condition between webpack and Propshaft: clearing the cache, compiling multiple times, and refreshing the browser at the same time.

We also noted that the digested assets created by webpack accumulate over time, which could slow down the creation of `@cached_assets_by_path`. This accumulation happens in production as well and doesn't get cleared by `assets:clean`.

We tried to find a reproducible example, but since it is likely a race condition, it is very hard. We would appreciate any thoughts and pointers.