Open hawkett opened 2 years ago
Bringing over from zulipchat discussion...
The git ls-tree
does show 120000
for symlinks (on MacOS -- need to confirm behavior on Windows), so one approach might be to filter out these files from the list before calling git hash-object
.
Then we could do one of the following:
git hash-object
individually on each symlink, discarding errors but honoring files that return a hash.I can see the potential usefulness of in-repo symlinking; for example, you could symlink apps/fancy-app/infra/_build.yaml
to common/config/azure-pipelines/fancy-app/_build.yaml
. This lets you keep the github codeowner of _build.yaml
belonging to a build team, but still counts changes to that file as a modification to fancy-app
(which might then force a rebuild of that project during PR build to ensure it still built properly).
Symlinking from in-repo to out-of-repo doesn't seem very deterministic (basically pointing at a random file location which might or might not exist on the user's hard drive), but I think it hangs together logically for Rush anyway... say you point at ~/app.json
. People with the same file contents of that file get the same hash value, people who don't have any file there get a different value (if we go with Option 2 or Option 3 anyway), and modifying the file does change the hash value, which apparently is what the developer wants in that scenario.
git hash-object
is only called on items that show up in git status
, so this error should only occur if you have locally modified the symbolic link in question and not committed the result. If the symlinks are managed by git, workaround for now is to just commit the change to the symlink, then Rush will be happy.
I'd be open to ignoring symbolic links in the state analysis algorithm. This would simply require the call to git status
to account for files that are symbol links and omit them from the parsed result.
@hawkett - What would the desired behavior in your scenario be?
Explicitly excluding symlinked directories when hashing would be ideal. The only argument against this seems to be the performance hit in checking whether a symlink is a directory on repos with a significant number of symlinks (outside .gitignore
). This feels like an uncommon edge case, but maybe large numbers of git managed symlinks is more common than I imagine. This behaviour also seems to match the implementation in the turborepo issue linked above, which would mean a migration to rush had one less gotcha.
Hello there,
I have been facing the same problem (#4475) due to git hash-object
trying to hash symlinked directories. In my case, not only it prevented incremental analysis but also it made the process crash (so no chance to keep going without the analysis), but I'll fix that error.
Regarding the "symlinks trying to be hashed" issue, what is the current status? It seems that everyone there agrees on the solution of ignoring/discarding symlinks from the analysis.
I'd be open to ignoring symbolic links in the state analysis algorithm. This would simply require the call to git status to account for files that are symbol links and omit them from the parsed result.
Maybe I could try to take the lead about that implementation if you guys are busy on other things?
Summary
If a repo contains symlinked folders that are not in
.gitignore
, the diff/hashing process fails, becausegit hash-object
cannot hash directories or symlinks to directories. The impact is that monorepos with git managed symlinks cannot take advantage of incremental builds, because rush attempts to hash every git managed file (including symlinks).In this case the structure above is used to support building Docker containers using projects in the monorepo.
~Same issue in Turborepo: https://github.com/vercel/turborepo/issues/531
Zulip discussion
Repro steps
rush build
.@microsoft/rush
globally installed version?rushVersion
from rush.json?useWorkspaces
from rush.json?node -v
)?