microsoft / rushstack

Monorepo for tools developed by the Rush Stack community
https://rushstack.io/
Other
5.76k stars 588 forks source link

[rush] directory symlinks passed to git hash-object cannot be hashed #3517

Open hawkett opened 2 years ago

hawkett commented 2 years ago

Summary

If a repo contains symlinked folders that are not in .gitignore, the diff/hashing process fails, because git hash-object cannot hash directories or symlinks to directories. The impact is that monorepos with git managed symlinks cannot take advantage of incremental builds, because rush attempts to hash every git managed file (including symlinks).

Error calculating the state of the repo. (inner error: Error: git hash-object exited with status 128: fatal: Unable to hash apps/infra/containers/containerA/<symlink-to-repo-project>
). Continuing without diffing files.

In this case the structure above is used to support building Docker containers using projects in the monorepo.

~Same issue in Turborepo: https://github.com/vercel/turborepo/issues/531

Zulip discussion

Repro steps

Question Answer
@microsoft/rush globally installed version? 5.75.0
rushVersion from rush.json? 5.75.0
useWorkspaces from rush.json? true
Operating system? Linux
Would you consider contributing a PR? No
Node.js version (node -v)? 16.15.1
elliot-nelson commented 2 years ago

Bringing over from zulipchat discussion...

The git ls-tree does show 120000 for symlinks (on MacOS -- need to confirm behavior on Windows), so one approach might be to filter out these files from the list before calling git hash-object.

Then we could do one of the following:

I can see the potential usefulness of in-repo symlinking; for example, you could symlink apps/fancy-app/infra/_build.yaml to common/config/azure-pipelines/fancy-app/_build.yaml. This lets you keep the github codeowner of _build.yaml belonging to a build team, but still counts changes to that file as a modification to fancy-app (which might then force a rebuild of that project during PR build to ensure it still built properly).

Symlinking from in-repo to out-of-repo doesn't seem very deterministic (basically pointing at a random file location which might or might not exist on the user's hard drive), but I think it hangs together logically for Rush anyway... say you point at ~/app.json. People with the same file contents of that file get the same hash value, people who don't have any file there get a different value (if we go with Option 2 or Option 3 anyway), and modifying the file does change the hash value, which apparently is what the developer wants in that scenario.

dmichon-msft commented 2 years ago

git hash-object is only called on items that show up in git status, so this error should only occur if you have locally modified the symbolic link in question and not committed the result. If the symlinks are managed by git, workaround for now is to just commit the change to the symlink, then Rush will be happy. I'd be open to ignoring symbolic links in the state analysis algorithm. This would simply require the call to git status to account for files that are symbol links and omit them from the parsed result.

iclanton commented 2 years ago

@hawkett - What would the desired behavior in your scenario be?

hawkett commented 2 years ago

Explicitly excluding symlinked directories when hashing would be ideal. The only argument against this seems to be the performance hit in checking whether a symlink is a directory on repos with a significant number of symlinks (outside .gitignore). This feels like an uncommon edge case, but maybe large numbers of git managed symlinks is more common than I imagine. This behaviour also seems to match the implementation in the turborepo issue linked above, which would mean a migration to rush had one less gotcha.

antoine-coulon commented 6 months ago

Hello there,

I have been facing the same problem (#4475) due to git hash-object trying to hash symlinked directories. In my case, not only it prevented incremental analysis but also it made the process crash (so no chance to keep going without the analysis), but I'll fix that error.

Regarding the "symlinks trying to be hashed" issue, what is the current status? It seems that everyone there agrees on the solution of ignoring/discarding symlinks from the analysis.

I'd be open to ignoring symbolic links in the state analysis algorithm. This would simply require the call to git status to account for files that are symbol links and omit them from the parsed result.

Maybe I could try to take the lead about that implementation if you guys are busy on other things?