microsoft / scalar

Scalar: A set of tools and extensions for Git to allow very large monorepos to run on Git without a virtualization layer
MIT License

too many open files #415

Closed: ptarjan closed this issue 3 years ago

ptarjan commented 4 years ago

I'm worried that Scalar is preventing git gc from running often enough. Is there something I can do to check?

The symptom I'm seeing is that, after a while, I get this failure when running git push:

$ git push
...
ref <sha>:: open <path>git/objects/pack/pack-<sha>.pack: too many open files

Then if I run

$ git gc
Enumerating objects: 1395635, done.
Counting objects: 100% (1395635/1395635), done.
Delta compression using up to 12 threads
Compressing objects: 100% (325900/325900), done.
Writing objects: 100% (1395635/1395635), done.
Total 1395635 (delta 938477), reused 1390778 (delta 933690), pack-reused 0
Checking connectivity: 1395929, done.

It starts to work again.

I do see scalar running in my repo:

$ scalar list
/Users/paul.tarjan/robinhood/rh-staging
/Users/paul.tarjan/robinhood/rh

derrickstolee commented 4 years ago

Hi @ptarjan. Thanks for pointing out this issue. If it happens again, could you run ls -al .git/objects/pack?

As reported elsewhere, there is some thought that on "small" repos, the 2-gigabyte batch-size in the PackfileMaintenanceStep is not repacking pack-files often enough. I will be updating this behavior to work better for repos of this size.

derrickstolee commented 4 years ago

Here is the Git contribution I was thinking about.

ptarjan commented 4 years ago

Thanks for the response. In my case, my repo is our huge company monorepo: a 4GB .git directory, 100k files, and 100k commits, so I doubt I'm hitting the small-repo problem.

My ulimit -n is 256 (no idea why; maybe my company sets that default). Maybe that's what's causing this?

derrickstolee commented 4 years ago

Thanks for the clarification, @ptarjan. 4GB is on the low end of my "medium-size" scale, compared to repos that I'm used to. ;)

The "small repo" problem I'm talking about is that your incremental fetches are much smaller than 2 GB. I bet that if you list your pack-files (some time after your recent gc) you'll have one "big" pack-file and many smaller pack-files. Those smaller pack-files are too small to trigger the 2GB batch size of the git multi-pack-index repack command that Scalar uses.

Yes, a ulimit of 256 is incredibly low. It is typically 10x that.
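For anyone stuck on that low default in the meantime, a minimal sketch of raising the soft limit for the current shell session only (4096 is an arbitrary illustrative value, not a recommendation from this thread; persisting it would require a shell profile or launchd change):

$ ulimit -n        # current soft limit on open file descriptors
256
$ ulimit -n 4096   # raise it for this shell only, up to the hard limit (ulimit -Hn)
$ ulimit -n
4096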

ptarjan commented 4 years ago

Hey! Sorry for dropping the ball here. We're now deploying this to our whole org so we're actually getting more reports from our users about it.

It seems the default ulimit is 256? We tried on a personal, vanilla OS X 10.15.6 machine and it was also 256.

If that is the case on macOS, is there some other git command that Scalar could run which would help here? It seems git gc resolves the issue every time, so should we just cron that onto people's machines? And since Scalar is already trying to maintain the state of the repo, shouldn't this be Scalar's job?
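For reference, if you did cron it, a minimal sketch of the crontab entry (the repo path and the nightly schedule are placeholders, not values from this thread):

# added via crontab -e: run git gc against the repo every night at 3:00
0 3 * * * git -C /Users/me/company/monorepo gc >/dev/null 2>&1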

derrickstolee commented 4 years ago

> Hey! Sorry for dropping the ball here. We're now deploying this to our whole org so we're actually getting more reports from our users about it.

> It seems the default ulimit is 256? We tried on a personal, vanilla OS X 10.15.6 machine and it was also 256.

I'm hesitant to push for updating ulimit here, but that might be the best option for now.

> If that is the case on macOS, is there some other git command that Scalar could run which would help here? It seems git gc resolves the issue every time, so should we just cron that onto people's machines? And since Scalar is already trying to maintain the state of the repo, shouldn't this be Scalar's job?

If you have the ability to do that, then that's a good short-term option. The next version of Git will include the update to git multi-pack-index repack (here it is as a commit in master), so Scalar's incremental repack strategy should suffice when we ship that version.

Thank you for your patience here! We appreciate the feedback.

github-actions[bot] commented 3 years ago

Labeling this issue as stale. There has been no activity for 30 days. Remove stale label or comment or this issue will be closed in 7 days.

ptarjan commented 3 years ago

This is still an issue for us

derrickstolee commented 3 years ago

> This is still an issue for us

Have you upgraded to Git 2.29.0 or later? If you haven't, then please do.

Have you run scalar run pack-files on your repositories? If you haven't, then run ls -al .git/objects/pack before and after. Then run it again. Let me know what you see.
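Concretely, that before/after check might look like this (counting entries rather than eyeballing the full listing):

$ ls .git/objects/pack | wc -l   # entry count before
$ scalar run pack-files
$ ls .git/objects/pack | wc -l   # count again; the number should drop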

ptarjan commented 3 years ago

Thanks for the advice. I'll get folks who complain about this to upgrade to git 2.29.0. I just did it myself to keep an eye on it.

Should I have to run scalar run pack-files regularly or is this an exceptional thing? I just ran it and it did remove a few files from that directory but there are still lots. Is that a problem?

derrickstolee commented 3 years ago

> Thanks for the advice. I'll get folks who complain about this to upgrade to git 2.29.0. I just did it myself to keep an eye on it.

> Should I have to run scalar run pack-files regularly or is this an exceptional thing? I just ran it and it did remove a few files from that directory but there are still lots. Is that a problem?

The Scalar service runs the pack-files step about once a day. If you want to accelerate its behavior, you can run it a few times in a row. It runs daily to avoid over-taxing the user with CPU time in the background. Also, if concurrent foreground commands are running across two runs of the pack-files step, then those foreground commands could fail due to expecting a pack-file that was deleted out from under them. If you are in control and not simultaneously running Git commands, then repeated runs of scalar run pack-files should incrementally improve your .git/objects/pack directory until reaching a stable state.
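A minimal sketch of that repeated-run approach (three iterations is arbitrary; per the caveat above, avoid running other Git commands concurrently):

$ for i in 1 2 3; do scalar run pack-files; ls .git/objects/pack/*.pack | wc -l; done
# the pack count printed after each run should fall toward a stable value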

If you are not reaching a stable state, or are otherwise unhappy with the end result, then I'd love to see the output of ls -al .git/objects/pack to understand the size of your pack-files and how that might be affecting our batch-size calculation.

ptarjan commented 3 years ago

Thanks for all the debugging help. I do have an instance where a user had:

$ git --version
git version 2.29.2
$ ulimit -n
256

but was getting the symptom:

ref 8235f832bfb63026ab64c70bab1febb0c4b85b28:: open <path>/.git/objects/pack/pack-fe386ebacdbae3b584fe256738f8d90c11053b51.pack: too many open files

and git gc fixed it for them. Sadly I wasn't able to get them to try scalar run pack-files before they ran git gc.

So I don't think that git 2.29.0 is enough to mitigate this, sadly. I'll keep pinging anytime one of my users notices this.

derrickstolee commented 3 years ago

> Thanks for all the debugging help. I do have an instance where a user had:
>
> $ git --version
> git version 2.29.2
> $ ulimit -n
> 256
>
> but was getting the symptom:
>
> ref 8235f832bfb63026ab64c70bab1febb0c4b85b28:: open <path>/.git/objects/pack/pack-fe386ebacdbae3b584fe256738f8d90c11053b51.pack: too many open files
>
> and git gc fixed it for them. Sadly I wasn't able to get them to try scalar run pack-files before they ran git gc.
>
> So I don't think that git 2.29.0 is enough to mitigate this, sadly. I'll keep pinging anytime one of my users notices this.

Unfortunately, the feature I was hoping would land in 2.29.0 (git maintenance run --task=incremental-repack) did not actually land until 2.30.0. It is probably enabled in v2.29.0.vfs.0.0 or later.

As of the latest Scalar release on macOS, we have these two changes:

  1. scalar run <task> essentially just runs git maintenance run --task=<task> (with some mapping from the Scalar names to the Git names, but the functionality is very similar). This should have been improved to prevent this "too many files" case, at least usually.

  2. The Scalar.Service application is removed in favor of running git maintenance start. This takes a feature that only recently landed in git/git's master branch, so it will be in core Git in 2.31.0 but is not in 2.30.0.
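Putting those together, a minimal sketch of the Git-native equivalents, with version availability as described above (run from inside the repository):

$ git maintenance run --task=incremental-repack   # the Git-native form of Scalar's pack-files step (Git 2.30+)
$ git maintenance start                           # register the repo and schedule background maintenance (core Git 2.31+)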

ptarjan commented 3 years ago

This just happened to me personally. Then I ran:

$ git maintenance run --task=incremental-repack
Enumerating objects: 89559, done.
Counting objects: 100% (89559/89559), done.
Delta compression using up to 12 threads
Compressing objects: 100% (63722/63722), done.
Writing objects: 100% (89559/89559), done.
Total 89559 (delta 23834), reused 83651 (delta 17987), pack-reused 0

and it resolved the issue.

I have:

$ scalar --version
scalar 20.10.178.7
$ git --version
git version 2.30.0

github-actions[bot] commented 3 years ago

Labeling this issue as stale. There has been no activity for 30 days. Remove stale label or comment or this issue will be closed in 7 days.

ptarjan commented 3 years ago

I haven't seen this happen to any of our users using the new version of git. Thanks for all your hard work on this!

ptarjan commented 3 years ago

Sadly, I have to reopen this ticket. I rolled out the latest scalar version (21.03.185.1) to our fleet, and now my users are reporting that they often see "Too many open files". We are all on git version 2.31.0.vfs.0.1. Are there any diagnostics I can collect for you when it happens to my users?
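In the meantime, a plausible snapshot to capture when a user hits it, based on what was requested earlier in this thread:

$ git --version; scalar --version   # exact versions in play
$ ulimit -n                         # per-process open-file limit
$ ls -al .git/objects/pack | wc -l  # how many pack files Git may hold open
$ ls -al .git/objects/pack          # full listing, for the batch-size math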

ptarjan commented 3 years ago

Why did this close?