robfig / glock

MIT License
230 stars 27 forks source link

sync: Limit number of concurrent subprocesses #34

Closed bdarnell closed 8 years ago

bdarnell commented 8 years ago

Our GLOCKFILE has gotten big enough that we hit file descriptor limits while syncing.

Fixes cockroachdb/cockroach#8361

tamird commented 8 years ago

@robfig ping.

robfig commented 8 years ago

Thank you!!!

clofresh commented 8 years ago

It looks like this PR is creating a race condition. When I do a sync with a glockfile of 83 repos:

glock sync github.com/myrepo/myapp
# cd /home/carlo/go/src/github.com/spf13/pflag; git rev-parse HEAD
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions
HEAD

But then afterwards I can do this:

cd /home/carlo/go/src/github.com/spf13/pflag; git rev-parse HEAD
f676131e2660dc8cd88de99f7486d34aa8172635

When I revert this PR, I don't get the error anymore.

tamird commented 8 years ago

@clofresh have you tried to debug that at all? That error is emitted by git when the the repo doesn't yet have a HEAD (e.g. it doesn't have any commits), which...should never happen in the context of go get.

Is that all you see in the log?

Viq111 commented 8 years ago

Here is an extract of a GLOCKFILE that will fail:

github.com/naoina/toml 751171607256bb66e64c9f0220c00662420c38e9
github.com/pointlander/compress ca9c3d3844d3088e6d98441d6a9cc04731b54719
github.com/garyburd/redigo 836b6e58b3358112c8291565d01c35b8764070d7
github.com/pointlander/peg 443cd1b635f4e4ec7eb3c9d5cfb57d40423abd8e
github.com/spf13/pflag 11b7cf8387a31f278486eaad758162830eca8c73
github.com/pkg/errors a2d6902c6d2a2f194eb3fb474981ab7867c81505
github.com/syndtr/goleveldb 1a9d62f03ea92815b46fcaab357cfd4df264b1a0
github.com/gorilla/mux 854d482e26505d59549690719cbc009f04042c2e
github.com/stretchr/testify 8ce79b9f0b77745113f82c17d0756771456ccbd3
golang.org/x/sys 354f231ae1a9ca2d3a6a1a7c5d40b1213d761675
gopkg.in/vmihailenco/msgpack.v2 2f3337bc4c9ded23745439860363115f8a8e99e2
github.com/codegangsta/inject 33e0aa1cb7c019ccc3fbe049a8262a6403d30504
gopkg.in/fsnotify.v1 96c060f6a6b7e0d6f75fddd10efeaca3e5d1bcb0
github.com/gogo/protobuf 932b70afa8b0bf4a8e167fdf0c3367cebba45903
github.com/jmoiron/cobra b6faef729ead6f165684250d4e7fbefc949998a5
gopkg.in/inf.v0 3887ee99ecf07df5b447e9b00d9c0b2adaa9f3e4
gopkg.in/olivere/elastic.v2 0d6b5aab8bb2ac800bd5041b8d98a0a2e60a6f82
github.com/influxdata/toml 9b6538c65fda6d661eaac2097c6d878ab62afb7a
github.com/codeskyblue/go-sh ceb46ec4630a726eeed51d0a70066c9e1102c48f
github.com/cenkalti/backoff 4dc77674aceaabba2c7e3da25d4c823edfb73f99
gopkg.in/Shopify/sarama.v1 4faee6199919651f769e85fd97f1b3bf5739cb81
github.com/dustin/randbo 7f1b564ca7242d22bcc6e2128beb90d9fa38b9f0
github.com/bitly/go-hostpool d0e59c22a56e8dadfed24f74f452cea5a52722d2
github.com/pointlander/jetset b75aa0394920ad9de9bf0fea80a05892a5cb0273
github.com/golang/protobuf 34a5f244f1c01cdfee8e60324258cfbb97a42aec
github.com/getsentry/raven-go 86cd4063c535cbbcbf43d84424dbd5911ab1b818
github.com/goamz/goamz 53857a74b8790008af15ce2969b4f03f3b9771a1
github.com/cloudflare/golz4 ef862a3cdc58a6f1fee4e3af3d44fbe279194cde
github.com/samuel/go-zookeeper 041700315ab94d5f506317d8798adc9a6674c547
github.com/tinylib/msgp 0d29f4bb1d9c5d1ad8ad54e0507ca54c0fa50482
gopkg.in/olivere/elastic.v3 ea60f14deda82f645e0221894b3a9af6b0c76819
github.com/eapache/go-resiliency ed0319b32e66e3295db52695ba3ee493e823fbfe
github.com/klauspost/crc32 a5d1ea1c7bb4ad05a27df06491071dd6d45d1a08
github.com/boltdb/bolt afceb316b96ea97cbac6d23afbdf69543d80748a
github.com/bkaradzic/go-lz4 3a5b51d8591e948b8d8d58eb4bff9e8a66021f08
github.com/cloudfoundry/gosigar 0f1459513b0c78a09526ed0cf4d74c0d298d2d75
github.com/pquerna/ffjson 39a6b41ac09316969fe5efbad024dd066774e09b
github.com/gorilla/context 14f550f51af52180c2eefed15e5fd18d63c0a64a
github.com/bsm/redeo 436f4440f2ee191b0bfbdf9d12894fe0f1eedd9b

Note that since it's a race condition, it will not failed systematically (this one ^ will fail 1 out of 5 times (starting from an empty GOPATH)). On the 83-lines file this will fail most of the time.

The error doesn't seems to be linked to a particular repo but can happen for any of them. One of the failure here was for example:

ᐅ glock sync ...
# cd /Users/vianneytran/tests/gotest/src/github.com/gorilla/context; git rev-parse HEAD
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
HEAD
exit status 128
RaduBerinde commented 8 years ago

I have been occasionally running into that exact error for a while. I believe @andreimatei ran into it as well. It happens rarely, but it definitely happened before this PR (maybe less frequently). Just a data point.

bdarnell commented 8 years ago

We've been seeing this too (cockroachdb/cockroach#8578). My best guess is that it's not actually safe to call go get concurrently (see this post), but it happened to work before because by starting everything at once, we avoided some problematic interleavings of the commands.

I think it has to do with the recursion implied by go get: we list golang.org/x/net (for example) in our glockfile, but several of our dependencies also depend on it. Before, we would always start to initialize golang.org/x/net before any of the other packages could start their recursive fetch of it. Now that the top-level fetch of golang.org/x/net is delayed, we may get multiple packages earlier in the list trying to fetch it at once. If go get had a "no dependencies" flag, then glock could use it because all the dependencies should be listed in the glockfile, and then I think this error would go away because each go get call would only operate on its own package. Unfortunately, there is no such flag, so I'm not sure how to fix this without serializing everything or reproducing go get's internal logic. If you pass all the packages to go get at once, does it parallelize the downloads?

tamird commented 8 years ago

I think I have a fix for this - verifying now. It won't fix bad on-disk state created by this go get race, but it will prevent new instances of it.