google / syzkaller

syzkaller is an unsupervised coverage-guided kernel fuzzer
Apache License 2.0

syz-ci: use a single check out per OS #1235

Open dvyukov opened 5 years ago

dvyukov commented 5 years ago

Currently, if syz-ci manages a dozen Linux configurations, it creates a dozen Linux checkouts. This consumes lots of disk space and slows down both updates and the addition of new instances. We don't need a checkout per instance; a single Linux checkout with multiple remotes would be enough. I think we should create one checkout per OS type (Linux and Fuchsia would still be in different dirs), just as jobs do. But then we will need to use an out-of-tree build (per manager). We also can't allow several instances to manipulate this checkout concurrently, so it makes sense to move the update/build of all instances into a single goroutine (all builds are semaphore-protected, so this won't affect performance). Bonus points for allowing build dirs in tmpfs. Does it make sense? Do we have enough RAM?
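A rough Go sketch of the "single goroutine plus out-of-tree build" idea, for illustration only. This is not syz-ci's actual code: `buildRequest`, `planBuild`, `runBuilds` and all directory and manager names are hypothetical. The sketch only plans the commands instead of executing them; the real pieces it relies on are `git -C`, `git fetch`, `git checkout --detach` and the kernel's `make O=<dir>` out-of-tree build option.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// buildRequest is a hypothetical description of one manager's build.
type buildRequest struct {
	Manager string // illustrative manager name, e.g. "upstream-kasan"
	Remote  string // git remote name inside the shared checkout
	Branch  string // branch to build from that remote
}

// planBuild returns the commands one build would run against the shared
// checkout. Sources stay in checkoutDir; object files go to a per-manager
// directory under buildRoot via the kernel's O= option.
func planBuild(checkoutDir, buildRoot string, req buildRequest) []string {
	out := path.Join(buildRoot, req.Manager)
	return []string{
		fmt.Sprintf("git -C %s fetch %s %s", checkoutDir, req.Remote, req.Branch),
		fmt.Sprintf("git -C %s checkout --detach FETCH_HEAD", checkoutDir),
		fmt.Sprintf("make -C %s O=%s", checkoutDir, out),
	}
}

// runBuilds drains all requests in ONE goroutine, so no two managers can
// manipulate the shared checkout at the same time. A real implementation
// would execute the commands here instead of collecting them.
func runBuilds(checkoutDir, buildRoot string, reqs <-chan buildRequest, done chan<- []string) {
	var plan []string
	for req := range reqs {
		plan = append(plan, planBuild(checkoutDir, buildRoot, req)...)
	}
	done <- plan
}

func main() {
	reqs := make(chan buildRequest)
	done := make(chan []string)
	go runBuilds("/syzkaller/linux", "/syzkaller/build", reqs, done)

	// Two managers building the same tree with different output dirs.
	reqs <- buildRequest{Manager: "upstream-kasan", Remote: "upstream", Branch: "master"}
	reqs <- buildRequest{Manager: "upstream-kasan-386", Remote: "upstream", Branch: "master"}
	close(reqs)

	fmt.Println(strings.Join(<-done, "\n"))
}
```

Pointing `buildRoot` at a tmpfs mount would put only the object files in memory while the sources stay on disk.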

blackgnezdo commented 5 years ago

Presumably you mean (git repo == checkout) and (worktrees == remotes)? Yes, this sounds like a great idea as syz-ci would be responsible for managing the repo and syz-managers for managing their individual worktrees.

I don't know if keeping worktrees in tmpfs is a particularly attractive proposition. It's in hundreds of megs unpacked? You hopefully also don't build them often enough from the same set of sources to save on IO.

dvyukov commented 5 years ago

> Presumably you mean (git repo == checkout) and (worktrees == remotes)?

Probably. Not sure what you mean by these terms :) There can be several git repos fetched into a single directory, so I called that a checkout. By remotes I mean git remotes, that is, fetching several git repos into the same checkout dir.
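To make "several remotes in one checkout" concrete, here is a small Go sketch that only builds the git command lines. The `remote` type, `remoteCommands` and the checkout path are hypothetical; the remote names and kernel.org URLs are illustrative examples, not a statement of what syz-ci is configured to fetch.

```go
package main

import "fmt"

// remote is a hypothetical (name, URL) pair for one kernel tree.
type remote struct {
	Name string
	URL  string
}

// remoteCommands returns the git commands that register every tree as a
// remote of the same checkout and fetch it. Objects shared between the
// trees are stored only once in that checkout's object database.
func remoteCommands(checkoutDir string, remotes []remote) []string {
	var cmds []string
	for _, r := range remotes {
		cmds = append(cmds,
			fmt.Sprintf("git -C %s remote add %s %s", checkoutDir, r.Name, r.URL),
			fmt.Sprintf("git -C %s fetch %s", checkoutDir, r.Name),
		)
	}
	return cmds
}

func main() {
	remotes := []remote{
		{"upstream", "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"},
		{"linux-next", "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git"},
	}
	for _, c := range remoteCommands("/syzkaller/linux", remotes) {
		fmt.Println(c)
	}
}
```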

> Yes, this sounds like a great idea as syz-ci would be responsible for managing the repo and syz-managers for managing their individual worktrees.

Several managers can even share the same tree (e.g. we have several for the main Linus tree).

> I don't know if keeping worktrees in tmpfs is a particularly attractive proposition. It's in hundreds of megs unpacked? You hopefully also don't build them often enough from the same set of sources to save on IO.

I don't think we should place sources into tmpfs. I meant the build dirs with object files for out-of-tree builds. Sources are mostly read-only, while object files are frequently written. As far as I understand, placing object files into tmpfs can be beneficial from a performance perspective. However, I'm not sure we can fit all of them into memory. A single checkout of the Linux kernel takes ~10-11 GB now.

I did not know about worktrees. Reading the docs, they may open up some interesting possibilities. However, I hit a problem in my first test:

linux$ git worktree add ~/src/linux-wt master
Preparing worktree (checking out 'master')
fatal: 'master' is already checked out at '~/src/linux'

Does this mean we can't check out the same branch multiple times? Lots of instances need precisely the same branch; they just use different configs, etc. My main idea was to save on (1) precisely the same code being checked out several times, and (2) similar trees that share most commits being checked out as completely separate repos (i.e. all Linux trees share most commits).

dvyukov commented 5 years ago

> fatal: 'master' is already checked out at '~/src/linux'

We could try to point several of our own branches at the same upstream branch. E.g. if we need to check out upstream/master several times, we could point two of our branches, say upstream-kasan and upstream-kasan-386, at upstream/master and then check out these two branches in different worktrees. We need to figure out if it's worth it and if it all works.
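A minimal Go sketch of that workaround, untested against a real kernel tree. `worktreeCommands` and the paths are hypothetical; the only git feature assumed is `git worktree add -b <new-branch> <path> <commit-ish>`, which creates a new local branch at the given commit and checks it out in a new worktree.

```go
package main

import (
	"fmt"
	"path"
)

// worktreeCommands returns one `git worktree add` command per manager.
// Each manager gets its own local branch (named after the manager) that
// starts at the same upstream ref, so git never sees one branch checked
// out in two worktrees.
func worktreeCommands(repoDir, worktreeRoot, upstreamRef string, managers []string) []string {
	var cmds []string
	for _, m := range managers {
		wt := path.Join(worktreeRoot, m)
		cmds = append(cmds,
			fmt.Sprintf("git -C %s worktree add -b %s %s %s", repoDir, m, wt, upstreamRef))
	}
	return cmds
}

func main() {
	managers := []string{"upstream-kasan", "upstream-kasan-386"}
	for _, c := range worktreeCommands("/syzkaller/linux", "/syzkaller/worktrees", "upstream/master", managers) {
		fmt.Println(c)
	}
}
```

A possible alternative that avoids extra branch names is `git worktree add --detach <path> <commit-ish>`, which checks out a detached HEAD and so should not run into the "already checked out" error; whether that suits syz-ci's update logic is an open question.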