Closed jackkoenig closed 5 years ago
Benchmarking results for `wit status` on an internal SiFive package with lots of dependencies:

Dual-core Macbook Pro

commit | runtime |
---|---|
master | 9.07 s +/- 0.52 |
passbyval | 8.20 s +/- 0.86 |
caching | 6.68 s +/- 0.72 |

Linux Server

commit | runtime |
---|---|
master | 2.92 s +/- 0.28 |
passbyval | 1.94 s +/- 0.15 |
caching | 1.78 s +/- 0.19 |
Caching didn't help as much as I had hoped on the Linux server, but it helps a ton on my Macbook. To do much better than this, we need to avoid spawning so many subprocesses. My guess is that integrating GitPython would be the best approach. Alternatively, we could make it possible to issue multiple git subprocesses at once and overlap the subprocess cost. I think the latter is kind of hard but probably doable.
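To make the "issue multiple git subprocesses at once" idea concrete, here is a minimal sketch using a thread pool. The helper name `run_all` and the stand-in commands are assumptions for illustration, not code from wit; a real version would pass git command lines instead.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_workers=8):
    """Run each command (a list of argv strings) concurrently.

    Returns the stdout of each command, in the same order as `commands`.
    `run_all` is a hypothetical helper, not part of wit.
    """
    def run_one(argv):
        # check=True raises CalledProcessError on a non-zero exit status.
        proc = subprocess.run(argv, capture_output=True, text=True, check=True)
        return proc.stdout

    # Threads are enough here: each worker mostly waits on its child process,
    # so the process startup and I/O costs overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without a git checkout. In practice
    # each entry would be a git invocation for one repository.
    cmds = [[sys.executable, "-c", "print(%d)" % i] for i in range(4)]
    print(run_all(cmds))
```

The hard part is probably not the pool itself but restructuring the callers so that independent git queries are collected up front instead of being issued one at a time.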
It appears that using the `lru_cache` utility erases the performance gains (see EDIT below). Results for `wit status` on a different large workspace than the above:
commit | runtime |
---|---|
master | 3.33 s +/- 0.20 |
passbyval | 2.08 s +/- 0.24 |
manual caching | 1.77 s +/- 0.16 |
use lru | 2.37 s +/- 0.23 |
fix lru | 1.72 s +/- 0.17 |
commit | runtime |
---|---|
master | 6.71 s +/- 0.17 |
passbyval | 5.88 s +/- 0.12 |
manual caching | 4.78 s +/- 0.1 |
use lru | 6.03 s +/- 0.20 |
fix lru | 4.80 s +/- 0.06 |
EDIT: I didn't know what I was doing, but with @richardxia's help, the lru caching actually caches now and is the same speed as manual caching.
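For anyone hitting the same thing, one quick way to check whether an `lru_cache`-decorated function is really caching is to look at `cache_info()`. This is a generic sketch: `commit_time` and its arguments are hypothetical stand-ins for an expensive git query, not the actual functions in this PR.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive body really runs


@lru_cache(maxsize=None)
def commit_time(repo_path, commit):
    """Hypothetical stand-in for an expensive git subprocess call."""
    global calls
    calls += 1
    return len(repo_path) + len(commit)  # placeholder result


commit_time("/repo", "abc123")  # first call: a miss, the body runs
commit_time("/repo", "abc123")  # same arguments: served from the cache

info = commit_time.cache_info()
print(info.hits, info.misses, calls)
```

If `hits` stays at 0 under a real workload, the arguments are not comparing equal between calls (or are not the values you think they are), and the decorator only adds overhead.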
I noticed some low-hanging fruit performance issues; this PR fixes 2 of them:

- `passbyval` was eating a ton of time; the code shows that only shallow clones are necessary

There is a small risk of bugs here: I may have misunderstood the uses of `passbyval`, and maybe my caching strategy is incorrect. In fact, I got the caching wrong the first time, which is why the `GitRepo`s now record "known commits" and will only cache things keyed by known commits.

In any case, this is a huge improvement in performance for common operations like `wit status`; benchmarking results to follow.
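To illustrate the deep versus shallow trade-off behind the `passbyval` change, here is a rough sketch. It assumes `passbyval` is a decorator that copies a function's arguments before the call; both decorators below are hypothetical reconstructions for illustration, not the code in this PR.

```python
import copy
import functools


def passbyval_deep(func):
    """Give func deep copies of its arguments (slow for big object graphs)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*copy.deepcopy(args), **copy.deepcopy(kwargs))
    return wrapper


def passbyval_shallow(func):
    """Give func shallow copies: new top-level containers, shared elements."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = tuple(copy.copy(a) for a in args)
        kwargs = {k: copy.copy(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)
    return wrapper


@passbyval_shallow
def add_item(items):
    items.append("new")  # mutates only the callee's copy of the list
    return items


original = ["a", "b"]
result = add_item(original)
print(original, result)
```

The shallow version protects the caller from appends and removals on the argument itself, but any mutable objects nested inside it are still shared. That is the small risk of bugs mentioned above: it is only safe if callees never mutate nested state.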