Closed asteele0 closed 1 year ago
Hi @asteele0. Thanks for the detailed notes. Yes one downside of Git-Sim relying on Manim as a dependency for rendering the output images and videos is that its performance is rather unoptimized. This can result in longer render times for Git-Sim on repos that have large numbers of commits to traverse.
However, A LOT depends on the specific commit structure being traversed by a particular command. The way Git-Sim commit parsing words is it starts at the HEAD commit and works backwards recursively tracing parent commits and rendering them, until N commits have been parsed PER BRANCH. The default N is 5, but as you saw you can manually set it using the -n 2
flag, for example to only show 2 commits per branch.
That being said, showing only 2 commits should NOT take 24 mins... despite Manim's render time. It's possible there is a design flaw in my commit parsing algorithm that's causing this in large repos.
Any chance you can provide the screenshot Manim is outputting for the git-sim log -n 2
command? Also - any chance this is an open-source repo I could take a look at trying to reproduce? Or if not, can you try reproducing this in a large open-source repo so that I can test and play around with it?
@initialcommit-io sorry for the delay.
Yeah, here's the image generated:
Unfortunately I didn't copy it, but the execution time was much shorter this time. More like 12m.
This is probably just because of my computer, though.
I did notice that it generated a lot more files than I expected:
C:\source\git-sim_media\source\images
C:\source\git-sim_media\source\texts
C:\source\git-sim_media\source\videos
C:\source\git-sim_media\source\images\git-sim-log_06-05-23_14-10-43.jpg
C:\source\git-sim_media\source\texts\17dd8cd4858722ca.svg
C:\source\git-sim_media\source\texts\1bfa1e6971f55cab.svg
C:\source\git-sim_media\source\texts\4947dc0b08323edd.svg
C:\source\git-sim_media\source\texts\52d1ff1694365c2a.svg
C:\source\git-sim_media\source\texts\5c02b1dbfc9212fb.svg
C:\source\git-sim_media\source\texts\642cbb1c7323e76e.svg
C:\source\git-sim_media\source\texts\760beb94de3b39af.svg
C:\source\git-sim_media\source\texts\96f9676f36dcf0cf.svg
C:\source\git-sim_media\source\videos\1080p60
C:\source\git-sim_media\source\videos\1080p60\partial_movie_files
C:\source\git-sim_media\source\videos\1080p60\git-sim-log_06-05-23_13-41-04.mp4
C:\source\git-sim_media\source\videos\1080p60\partial_movie_files\Log
C:\source\git-sim_media\source\videos\1080p60\partial_movie_files\Log\1019697990_51972775_2762394053.mp4
C:\source\git-sim_media\source\videos\1080p60\partial_movie_files\Log\partial_movie_file_list.txt
Unfortunately the repo is not open source, but I did run the same command on the vscode repo. The execution time is comparable to my repo, so I think it's a good testing area.
git-sim log -n 2
: 11m 57s 728ms
Processing blobs: 353412 Processing trees: 958462 Processing commits: 115283 Matching commits to trees: 115283 Processing annotated tags: 50 Processing references: 1011 | Name | Value | Level of concern |
---|---|---|---|
History structure | |||
* Maximum tag depth [1] | 2 | * | |
Biggest checkouts | |||
* Number of directories [2] | 6.81 k | *** | |
* Maximum path depth [3] | 12 | * | |
* Maximum path length [4] | 127 B | * |
So you're saying that git-sim recursively parses n
commits per branch, no matter what branch is checked out?
To me, and of course I don't know how any of it works, it seems that is doing entirely too much unnecessary work.
Naively, it seems that offloading the heavy tree parsing to git log
(or some plumbing command, idk), and then just rendering the output would be the fastest solution?
Thanks for looking into this!
@asteele0 Thank again for your thoroughness and clarity! I did some testing and found out the performance issue is a bug related to checking for remote tracking branch names to use as labels where appropriate. It was inefficiently repeating this code multiple times within loops which was taking forever on repos with many remote tracking branches, (i.e. the big repos you've been working on!)
I fixed it so that this remote-tracking-branch code runs only once, and pushed new version git-sim 0.3.1 (feel free to see here for the exact code change I made). You can test it out by running pip install git-sim --upgrade
. It would be awesome if you can give it a try and make sure it works. Should be much faster now, but I'll keep this issue open until you confirm. I hope this makes git-sim usable for your team and use-case!
In case you're curious, here are some answers to some of the points you brought up:
So you're saying that git-sim recursively parses n commits per branch, no matter what branch is checked out?
No - git-sim starts at the HEAD commit and parses backwards recursively until n
commits are reached. However, if along the path a commit with multiple parents is found, the recursive nature will keep parsing until up to n
commits are drawn along each commit chain, by design. So n
doesn't necessarily refer to the total number of drawn commits in the whole diagram, but the number of commits drawn per split chain, if that makes sense.
it seems that offloading the heavy tree parsing to git log...
This is basically the way git-sim works. In general the parsing doesn't eat up performance, it's the Manim rendering (especially when using the --animate
flag for video) that comes with heavy performance load.
Nice! That totally fixed it. I'll mark the ticket as closed, thank you!
git-sim log -n 2
: 7s 435ms
It only took 11s
to run git-sim log
:
[...]keep parsing until up to n commits are drawn along each commit chain
Ah, yeah that makes sense. I misunderstood what you were saying originally, sorry! 😅
This is basically the way git-sim works.
Awesome! Thanks for taking the time to explain, I appreciate it.
@asteele0 Sweet!! Glad to hear that. It still takes longer than I'd like (esp on average processors), but unfortunately it's a bit out of my control due to the dependency on Manim at this time. I hope it's manageable though, and def shouldn't be taking MINUTES for rendering just a few commits (let me know if you run into anything else with execution time that bad...)
Where it does still take a prohibitively long time is if you use a custom higher value n
like 15, combined with a heavily-merged repo due to rendering all the commit paths, and ESPECIALLY if you render video using the --animate
flag. But for example I just tested git-sim log -n 15
on the vscode repo on my Desktop (13600k) and it took 31.65 seconds, which again isn't amazing but isn't horrible...
Anyway, if you end up finding good use cases for git-sim on your team I'd love to hear about them at some point!
Hi @initialcommit-io. I can open a new issue if you like, but the fix for this seems to have broken visualizations of merge operations:
$ git-sim merge mp34_popular_projects
Manim Community v0.17.3
Simulating: git merge mp34_popular_projects
...
│ /Users/eric/projects/mostly_python/.venv/lib/python3.11/site-packages/git_sim/merge.py:68 in │
│ construct │
│ │
│ 65 │ │ for k, v in self.__dict__.items(): │
│ 66 │ │ │ print(k, v) │
│ 67 │ │ │
│ ❱ 68 │ │ if not self.is_remote_tracking_branch(self.branch): │
│ 69 │ │ │ if self.branch in self.repo.git.branch("--contains", head_commit.hexsha): │
│ 70 │ │ │ │ self.ff = True │
│ 71 │ │ else:
...
AttributeError: 'Merge' object has no attribute 'is_remote_tracking_branch'
My line numbers might not match yours because I had written some diagnostics into the merge.py file. Installing 0.3.0 fixes my issue.
Interestingly, this came up when drafting a newsletter post about newer Python projects that might be appropriate for people to start contributing to. I would be happy to consider helping set up some end to end tests if that would be helpful.
@ehmatthes Oh, shoot. Yes you're absolutely right and thx for letting me know! I removed a method as a part of the fix and thought it wasn't used anywhere else... I will add this back in tonight and roll a new release with the fix.
And LOL that's a very suitable time to run into this error! Thanks for considering git-sim in your newsletter - I'd love to read it when you send it out.
And yes! It would be awesome to get some help setting up some end to end tests for Git-Sim! I've had this on my mind for months but haven't had the time. We have an existing issue for this (https://github.com/initialcommit-com/git-sim/issues/55) but work has stalled. We can definitely pick it up there.
However, one thing I did do was create a bundled dependency called Git-Dummy (https://github.com/initialcommit-com/git-dummy) which should simplify the process of creating dummy repos in a desired state to run the end-to-end tests on and confirm an expected result...
@ehmatthes Just pushed new version 0.3.2 with the fix for this. You can install it with pip install git-sim --upgrade
.
Please test it out and let me know if the merge subcommand works now!
Yes, 0.3.2 works for me. I'm going to jump into that other thread about testing. Thanks!
I work on a large repo, and I wanted to see if this tool could help my team. It seems like it would, but it is prohibitively slow to use. Is this level of performance expected, or have I maybe set something up incorrectly?
Large repo
git-sim log -n 2
: 24m 5s 7msgit-sizer output
Processing blobs: 121739 Processing trees: 149846 Processing commits: 17891 Matching commits to trees: 17891 Processing annotated tags: 1491 Processing references: 12914
Medium repo
I ran it on a medium-ish size repo as another reference point.
git-sim log
: 1m 17s 949msgit-sim log -n 2
: 24s 134msThese times are more acceptable, although still slower than I'd like.
git-sizer output
Processing blobs: 5497 Processing trees: 3449 Processing commits: 977 Matching commits to trees: 977 Processing annotated tags: 0 Processing references: 4399
Dummy repo
As a control I ran it on the dummy repo generated from
git-sim log
: 5s 305msThen for funsies I created a few more dummy repos, running log on them:
git-dummy --name="dummy-repo" --branches=5 --commits=100
: 5s 331msgit-dummy --name="dummy-repo" --branches=50 --commits=100
: 12s 852msgit-dummy --name="dummy-repo" --branches=10 --commits=500
: 5s 388msIt's interesting, it looks like number of commits doesn't terribly affect the results, but the number of branches does.
Environment