sorin-ionescu / prezto

The configuration framework for Zsh
MIT License

Speed up git-info #221

Closed: paulmillr closed this issue 11 years ago

paulmillr commented 12 years ago

So, git-info is currently about 408 lines long, but many themes don't need all the stuff it does.

Actually, I don't mind everything being there; the reason I created this issue is its speed. It's freaking slow.

How about adding a light-git-info that only runs git symbolic-ref HEAD 2> /dev/null (to get the current branch)?
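A minimal sketch of what such a light-git-info could look like, written for a POSIX-style shell (it also runs in zsh). It spawns a single git process and prints nothing outside a repository or on a detached HEAD. The function name light_git_branch is hypothetical; this is not code from Prezto.

```sh
# light_git_branch (hypothetical name): print the current branch name using
# a single git process. Returns non-zero, printing nothing, outside a
# repository or when HEAD is detached.
light_git_branch() {
  local ref
  # -q: exit quietly with status 1 when HEAD is not a symbolic ref (detached).
  ref=$(git symbolic-ref -q HEAD 2>/dev/null) || return 1
  # Strip the refs/heads/ prefix: refs/heads/master -> master.
  printf '%s\n' "${ref#refs/heads/}"
}
```

Inside a repository on master, light_git_branch prints master; the trade-off is that it reports nothing about the work tree, which is exactly the expensive part.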

sorin-ionescu commented 12 years ago

'Freaking slow' doesn't help me make it faster. vcs_info is even larger.

paulmillr commented 12 years ago

Right, but I do not think profiling will help much. It just seems logical: when your machine is constantly doing I/O and you have an HDD instead of an SSD, running one I/O-heavy command will be faster than running 5 (10?).

Also, you seem to have made a typo in this guy's name.

sorin-ionescu commented 12 years ago

I do not have an SSD, and, for me, it's fast enough. You can turn git-info off for very large repositories.

I'm going to invite @ColinHebert into this conversation.

clvv commented 12 years ago

Try it on a removable USB drive. I reckon that would make a difference.

A solution is to use timeout, but of course it can only be applied to one command, not the entire function. Alternatively, it may be possible to create a timeout function that sets a running-time limit on a shell function, something like this (link to my dotfile repo; the implementation doesn't really work in zsh).

sorin-ionescu commented 12 years ago

How does git-info compare to vcs_info? Do you take into account that git-info does slightly more than vcs_info in some cases?

benohara commented 12 years ago

Hard to explain, but it takes a second for the prompt to display, compared to using vcs_info.

See http://codestre.am/7668bbd5a1dcb3607ff738b09

sorin-ionescu commented 12 years ago

I wonder if vcs_info is caching. @benohara git-info is not that complicated. Don't be afraid to look it over.

ColinHebert commented 12 years ago

For @benohara: if I had to guess, I would say it's due to git submodules; they tend to slow down git status, which is the most expensive call made in git-info. You can test that theory with zstyle ':omz:module:git:ignore' submodule 'all'; it should skip running git status on your submodules and go much faster.
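Assuming that zstyle is ultimately passed through to git status, the underlying git option is --ignore-submodules, which accepts none, untracked, dirty, or all. A hedged sketch of the resulting call; the helper name git_status_porcelain is hypothetical and this is not git-info's actual code.

```sh
# git_status_porcelain (hypothetical helper): run git status in
# machine-readable form, optionally skipping submodules entirely.
# $1: none | untracked | dirty | all  (defaults to none)
git_status_porcelain() {
  git status --porcelain --ignore-submodules="${1:-none}" 2>/dev/null
}
```

With all, git does not look inside submodules at all, which is where the time goes in a project with many of them.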


Regarding the speed of git-info itself, I think it could be slightly improved by checking whether the appropriate zstyle is set (zstyle -t context style [ strings ...] does that, I believe).

For example, there is no need to run git symbolic-ref -q HEAD if you don't display the branch name, or git rev-parse --symbolic-full-name --verify HEAD@{upstream} if you display nothing when the local branch isn't synchronised with the remote branch.

Those are minor improvements, but they should make things slightly faster (I think).
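The guard pattern being proposed can be sketched as follows. Because zstyle is zsh-only, the sketch uses two hypothetical variables, fmt_branch and fmt_remote, as stand-ins for the zstyle lookups; an empty value means the theme does not display that piece, so the corresponding git command is never run. The function name is also hypothetical.

```sh
# fmt_branch / fmt_remote are hypothetical stand-ins for zstyle lookups such
# as: zstyle -s ':prezto:module:git:info:branch' format fmt_branch
# An empty value means the theme does not display that information.
git_info_lazy() {
  local branch='' upstream=''
  # Only ask git for the branch if the theme displays it.
  if [ -n "${fmt_branch:-}" ]; then
    branch=$(git symbolic-ref -q --short HEAD 2>/dev/null)
  fi
  # Only resolve the upstream if the theme displays remote information.
  if [ -n "${fmt_remote:-}" ]; then
    upstream=$(git rev-parse --symbolic-full-name --verify -q 'HEAD@{upstream}' 2>/dev/null)
  fi
  printf '%s|%s\n' "$branch" "$upstream"
}
```

A theme that shows only the branch then pays for one git process instead of two.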


As for the initial question of whether we should have a light git-info: I think that if you need a lighter git-info (i.e. one with fewer features), you should consider using vcs_info.

git-info is, as far as I know, not here to be lighter or faster than vcs_info, but to add features that you couldn't implement in vcs_info without writing terrible code (because there are not enough hooks for everything) or slower code.

I have nothing personal against vcs_info, and it works very well if you want to keep your shell simple (and you can use the same style for every VCS). The reason I don't use it is that I work all day long with git, and I need more info than just "is the local repo dirty" and the current branch name; I think I use 90% of what is executed in git-info.

sorin-ionescu commented 12 years ago

I have looked at vcs_info, more specifically VCS_INFO_get_data_git. It does not cache. It does not do anything clever to be faster. It uses git diff-index to get a few things, namely staged, non-staged, and commit in addition to the branch and action.

git-status is slow. There is no way around it as far as I know. If you do not need a lot of information, use a theme that uses vcs_info. If you want a lot of information, use a theme that uses git-info.

As @ColinHebert said, any changes that can be made to git-info are trivial and are not likely to provide a perceptible increase in speed.

pbrisbin commented 12 years ago

For what it's worth, I have a git-info-fast in my branch. It skips some of the remote lookups and runs much faster (measured subjectively, of course) than the existing git-info.

I've been meaning to do a proper pull request (for this and other things) and now that the repo's been split, hopefully that can happen soon.

If anyone's interested, it's sitting here for now.

ColinHebert commented 12 years ago

What does git-info-fast do that couldn't be available (easily) with vcs_info?

I mean, as said in this discussion, the difference between git-info and vcs_info is that one provides more information while the other supports every (or nearly every) VCS. It seems to me (correct me if I'm wrong) that git-info-fast provides the same amount of detail as vcs_info while, like git-info, not being compatible with other VCSs.

Not that what you did isn't efficient or useful, but how does this solution compare to the two existing solutions already used by OMZ users?

sorin-ionescu commented 12 years ago

@pbrisbin I am not merging that. Other than what @ColinHebert said, it's broken, especially the way it checks if you are inside of a repository.

I am open to making git-info faster without removing functionality, perhaps by using different low level git executables, such as git-diff-index.
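As an illustration of the low-level route, here is a sketch that detects staged and unstaged changes purely from exit codes, roughly the approach vcs_info is described as taking with git diff-index. With --quiet, git diff-index and git diff-files exit with status 1 when differences exist. The helper name repo_flags is hypothetical. Two caveats: git update-index --refresh may rewrite the index file, and the sketch does not handle a repository with no commits yet, where diff-index against HEAD fails and would be misreported as staged changes.

```sh
# repo_flags (hypothetical helper): report whether there are staged and/or
# unstaged changes using only exit codes, without listing or counting files.
# Prints "<staged> <unstaged>", each 0 or 1.
repo_flags() {
  local staged=0 unstaged=0
  # Refresh cached stat info so touched-but-unchanged files are not reported.
  git update-index -q --refresh >/dev/null 2>&1
  # --quiet implies --exit-code: status 1 means differences were found.
  git diff-index --quiet --cached HEAD -- 2>/dev/null || staged=1
  git diff-files --quiet 2>/dev/null || unstaged=1
  printf '%s %s\n' "$staged" "$unstaged"
}
```

Because git can stop at the first difference it finds, this is cheaper than producing and counting a full file list.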

sorin-ionescu commented 12 years ago

Is there a Git daemon that uses inotify (Linux), FSEvents (Mac OS X), kqueue (Mac OS X, BSD), ReadDirectoryChangesW (Windows) to always be up to date on work tree changes in order for git status to run instantly by not having to walk said tree?

Should we cache git status then use a directory change notification library to update the changed file counts for added, modified, removed, renamed, and so on?

paulmillr commented 12 years ago

@sorin-ionescu awesome idea, :+1:

sorin-ionescu commented 12 years ago

@paulmillr Someone else has had the same idea: inotify daemon speedup for git. Unfortunately, it was not successful.

sorin-ionescu commented 12 years ago

So, who wants to extend kqwait to attempt caching + file system notifications? You cannot expect me to do everything.

ColinHebert commented 12 years ago

I'm not really fond of having that inside Prezto. If anything is done, I would prefer to see an extension of git itself that speeds up git status. I'm not sure I'm comfortable with my shell spawning daemons in every git repo I own and caching things weirdly.

I would be all for a new project to replace or enhance git status. (As I'm playing with Ruby on weekends, I might actually try that, but it's for fun; don't expect anything.)

sorin-ionescu commented 12 years ago

You can't do it in Ruby. It's low level kernel stuff. You'll have to do it in C.

pbrisbin commented 12 years ago

You could write 95% of it in Ruby on top of 5% of C bindings. Some already exist for inotify and libgit2.

Personally, I think there's a hole in the market for a "git-prompt" that efficiently gives prompt-friendly (and formattable) status output.

Daemon-watcher-caching sounds useful, but a secondary concern to me. My git prompt's plenty fast so long as I'm on an SSD (which will soon be the norm) and I take out the calls that needed network.

I also agree -- as the requirements for this grow, you've moved well out of shellrc territory.

(Sent by email in reply to Sorin Ionescu's comment of Tue, Oct 2, 2012 at 4:46 PM.)

sorin-ionescu commented 12 years ago

So, I've been toying with making git-info faster. I've made a lot of changes on this issue's branch.

Besides the boatload of if statements that test whether a zstyle has been defined, it now also lets you choose between the classic git-info status (full) and a vcs_info-style status (partial), which only shows indexed (staged) files, via format code %i, and unindexed (unstaged) files, via format code %I.

zstyle ':prezto:module:git:info' status 'partial'
zstyle ':prezto:module:git:info:branch' format ':%F{green}%b%f'
zstyle ':prezto:module:git:info:indexed' format ' %B%F{green}i%f%b'
zstyle ':prezto:module:git:info:unindexed' format ' %B%F{blue}I%f%b'
zstyle ':prezto:module:git:info:keys' format \
  'prompt' ' %F{blue}git%b' \
  'rprompt' '%i%I'

Please test this new git-info for speed and bugs.

# Switch to git-info theme.
time (git-info)

# Switch to vcs_info theme.
time (vcs_info)

ColinHebert commented 12 years ago

@sorin-ionescu I don't intend to do any low-level stuff; there are already plenty of tools for using inotify and FSEvents. Worst-case scenario, I would have to do some Ruby FFI (which I would very much like to avoid anyway).

Plus, it would be easier to move to C if a POC can be set up quickly. IMHO, Perl, Python, and Ruby are the only viable languages for this POC; there is no way I'm doing it in Perl, so I'll try Ruby.

@pbrisbin I think it will still be useful when you work with a lot of submodules (which is my case: about 100 submodules in my main project at work).

@sorin-ionescu Heh, applying ifs to check whether the zstyle is used rings a bell. But anyway, I think our main problem is (and will remain for a while) git status, which is incredibly slow (at least that's what bothers me the most).

sorin-ionescu commented 12 years ago

@ColinHebert Well, with %i and %I, you can now have a vcs_info-style status, including its deficiency of not detecting untracked files. We should probably keep the new boatload of if statements; I'm not too sure about the vcs_info-style status.

Benchmarking it against vcs_info themes would be useful.

sorin-ionescu commented 12 years ago

The new git-info is slightly faster.

Old:
0.04s user 0.09s system 85% cpu 0.153 total

New (status enabled):
0.04s user 0.08s system 87% cpu 0.138 total

New (status not enabled):
0.02s user 0.05s system 87% cpu 0.085 total

sorin-ionescu commented 12 years ago

I've toyed with a peepcode theme clone called peepcode_git_info that uses git-info.

peepcode (vcs_info):
0.04s user 0.07s system 87% cpu 0.124 total

peepcode_git_info (git-info):
0.03s user 0.06s system 86% cpu 0.104 total

It's probably faster because git-info does not have stgit support.

The git-info version is a lot more readable than the vcs_info version.

Comments?

sorin-ionescu commented 12 years ago

@ColinHebert How do multiple calls to git ls-files compare to one call to git status --porcelain, I wonder?

ColinHebert commented 12 years ago

Hmm, I'm not so sure about ls-files; it's really recommended to stay away from it (for scripting). If we want to go with plumbing commands, we should take a look at git diff-index and git diff-files.

I did a really quick test; here is what we would like to have:

added (to the WD/untracked):

git ls-files -o --exclude-standard

added (to the index):

git diff-index HEAD --name-status --cached (--find-renames)

removed (from the WD):

git diff-files --name-status

removed (from the index):

git diff-index HEAD --name-status --cached (--find-renames)

modified (in the WD):

git diff-files --name-status

modified (in the index):

git diff-index HEAD --name-status --cached (--find-renames)

renamed (in the WD): NOT RELEVANT

renamed (in the index):

git diff-index HEAD --name-status --cached --find-renames
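To make the index side of this mapping concrete, here is a sketch that tallies the --name-status output of git diff-index --cached into the added, removed, modified, and renamed categories listed above. It assumes git's tab-separated name-status lines, where a rename appears as R plus a similarity score (e.g. R087). The function name count_name_status is hypothetical, and unmerged entries are not handled.

```sh
# count_name_status (hypothetical helper): read `git diff-index --name-status`
# output on stdin and print "<added> <removed> <modified> <renamed>".
count_name_status() {
  awk '
    NF == 0 { next }                 # skip blank lines
    {
      s = substr($1, 1, 1)           # status letter; R087 -> R
      if (s == "A") added++
      else if (s == "D") removed++
      else if (s == "M") modified++
      else if (s == "R") renamed++   # other letters (C, T, U) are ignored
    }
    END { printf "%d %d %d %d\n", added, removed, modified, renamed }
  '
}
```

Usage would be along the lines of git diff-index --cached --name-status --find-renames HEAD | count_name_status, which still depends on HEAD existing.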

I haven't checked unmerged files yet. And there is a big problem with all of this: almost all of these commands require HEAD, which doesn't exist until the initial commit has been made.


Overall, I think we should stick with git status, which already does the aggregation we're considering. I'm not sure that doing it ourselves would give better results.

sorin-ionescu commented 11 years ago

Has anybody bothered to test these changes for speed and bugginess?

sorin-ionescu commented 11 years ago

I am inviting @skpw into this conversation.

sorin-ionescu commented 11 years ago

I have made git-info faster by only computing information when the corresponding zstyle is defined. However, since git-status is slow and many people do not want as much repository information as my theme shows, I have also added a mode, simple, as an alternative to complex (feel free to suggest better names). It behaves similarly to vcs_info, which reports staged and unstaged files; for the purposes of git-info, these shall be known as indexed files and unindexed files, because the %S format code is already in use for stashed files.

Select the mode you want for your theme:

zstyle ':prezto:module:git:info' status 'simple/complex'

I have come up with two versions of the simple mode, known as v1 and v2, which I shall discuss next.


v1 behaves similarly to vcs_info, but unlike vcs_info, unindexed also covers untracked files. I have noticed that many vcs_info themes hack in support for untracked files using a vcs_info hook, since most people consider unindexed and untracked files one and the same: not in the index. See the peepcode theme for an example. They can be separated, of course; I just chose to follow the hook hack.

The performance between vcs_info and git-info is virtually identical provided that the vcs_info theme also checks for untracked files.

The following format codes are available.

Name        Format Code   Description
indexed     %i            Indexed files indicator
unindexed   %I            Unindexed (including untracked) files indicator

The deficiency of this version of the simple mode is that these format codes have to be set to a coloured UTF-8 character or word. There is no count of indexed and unindexed files like in other contexts.


v2 behaves similarly to the classic git-info; it calls the same git porcelain commands as v1 but presents the computed information differently. unindexed no longer mashes together unindexed files and untracked files; they are now split into separate unindexed and untracked contexts. Furthermore, the file count for each context is provided.

This version also transplants two contexts from the complex mode, clean and dirty. Many people just want a character displayed when a repository is dirty.

So, what is dirty?

 dirty = indexed + unindexed + untracked

The three contexts above are initialised to 0 and, unless defined in the theme, are never computed. If dirty to you means unindexed and untracked but not indexed, and you want to show the character, you'll have to define the following:

zstyle ':prezto:module:git:info:unindexed' format ' '
zstyle ':prezto:module:git:info:untracked' format ' '
zstyle ':prezto:module:git:info:dirty' format ' %F{red}✗%f'

The following format codes are available.

Name        Format Code   Description
clean       %C            Clean state
dirty       %D            Dirty files count
indexed     %i            Indexed files count
unindexed   %I            Unindexed files count
untracked   %u            Untracked files count
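To illustrate how the v2 counts relate, here is a sketch that derives indexed, unindexed, untracked, and dirty from git status --porcelain output (v1 porcelain format, where column 1 is the index status and column 2 the work-tree status). It follows the thread's definition dirty = indexed + unindexed + untracked, so a file that is staged and then modified again (MM) counts twice. This is an illustration, not git-info's implementation; the function name count_porcelain is hypothetical and unmerged entries are not treated specially.

```sh
# count_porcelain (hypothetical helper): read `git status --porcelain` lines
# on stdin and print "<indexed> <unindexed> <untracked> <dirty>".
count_porcelain() {
  awk '
    length($0) < 3 { next }          # skip blank or malformed lines
    {
      x = substr($0, 1, 1)           # index (staged) status column
      y = substr($0, 2, 1)           # work-tree (unstaged) status column
      if (x == "?" && y == "?") { untracked++; next }
      if (index("MADRCT", x) > 0) indexed++
      if (index("MDT", y) > 0) unindexed++
    }
    END {
      printf "%d %d %d %d\n", indexed, unindexed, untracked, indexed + unindexed + untracked
    }
  '
}
```

For a repository with 1 indexed file, 3 unindexed files, and 1 untracked file, this prints 1 3 1 5, and clean corresponds to a dirty count of 0. Counting requires reading every line, which is why this is slower than the exit-code checks v1 can use.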

v2 is slightly slower than v1 because for indexed and unindexed, we can no longer rely on exit codes and have to count files.

Using time (vcs_info) and time (git-info), I have got the following numbers in a repository with 1 indexed file, 3 unindexed files, and 1 untracked file.


Please vote for or against v1 or v2. You can also suggest your own or none at all. I'm not particularly fond of adding more features to git-info.

paulmillr commented 11 years ago

:+1: v2

swsnr commented 11 years ago

:+1: v2

sorin-ionescu commented 11 years ago

Perhaps minimal and verbose are better names for the two modes than simple and complex.

sorin-ionescu commented 11 years ago

If anybody has got ideas on how to speed it up further, I'm listening. Yes, you'll have to read and comprehend the giant git-info function.

sorin-ionescu commented 11 years ago

If all you want to show is a dirty repository indicator, no counts, vcs_info is still your best bet.