jedbrown / git-fat

Simple way to handle fat files without committing them to git, supports synchronization using rsync
BSD 2-Clause "Simplified" License
621 stars, 137 forks

1 GB+ Files make shell unresponsive for several seconds #5

Open scopatz opened 11 years ago

scopatz commented 11 years ago

I don't know where else to put this, so I am reporting it as an issue here. When git fat is used with large files (1 GB+), the SHA1 computation seems to dominate the CPU, making the shell (and possibly the whole system) unresponsive for 10 seconds to 1 minute. The same thing happens at the end of a git fat pull of files this size. Perhaps changing the BLOCKSIZE would help?
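A minimal Python sketch of block-wise SHA-1 hashing shows what a block size does and does not control. It sets how many bytes are read per call and how much memory is held at once, but every byte still passes through the hash exactly once, so the total hashing work is unchanged. sha1_of_file and its blocksize parameter are hypothetical names for illustration, not git-fat's code.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, blocksize=64 * 1024):
    """Return the hex SHA-1 of a file, reading it blocksize bytes at a time."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Every byte is fed to the hash exactly once; blocksize only changes
        # the granularity of reads, not the amount of hashing.
        for block in iter(lambda: f.read(blocksize), b""):
            h.update(block)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1_000_000))
        path = tmp.name
    try:
        # Different block sizes, same digest.
        digests = {sha1_of_file(path, bs) for bs in (4096, 65536, 1 << 20)}
        print(len(digests))
    finally:
        os.remove(path)
```

Running it prints 1: the digest does not depend on the block size.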

jedbrown commented 11 years ago

Simply running git add in a normal repository (or git hash-object) takes a similar amount of time, because the SHA1 must be computed and the file compressed. Normally this cost is paid only once, when the file is first added to the repo. If you touch the file, the filter has to run again, but it will compute the same SHA1. In general there is no way to get semantically correct behavior without reading the whole file, but an inotify daemon could proactively apply the clean filter to files that have been modified and cache the result.

I can imagine a few circumstances where we might be able to optimize by really doing less. Can you explain the use case where this affects usability?
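A rough Python sketch of two points from the reply above: the object id that git computes for a file (which is why a large add costs a full read), and a stat-keyed cache standing in for the proactive caching mentioned there. git_blob_sha1 and StatCache are hypothetical names; this is not git-fat's code, and the inotify watcher that would keep such a cache warm is left out.

```python
import hashlib
import os


def git_blob_sha1(path, blocksize=1 << 20):
    """Object id that `git hash-object --no-filters <path>` would print:
    SHA-1 over a "blob <size>\\0" header followed by the raw contents."""
    size = os.path.getsize(path)
    h = hashlib.sha1(b"blob %d\x00" % size)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            h.update(block)
    return h.hexdigest()


class StatCache:
    """Hypothetical cache: re-read a file only when its size or
    modification time (in nanoseconds) has changed since the last call."""

    def __init__(self, digest_fn=git_blob_sha1):
        self._digest_fn = digest_fn
        self._entries = {}  # abspath -> ((size, mtime_ns), digest)
        self.misses = 0     # how many times a file was actually hashed

    def digest(self, path):
        st = os.stat(path)
        key = os.path.abspath(path)
        stamp = (st.st_size, st.st_mtime_ns)
        cached = self._entries.get(key)
        if cached is not None and cached[0] == stamp:
            return cached[1]  # unchanged since last time: skip the full read
        self.misses += 1
        value = self._digest_fn(path)
        self._entries[key] = (stamp, value)
        return value
```

Keying on (size, mtime) is cheap but not airtight: a same-size rewrite within one timestamp tick would be missed, which is the same kind of race git's own index has to guard against.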

scopatz commented 11 years ago

I don't disagree that the whole file needs to be read. It is not what git-fat does that hurts usability; it is the order in which it does it relative to git that causes some awkwardness. Here is what happens to me. Initially I see the following for a couple of seconds.

$ git commit -am "made changes to data"

Then, for a couple more seconds:

$ git commit -am "made changes to data"
[master

Then,

$ git commit -am "made changes to data"
[master 0516b84]

Then

$ git commit -am "made changes to data"
[master 0516b84] made changes to data

Then

$ git commit -am "made changes to data"
[master 0516b84] made changes to data
 1 files changed, 1 insertions(+), 1 deletions(-)

And then it returns a few moments later. It seems to me the correct behavior is to print complete lines. I am going to try alias git-fat="nice git-fat" to see if this helps at all.

jedbrown commented 11 years ago
  1. There's nothing I can do about it in git-fat. After all, I'm in a separate process. It is possible that something could be done in upstream git, but it could also be an OS issue. Can you reproduce on Linux, for example?
  2. That alias will do nothing because git does not use your shell when invoking git-fat.
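Following point 2: git reads the filter command from its own config and launches it itself (via a non-interactive sh -c, as far as I know), so aliases defined in an interactive shell never apply. If you want to experiment with nice anyway, the filter command in the git config is where it would have to go. The minimal Python sketch below only builds the git config commands as a dry run. It assumes the filters are registered as filter.fat.clean and filter.fat.smudge running git-fat filter-clean and git-fat filter-smudge, which is what git fat init appears to set up; verify with git config --get-regexp filter.fat first. niced_filter_commands is a hypothetical helper, and whether lowering priority actually improves responsiveness is untested.

```python
import shlex


def niced_filter_commands(niceness=10):
    """Return argv lists for `git config` calls that prefix git-fat's
    clean/smudge filters with `nice -n <niceness>`.

    Assumption: the filters live under these keys with these commands.
    """
    filters = {
        "filter.fat.clean": "git-fat filter-clean",
        "filter.fat.smudge": "git-fat filter-smudge",
    }
    return [
        ["git", "config", key, f"nice -n {niceness} {command}"]
        for key, command in filters.items()
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in niced_filter_commands():
        print(shlex.join(argv))
```

With the default niceness, the first printed line is git config filter.fat.clean 'nice -n 10 git-fat filter-clean'. Printing rather than running keeps the sketch from touching a real repository's config.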