rubygems / bundler-features

Bundler feature requests and discussion

Better handling of large repositories when used as a git gem. #1

Closed xaviershay closed 8 years ago

xaviershay commented 11 years ago

Using git for large gems is slow to clone and uses a lot of disk space. Moved from https://github.com/bundler/bundler/issues/228

Approaches discussed include:

elgalu commented 10 years ago

+1 for allowing the option depth: 1

indirect commented 10 years ago

How do you even allow it, though? Your lock will always have a specific ref, and --depth 1 forces you to use the newest commit in the repo. Those are mutually incompatible.

TimMoore commented 10 years ago

In the case of a forked gem, it seems like a pretty common case that the newest commit in the repo is the one you want to use.

I think it might be reasonable to allow a shallow clone option and simply fail if it doesn't match the locked ref. It can print a message saying to either bundle update the git gem or remove the shallow option.
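A minimal sketch of the check proposed here, assuming the SHA of the shallow clone's HEAD and the locked revision are already available as strings. The names `ShallowRefMismatch` and `verify_shallow_checkout!` are hypothetical and are not part of Bundler:

```ruby
# Hypothetical error type for illustration; not one of Bundler's error classes.
class ShallowRefMismatch < StandardError; end

# Compares the commit a shallow clone ended up on with the revision recorded
# in the lockfile. Either value may be an abbreviated SHA, so a prefix match
# in either direction counts as a match.
def verify_shallow_checkout!(gem_name, head_sha, locked_sha)
  head = head_sha.strip.downcase
  locked = locked_sha.strip.downcase

  both_present = !head.empty? && !locked.empty?
  return true if both_present && (head.start_with?(locked) || locked.start_with?(head))

  # Fail with the guidance suggested in the comment above.
  raise ShallowRefMismatch,
        "#{gem_name} is locked to #{locked} but the shallow clone is at #{head}. " \
        "Run `bundle update #{gem_name}` or remove the shallow option."
end
```

A caller would run this right after the shallow clone and before the gem is used, so a stale lock fails fast instead of silently installing a different commit.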

indirect commented 10 years ago

Hmmm I guess that's true. Pretty sure that --depth 1 also only works with the master branch, but I am open to a patch for this. We'll just need to be very careful about documenting that this only works for repos that are not under active development.

elgalu commented 10 years ago

I've done some research and it seems there is a way to download a specific ref as a zipball / tarball.

This doesn't download git revisions or history, just the files (git blobs) at a specific ref.

# e.g. get Rails 4.0.1 expanded tarball
time curl -L https://github.com/rails/rails/tarball/5505c1d700 | tar zx
# real  0m17.263s
# user  0m0.584s
# sys   0m0.360s

# compare git clone (even though this doesn't solve the current feature)
time git clone https://github.com/rails/rails.git --depth=1
# real  0m23.989s
# user  0m1.524s
# sys   0m0.396s

indirect commented 10 years ago

Uhh… you realize that there are many Bundler users getting git repos from places other than Github, right?

elgalu commented 10 years ago

Yes!!

But maybe the great majority are GitHub users. I'm trying to get some SCM usage statistics correlated with Bundler and will let you know if I get the number.

What if >= 80% are Bundler + GitHub users? Would the feature be worth building? I imagine something like this:

source 'https://rubygems.org'

gem 'rails', github: 'rails/rails', ref: '5505c1d', tarball_only: true

indirect commented 10 years ago

I mean, you could build it in as a way to make a shallow option that works only in concert with a github option (or git urls that point at github). Alternately, the shallow option only allows master with no ref specified, I guess.
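As a rough sketch of the "git urls that point at github" idea, the helper below decides whether a git URL is hosted on github.com and, if so, derives a tarball URL in the same form as the `curl` example earlier in the thread. The function names are hypothetical, and the URL pattern is only the one shown in that example, not a documented Bundler feature:

```ruby
require "uri"

# Returns "owner/repo" if the git URL points at github.com, otherwise nil.
def github_slug(git_url)
  # scp-style URLs (git@github.com:owner/repo.git) are not valid URIs,
  # so they are matched separately.
  if (m = git_url.match(%r{\Agit@github\.com:(?<slug>[^/]+/[^/]+?)(?:\.git)?\z}))
    return m[:slug]
  end

  uri = URI.parse(git_url)
  return nil unless uri.host == "github.com"

  m = uri.path.match(%r{\A/(?<slug>[^/]+/[^/]+?)(?:\.git)?/?\z})
  m && m[:slug]
rescue URI::InvalidURIError
  nil
end

# Returns a tarball URL for the given ref, or nil for non-GitHub sources,
# in which case the caller would fall back to a normal clone.
def github_tarball_url(git_url, ref)
  slug = github_slug(git_url)
  slug && "https://github.com/#{slug}/tarball/#{ref}"
end
```

Returning `nil` for any other host keeps the existing clone behaviour as the default for users who get repos from places other than GitHub.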

TimMoore commented 10 years ago

git clone has a --branch option:

--branch <name>, -b <name>

Instead of pointing the newly created HEAD to the branch pointed to by the cloned repository's HEAD, point to <name> branch instead. In a non-bare repository, this is the branch that will be checked out. --branch can also take tags and detaches the HEAD at that commit in the resulting repository.

This actually sounds pretty useful even without shallow clones.

Also:

--[no-]single-branch

Clone only the history leading to the tip of a single branch, either specified by the --branch option or the primary branch remote's HEAD points at. When creating a shallow clone with the --depth option, this is the default, unless --no-single-branch is given to fetch the histories near the tips of all branches. Further fetches into the resulting repository will only update the remote-tracking branch for the branch this option was used for the initial cloning. If the HEAD at the remote did not point at any branch when --single-branch clone was made, no remote-tracking branch is created.

myronmarston commented 10 years ago

As @TimMoore pointed out, --depth can work in conjunction with --branch. The git docs mention that --branch can be given a tag instead of a branch and that works. For example, I tried this and it worked fine:

$ git clone git://github.com/rails/rails --depth 1 --branch v4.1.0.rc1

So...maybe bundler could have an optimization that if a tag has been provided, it'll do the shallow clone?
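One way to picture that optimization is a small helper that only adds the shallow-clone flags when a tag is given. This is an illustrative sketch, not Bundler's actual git source code; `clone_command` and its parameters are hypothetical names:

```ruby
# Builds the argument list for `git clone`. When a tag is given, the clone is
# limited to that tag's commit via --depth 1 --branch <tag>; otherwise a full
# clone is requested, as happens today.
def clone_command(url, destination, tag: nil)
  args = ["git", "clone"]
  args += ["--depth", "1", "--branch", tag] if tag
  args + [url, destination]
end
```

The result can be passed to `system(*clone_command(url, dir, tag: "v4.1.0.rc1"))`. Passing the arguments as an array avoids shell quoting problems with unusual tag names.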

indirect commented 10 years ago

The tricky part is how to handle that branch or ref changing. What if it's a force push and the remote commits are now gone? How do you do an update? You have to do a completely new shallow clone. That slows down install and update by a huuuuge amount in exchange for saving some disk space, which doesn't seem worth it to me.

myronmarston commented 10 years ago

Well, I suggested doing it only for tags because tags are generally intended to be immutable. I wouldn't do it for branches or other refs. I suppose that since tags can still be force pushed over it might still not be doable, though.

myronmarston commented 10 years ago

Seems like git really needs to support a ref option :(.

indirect commented 10 years ago

Oh yeah, also that. If you can’t clone directly to a ref (which git doesn’t let you do), you’re basically screwed before you even start to deal with the problems of changing refs, and of every update meaning cloning 100% of the repo again. :/

sdhull commented 10 years ago

OK I recently ran into this issue for a project I'm working on, and I'd like to work on a PR for it.

Can we agree that if a user is using the github option, then we can leverage the tarball / zipball download feature of GitHub, and otherwise we fall back to whatever the current behavior is?

indirect commented 10 years ago

Does github offer tarballs for any sha? If so, that might work. With the caveat that bundle update becomes impossible. Which maybe means we can't really do it. The existing git infrastructure expects a git repo, though. :/

This could possibly be similar to, or the same as, #3017, which is also about using a gem from a tarball (in the form of a .gem file).

simi commented 10 years ago

@indirect you can use this url https://github.com/bundler/bundler/archive/8b2bb8f4f69e4bd47c9a66e0579e8b58dc7cbe7e.zip

marcferna commented 10 years ago

+1

quentindemetz commented 9 years ago

Is it possible to specify a tarball in the Gemfile? Wouldn't that be a good solution that also settles the --depth conversations (at least for GitHub users)?

indirect commented 9 years ago

Yes, it is possible to specify a tarball. Gems are tarballs, and the best way to use Bundler is to release gems and use them. :)

nleo commented 8 years ago

It's awful!

I added gem 'spree', github: 'spree/spree' and need to wait 5+ minutes for it to install!

Any progress?

indirect commented 8 years ago

The best way to avoid the problems of git gems is to build a gem from the repo and use that gem. Git repos are not a good match for single versions of gems.

coilysiren commented 8 years ago

This issue was moved to bundler/bundler#4556