thatcher / openseadragon

This project has moved to its new GitHub organization at github.com/openseadragon. Please join us!
http://openseadragon.github.com/

huge repository #11

Closed jcupitt closed 11 years ago

jcupitt commented 12 years ago

Hi, thanks for working on this interesting thing.

I just cloned openseadragon and it took an age: there are 450 MB of files in the .git history. It looks like the highsmith example from last December is still in there.

Perhaps the commit and removal of this example could be squashed together to remove the extra 450 MB download?

DannyJoris commented 12 years ago

+1 What repository should we use? Master? Is there a lightweight version?

iangilman commented 11 years ago

@jcupitt Apologies for the delayed response... I'm new to the project, helping out with stuff like this.

I'd love to kill that extra baggage, but I don't actually know how to do that with git/GitHub... can you point me in the right direction, and I'll take care of it?

dgutman commented 11 years ago

From the command line, you'd do something like this, using git rm:

git rm file1.txt
git commit -m "remove file1.txt"

Basically you're just deleting the file in your local version, and after committing, that file is removed from the master branch (I am assuming you have commit privileges).

David A Gutman, M.D. Ph.D. Assistant Professor of Biomedical Informatics Senior Research Scientist, Center for Comprehensive Informatics Emory University School of Medicine

iangilman commented 11 years ago

@dgutman But won't it still be present in the history and therefore contribute to the size of the repository? We're trying to slim down the repository here.

That said, it looks like the culprit is actually still in the gh-pages branch, which is where http://thatcher.github.com/openseadragon/ is served up from. Perhaps we should look into hosting those images elsewhere (even just another git repo). Then we'll still have the issue of pulling them out of this repo's history, though, to make it slimmer.
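One way to check what is still taking up space in the history is to list the largest blobs reachable from any ref. The following is only a sketch under stated assumptions: it builds a throwaway repository in a temp directory, and the file names (big-example.bin, viewer.js) are hypothetical stand-ins, not files from this project. It shows that a file deleted with git rm is still stored in the history.

```shell
#!/bin/sh
# Sketch: a file removed with "git rm" is still stored in the history.
# The repository and file names below are hypothetical demo values.
set -e

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q repo
cd repo
git config user.email "demo@example.com"
git config user.name "Demo"

# Add a large file and a small one, then delete the large one in a later commit.
head -c 300000 /dev/urandom > big-example.bin
echo "small" > viewer.js
git add .
git commit -q -m "add files"
git rm -q big-example.bin
git commit -q -m "remove big example"

# List blobs reachable from any ref, largest first: size in bytes, then path.
LARGEST=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $2, $3 }' \
  | sort -rn \
  | head -n 5)
echo "$LARGEST"
```

In a real clone, only the pipeline starting at git rev-list is needed. Run from the top of the repository, it covers every branch, including gh-pages.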

bgilbert commented 11 years ago

@dgutman, the files need to be removed from the history as well as from the HEAD.

The files still exist in the gh-pages branch, which is used to serve the website. So you'll first have to figure out how you want to host the files (assuming you do); Amazon S3 may be useful.

Once you no longer need the files, you'll have to rewrite both the gh-pages and master branches in your local repo, then force-push an update to the GitHub repository. This will cause problems for all downstream repos, unfortunately. git-filter-branch is the easiest way to do the rewriting.
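The rewrite and force-push described above can be sketched roughly as follows. This is a minimal illustration, not the exact commands for this repository: the path examples/highsmith, the repository names, and the local bare "origin" are all hypothetical, and everything runs in a throwaway temp directory. Newer git versions print an advisory recommending alternatives to filter-branch, which the sketch silences.

```shell
#!/bin/sh
# Sketch only: strip a directory from every commit, then force-push the result.
# The path examples/highsmith and all repo names are hypothetical demo values.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the filter-branch advisory in newer git

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q --bare origin.git
git -c init.defaultBranch=master init -q repo
cd repo
git config user.email "maintainer@example.com"
git config user.name "Maintainer"
git remote add origin ../origin.git

# Build a small published history: a large example is added, then deleted later.
mkdir -p examples/highsmith
head -c 200000 /dev/urandom > examples/highsmith/tile.bin
echo "viewer" > viewer.js
git add .
git commit -q -m "add viewer and example"
git rm -q -r examples/highsmith
git commit -q -m "remove example"
git push -q origin master

BEFORE=$(git log --all --oneline -- examples/highsmith | wc -l | tr -d ' ')

# Rewrite all refs so that no commit ever contained the directory.
git filter-branch -f --prune-empty --index-filter \
  'git rm -r -q --cached --ignore-unmatch examples/highsmith' \
  -- --all >/dev/null 2>&1

# Drop filter-branch's backup refs and expire reflogs so the old blobs become
# unreachable, then repack to actually reclaim the space locally.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

# Publish the rewritten branch; this replaces the history on the remote.
git push -q --force origin master

AFTER=$(git log --all --oneline -- examples/highsmith | wc -l | tr -d ' ')
REMOTE_AFTER=$(git -C ../origin.git log --all --oneline -- examples/highsmith | wc -l | tr -d ' ')
echo "commits touching examples/highsmith: before=$BEFORE after=$AFTER remote=$REMOTE_AFTER"
```

With more than one branch to publish (master and gh-pages here), each rewritten branch would need to be force-pushed in the same way.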

iangilman commented 11 years ago

Sounds like maybe we'll have to live with this until we're ready to invalidate all the forks of this project. Perhaps at that time we should break the gh-pages out into its own repo anyway.

bgilbert commented 11 years ago

There will only be more forks over time, never fewer. So I would argue it's better to fix the repo now and get it done with.

As an alternative, you could put the rewritten branches into a new repository and abandon the current one. But then you'll have a renamed repository forever.

iangilman commented 11 years ago

Indeed, good points. I'll chew on this.

When you say the force push will cause problems, what do you mean? They'll have to re-fork?

Ventero commented 11 years ago

They simply won't be able to use git pull (or equivalents) to update their repository, but instead manual intervention (e.g. git reset --hard origin/master and git push -f afterwards) is required.

I agree with @bgilbert: the sooner the rewrite happens, the better. According to the network graph, there are currently only 6 forks on GitHub, of which only one (incidentally mine ;) ) has seen any activity in the last few months, so the number of people affected by this really isn't that large. Admittedly, there might be more people out there who have a local clone of the repository, but as @bgilbert pointed out, there'll only be more of those over time.

You probably should merge the open pull requests before rewriting the repository though, as I'm not entirely sure how pull requests that don't match the repository's history anymore are handled by GitHub.
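The manual intervention mentioned above might look like the following on the downstream side. This is a sketch under stated assumptions: the repositories are hypothetical and created in a temp directory, and the upstream "rewrite" is simulated with an amended commit rather than a full filter-branch run. Note that git reset --hard discards local commits and uncommitted changes, so anything worth keeping should be backed up first.

```shell
#!/bin/sh
# Sketch: how a downstream clone recovers after upstream force-pushes rewritten history.
# All repository names are hypothetical and live in a throwaway temp directory.
set -e

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q --bare origin.git

# Upstream maintainer publishes an initial history.
git -c init.defaultBranch=master init -q upstream
cd upstream
git config user.email "maintainer@example.com"
git config user.name "Maintainer"
git remote add origin ../origin.git
echo "v1" > viewer.js
git add viewer.js
git commit -q -m "initial commit"
git push -q origin master
cd ..

# A downstream user clones before the rewrite.
git clone -q origin.git downstream

# Upstream rewrites the published commit (simulated with --amend) and force-pushes.
cd upstream
echo "v1, rewritten" > viewer.js
git commit -q -a --amend -m "initial commit (rewritten)"
git push -q --force origin master
NEW_TIP=$(git rev-parse HEAD)
cd ..

# Downstream: a fast-forward pull is refused because the histories have diverged.
cd downstream
if git pull -q --ff-only origin master 2>/dev/null; then
  PULL_RESULT=ok
else
  PULL_RESULT=refused
fi

# Manual intervention: fetch the new history and move the local branch onto it.
git fetch -q origin
git reset -q --hard origin/master
DOWNSTREAM_TIP=$(git rev-parse HEAD)
echo "pull: $PULL_RESULT, tip matches upstream: $([ "$NEW_TIP" = "$DOWNSTREAM_TIP" ] && echo yes || echo no)"
```

For a fork hosted on GitHub, the reset would be followed by git push -f to the fork, as noted above.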

dgutman commented 11 years ago

Yeah, in particular it seems like you're going to be working through a lot of the pull requests that have been lingering for a while, so I'd just get it over with.

dg

iangilman commented 11 years ago

@ventero, cool, that doesn't seem too bad. I'll focus on pull requests first (as I get time for them) and move toward the fix to this bug.

thatcher commented 11 years ago

I'm going to try to get the hang of this today. We are officially in repo move mode and this is pretty much on the critical path.

Ian has created 2 repos in the new org. We'll have to copy his build/test work from those back into my master, delete them, and then branch my master into both a new openseadragon repo and a site repo, applying git-filter-branch to each to remove whatever doesn't belong there.

I'm not so worried about preserving the history of the actual web site, so I'm going to copy the static files that serve the website into a new repo there called openseadragon.github.com to get the current site moved over and running in parallel before the rest of the git repo work happens.

thatcher

iangilman commented 11 years ago

This has now been fixed... the new repositories are at http://github.com/openseadragon