Open slimsag opened 5 years ago
Our images are rather small; right now our .git is 187M. That said, large assets can do a lot of damage, but png/jpg files are normally fine.
Here are our biggest objects in git:
8717b79ae0e7 799KiB vendor/golang.org/x/text/encoding/japanese/tables.go
d909e38e5e05 801KiB vendor/golang.org/x/text/encoding/traditionalchinese/tables.go
415f52a11160 858KiB vendor/golang.org/x/text/encoding/simplifiedchinese/tables.go
7b218e660082 871KiB doc/user/code_intelligence/img/CodeReview.gif
cfbaa1529c19 1.1MiB cmd/frontend/internal/app/assets/img/Homepage/homepage-mobile.svg
d2375cdaa857 1.5MiB vendor/github.com/ericchiang/k8s/apis/core/v1/generated.pb.go
86e02b492d53 4.4MiB doc/integration/img/BrowserExtension.gif
fa3194e86b69 4.9MiB doc/integration/img/SearchShortcut.gif
3f381f42e428 42MiB cmd/frontend/internal/app/assets/distassets_vfsdata.go
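For reference, a listing like the one above can be reproduced with plain git plumbing. This is a sketch, not necessarily the exact command originally used; sizes come out in bytes:

```shell
# List the ten largest blobs ever committed, across all history.
# Output columns: size in bytes, object id, path.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $3, $2, $4 }' |
  sort -n |
  tail -10
```

Note this scans every object reachable from any ref, so it catches files that were later deleted from the working tree but still live in history.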
Looks like someone once accidentally committed the output of go-bindata to master and pushed.
The fix here (I think) is to add a script that checks for large assets and fails CI if any are found.
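Such a CI gate could be quite small. A sketch, assuming a 100 KiB limit and a fixed list of image extensions (both are assumptions, not agreed policy):

```shell
#!/usr/bin/env bash
# Hypothetical CI check: fail the build if any tracked image asset in the
# current commit exceeds the size limit.
set -euo pipefail
limit=$((100 * 1024))  # 100 KiB, assumed
fail=0
while IFS= read -r f; do
  size=$(git cat-file -s "HEAD:$f")
  if [ "$size" -gt "$limit" ]; then
    echo "ERROR: $f is $size bytes (limit: $limit bytes)" >&2
    fail=1
  fi
done < <(git ls-files -- '*.png' '*.jpg' '*.jpeg' '*.gif' '*.svg')
exit "$fail"
```

An allowlist file could be layered on top for the rare legitimate large asset.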
I do not think the problem is "large assets" specifically, but rather "large assets that change often".
The Go files (excluding generated vfsdata ones) are not going to change often, I would expect maybe once a year, maybe less.
But I would not be surprised if we update these gifs/pngs every few months, and we're going to pay the cost of storing that diff forever.
Cool. Please propose a solution. Like the following?
BTW most(/all?) of the vfsdata files actually should not be committed. I have a PR that removes a big one at https://github.com/sourcegraph/sourcegraph/pull/2237/commits/0c46930bcfaed4d56dd16c6aa49222dbb83e6225.
Yes, that is exactly my proposal (I should've reiterated that, my apologies)
FYI we'd like to start adding more images into the frontend (e.g., https://app.zeplin.io/project/5bfee4b96dc22f705c06d0a7/screen/5c5b5202d9f763354701d84b).
I agree large assets are a problem. But things that are at most a few megs (some images) surely aren't that bad given the convenience of tracking them in source control? As it stands, they can't be too big or it will result in a poor user experience for loading a page. So is this issue about large assets (i.e. we set some upper limit that requires whitelisting to allow merging), or just binary assets in general?
But things that are at most a few megs (some images) surely isn't that bad given the convenience of tracking it in source control?
My worry is our repo ends up being several GB in a few years and someone says "you know what? we screwed up back then and should filter all of those out of our git history (and now we can't really git bisect, oh well)."
Using a storage bucket online instead seems like a quick and easy way to avoid this potential outcome in the next few years.
I am fine with any option which prevents large binary files that change often (usually our image assets):
Just from some back-of-the-napkin math:
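A sketch of the shape of that concern, with purely hypothetical numbers (git keeps every revision of a binary at roughly full size, since image formats rarely delta-compress):

```shell
# All numbers hypothetical: 20 images, 2 MiB each, each replaced 4x/year.
images=20
avg_size_kb=2048
updates_per_year=4
years=5
echo "$(( images * avg_size_kb * updates_per_year * years / 1024 )) MiB of history growth"
```

Even modest churn on modest images compounds into hundreds of MiB of permanent history.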
To chime in: I try to rely on SVGs for images so I can add “images” as DOM elements, but in some cases that's not possible. The onboarding image is about 60KB. Not nothing, but something that might be beneficial to store in git as it doesn't change.
Marketing images are a different story. As we move some of the marketing and docs pages into sourcegraph it might be a good idea to just link to those from about.sourcegraph.com for the short term.
Anyone tried https://git-lfs.github.com/ before?
I have, it's good when you have a substantial number of large assets to track. I am unsure if it would be better than a simple script that downloads assets from gcs in our case. It does require installing the git extension and running a Git LFS server somewhere, which is a large burden for small cases like ours I think.
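For context, the tracking side of LFS is just a .gitattributes entry per pattern; a sketch, assuming we'd route the raster and animated image types through it:

```
# .gitattributes (sketch): store matching blobs in LFS, not the object DB
*.gif filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
```

Clients without the LFS extension installed would see small pointer files instead of the images, which is part of the install burden mentioned above.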
and running a Git LFS server somewhere, which is a large burden for small cases like ours I think.
I thought that Github would provide this for us.
@ryan-blunden You committed a lot of large GIFs in acf095236377ece4c0092847ce891c3f5dace56a and other commits to doc/integration/img. We want to avoid doing this for frequently changing large binary assets because it will make our repository very large.
Images under 100 KB are OK per the doc guide (https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/doc/dev/documentation/style_guide.md#images). Also see that link for instructions on how to upload large assets to Google Cloud Storage to avoid committing them.
I've removed those gifs in #3076. Should we rewrite git history so they are removed from the repo altogether?
@slimsag Given @sqs' doc styleguide addition for large images, can this now be closed?
No.
It's awesome that we are now enforcing it culturally :) But in the long haul we need something that enforces it mechanically, like a git push hook.
(also, yeah, rewriting git history is bad so no on that)
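A client-side hook is one cheap way to get that enforcement; a sketch of a pre-commit variant (the 100 KiB limit is an assumption borrowed from the doc style guide, and hooks still have to be installed per clone):

```shell
#!/bin/sh
# .git/hooks/pre-commit (sketch): reject commits that stage files > 100 KiB.
limit=102400
fail=0
for f in $(git diff --cached --name-only --diff-filter=AM); do
  # Look up the staged blob for this path; skip if it cannot be resolved.
  oid=$(git rev-parse --verify ":$f" 2>/dev/null) || continue
  size=$(git cat-file -s "$oid")
  if [ "$size" -gt "$limit" ]; then
    echo "pre-commit: $f is $size bytes (limit $limit)" >&2
    fail=1
  fi
done
exit $fail
```

A server-side pre-receive hook (or the CI check discussed earlier) would be needed as a backstop, since client hooks are easy to skip.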
I've just been pointed to this ticket here, because I've added images to the docs in https://github.com/sourcegraph/sourcegraph/pull/9961.
Since the link to "doc styleguide addition for large images" doesn't work anymore and I cannot find anything about images in our current style guide, I did the following: I uploaded the images to the docs/images/ folder in our sourcegraph-assets bucket. Before that, I also ran the images through ImageOptim.app.
Now, here are a few thoughts on this topic:
1. We have *.jpg, *.png, *.gif, and *.svg images (without *.svg it's 28) in our doc folder. That's not much. But I do think that we should probably add more in the future, to improve our docs.
2. Together they take up 7.3MB (find doc -iname "*.png" -or -iname ".jpeg" -or -iname "*.gif" -or -iname "*.svg" | xargs du -ch).
5. The other benefit is that by committing them it's super easy to package the correct images with the correct version of the docs. With a public bucket we need to do versioning ourselves, probably by adding a <major version>.<minor version>.<patch version> to the path so we don't run into cache problems.

I do get why we want to avoid committing binary files, but considering points 1-3 I don't think we're even close to "lots of large, often-changing binary files" territory, and when looking at the "free benefits" we get from committing the assets (points 4-5) I'm not sure the tradeoff is worth it (yet).
- Together they take up 7.3MB (find doc -iname "*.png" -or -iname ".jpeg" -or -iname "*.gif" -or -iname "*.svg" | xargs du -ch).
This is acceptable. The issue comes in when people start committing large files often. We don't really have anything to prevent that other than people knowing they shouldn't do it. This won't scale as our organisation grows.
I agree though, it is far more convenient to have everything together. But maybe there is some very simple tooling that exists so we can use externally managed assets easily? If it is very low friction that seems reasonable.
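One low-friction shape for such tooling: a checked-in manifest of URL/path pairs plus a tiny sync script. Everything below, including the manifest name and format, is a hypothetical sketch:

```shell
#!/bin/sh
# Fetch any manifest-listed asset that is missing locally.
# assets.manifest format (hypothetical): "<url> <local path>" per line.
while read -r url dest; do
  if [ -z "$url" ]; then continue; fi   # skip blank lines
  if [ -f "$dest" ]; then continue; fi  # already present; assets are immutable
  mkdir -p "$(dirname "$dest")"
  curl -fsSL "$url" -o "$dest"
done < assets.manifest
```

Because fetched assets would be treated as immutable (new content gets a new path), the script never has to diff or re-download anything.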
Can’t we use Github’s LFS feature for this?
5. The other benefit is that by committing them it's super easy to package the correct images with the correct version of the docs. With a public bucket we need to do versioning ourselves, probably by adding a <major version>.<minor version>.<patch version> to the path so we don't run into cache problems.
Totally agree with this point, and I also vote for Git LFS. We could start using GitHub's LFS infrastructure, and it's not a lock-in, as the LFS backend can be any arbitrary HTTP server that speaks the LFS protocol. Migration is also possible as a one-off effort AFAIK, without changing the Git history, though I've never actually done it (but I recently implemented an LFS server).
The one little drawback I can think of is that our docsite would need to be updated to clone the Git repository; downloading a ZIP archive (how it works today) will no longer work.
We are committing images to this repository:
https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+file:png
We will one day regret this (maybe not tomorrow, but in a few years) and have to rewrite our git history.
A better and still simple option would be hosting these all somewhere else, like a GCS bucket.