heroku / heroku-geo-buildpack

0 stars 2 forks source link

Huge slug size #6

Closed j4mie closed 4 years ago

j4mie commented 4 years ago

Following this announcement we switched a Django app over from using the old way to using this buildpack.

Before, the slug size was 200.2MB. After, it was 312.2MB, which exceeds Heroku's soft slug size limit.

Looking on a one-off dyno:

$ du -hs .heroku-geo-buildpack/
714M    .heroku-geo-buildpack/

This seems pretty huge! Is there anything that can be done to improve this?

Thanks :)

codingjoe commented 4 years ago

I noticed the same thing. This was fixed by @syphar in the Python buildpack a long time ago, see https://github.com/heroku/heroku-buildpack-python/commit/c44ab4cd03e3d14aa4d775ff6621ea806120362a

The current build includes static binaries, which are not needed by Python. IMHO they should be dropped again, as they were in the Python buildpack to allow people to transition smoothly.

cc @KevinBrolly

filwaitman commented 4 years ago

Same here. du inside the dynos:

$ heroku run "du -hs .[^.]* * | sort -hr | head -n 10" -a XXXX

714M    .heroku-geo-buildpack
437M    .heroku
10M         < sum of all my project stuff >
codingjoe commented 4 years ago

I submitted a patch.

filwaitman commented 4 years ago

Hey @codingjoe! Just so you know, I tried your PR and it did not work (meaning: same slug size after deploy and compression).

My context:

$ heroku buildpacks -a XXX
=== spotfund-dev Buildpack URLs
1. https://github.com/codingjoe/heroku-geo-buildpack.git
2. https://github.com/heroku/heroku-buildpack-apt.git
3. https://github.com/filwaitman/heroku-buildpack-wkhtmltopdf.git
4. https://github.com/heroku/heroku-buildpack-python.git

Size using heroku/heroku-geo-buildpack: 298MB Size using codingjoe/heroku-geo-buildpack: 298MB

I purged the cache but still no luck.

Do you know what I am missing? Did you try it on your end and got a smaller slug size?

codingjoe commented 4 years ago

Hey @codingjoe! Just so you know, I tried your PR and it did not work (meaning: same slug size after deploy and compression).

My context:

$ heroku buildpacks -a XXX
=== spotfund-dev Buildpack URLs
1. https://github.com/codingjoe/heroku-geo-buildpack.git
2. https://github.com/heroku/heroku-buildpack-apt.git
3. https://github.com/filwaitman/heroku-buildpack-wkhtmltopdf.git
4. https://github.com/heroku/heroku-buildpack-python.git

Size using heroku/heroku-geo-buildpack: 298MB Size using codingjoe/heroku-geo-buildpack: 298MB

I purged the cache but still no luck.

Do you know what I am missing? Did you try it on your end and got a smaller slug size?

Hi @filwaitman that is expected, since the PR needs to be merged and the new binaries uploaded to Heroku's internal S3 bucket. You see, the binaries are not compiled during build but in advance. Otherwise you'd end up with a very very long build time. So, thumb up that issue and let's hope the Heroku resident in charge of this buildpack releases the changes soon. Best, Joe

filwaitman commented 4 years ago

Ahhh ok, that makes total sense. Thank you sir!

KevinBrolly commented 4 years ago

I merged the PR + compiled and uploaded the new binaries. You may need to purge your cache once again to get them. Thanks @codingjoe @j4mie and @filwaitman

codingjoe commented 4 years ago

Well it's better now, but there are still a lot of build files, that do not need to be there. I'll try to make some time to clean those up.

filwaitman commented 4 years ago

@KevinBrolly thanks for that!

I tried to purge cache and redeploy... and got the same slug size after all. Any clues about what is happening?

codingjoe commented 4 years ago

I browsed a bit though the dyno files and there is still stuff there that isn't needed. I will try to look into this, but I wanted to add some more tests first, to make sure not to break anything.

codingjoe commented 4 years ago
root@683dec410d22:/tmp/tmp.1AxmfFUUvh# du -hd1  
110M    ./include
7.3M    ./share
2.4M    ./bin
140M    ./lib
359M    .

So I spend some time checking, there are still tons files that are not needed for Python. Actually Python should be fine with only the lib directory.

@KevinBrolly do we need the other binaries for anything else? I am fairly certain we can drop includes since it only contains boost with is only needed to compile the static Python bindings. It's not needed during runtime.

To be fair though, the includes are highly compressible. The static bindings are of course not. Dropping everything, but the bindings will only share ~10 MB of the slug size in the case of GDAL. Still, this could make a sizable difference in dyno startup times.

KevinBrolly commented 4 years ago

@codingjoe With the great changes from #8 and #11 we have gone from:

Before, the slug size was 200.2MB. After, it was 312.2MB, which exceeds Heroku's soft slug size limit.

Looking on a one-off dyno:

$ du -hs .heroku-geo-buildpack/
714M    .heroku-geo-buildpack/

Down to:

$ du -hs .heroku-geo-buildpack/
141M    .heroku-geo-buildpack/

The bulk of the size is the includes but as you noted these are highly compressible and the buildpack itself only adds around 18 MB to the slug size:

$ du -hs slug.tgz
18M slug.tgz

I think shrinking it any more would make a pretty negligible difference to startup times. But if we can easily take more size away I am all for it. We could drop includes with the only downside I can think of being that it is not available for subsequent buildpacks. We do currently add them to C_INCLUDE_PATH here so this could cause problems if someone else relies on this.