heroku / buildpacks-python

Heroku's Cloud Native Buildpack for Python applications.
BSD 3-Clause "New" or "Revised" License
25 stars 2 forks source link

Support for `pre_compile` and `post_compile` steps #14

Open ipmb opened 1 year ago

ipmb commented 1 year ago

Just tested out v0.1.0 and noticed this was missing. It looks like you have some other issues to get it to parity with the legacy buildpack, so just dropping this one in too.

Thanks for your work on this!

edmorley commented 1 year ago

Thank you! Yeah this is on the list of things to consider - though it's possible I'll be dropping support for this in favour of people using the built-in inline buildpack feature instead, since it's not ideal for each language to reimplement custom run support. (For languages where there are built-in hooks, it's perhaps still worth supporting those, but for Python post_compile is proprietary so doesn't seem worth preserving in the CNB.)

I'll leave this open to track that decision, and if the feature is dropped we'll need to add warning (or error) messages to ease the transition regardless.

luzfcb commented 1 year ago

I haven't tried this new way of deployment yet, but I would like to leave a comment on the way I use post_compile today. My use case for post_compile is as follows:

I have an e-commerce platform made with Django + Frontend application in React. The deployment process has integration with Sentry releases.

In Heroku Classic, you do not have access to GIT and repository, just an environment variable that informs which git commit hash of a particular deploy flow has started. You also do not have access to information when using pipelines and use functionality to promote to Production a Heroku Slug running on Staging

The application has 2 independent react applications (one uses old things and another uses new and incompatible things, and we are gradually migrating to the new application)

We use pre_compile for: 1 - Perform compilation tasks for the 2 react applications, so that everything is prepared for when python/django compilation tasks are performed 2 - Upload to Sentry releases of the Source Maps JS files generated in the previous step 3 - Since Heroku Classic has no direct access to the Git Repository, Sentry-Cli is not able to automatically get all the information needed to create a new release in Sentry Releases, so I must do so manually by running a custom script to consult the Sentry API to find out which one was the hash of last previous deployment made with successful status (That is, a finalized release) and then generate the range of commits to which this deploy possibly covers and manually starts the creation of a new release on Sentry Releases

post_compile:

1 - We perform Django's Collectstatic 2 - If all previous steps are executed sucessfully, we execute the Finalize release step of Sentry Releases to confirm that the application was successfully implanted. https://docs.sentry.io/product/cli/releases/#finalizing-releaseseses

edmorley commented 1 year ago

For CNBs, the standardised way of running custom commands before/after buildpacks is now using the new built-in inline buildpacks feature, which is documented here: https://buildpacks.io/docs/app-developer-guide/using-inline-buildpacks/

For language ecosystems where there is a standardised convention (for that language) way of running commands, I think it makes sense for the buildpacks for that language to keep supporting those conventions (for example, the Node.js buildpack supporting the package.json scripts feature for things like the build command etc).

However, for language ecosystems like Python where there is no such convention, I don't think it makes sense to preserve the classic Python buildpack's proprietary bin/{pre,post}_compile steps, given there is a built in feature for custom commands in the CNB spec (inline buildpacks).

As such, I think I may not end up adding support for this in the Python CNB. Though I may end up adding some explicit error handling to ease the transition. (Which could for example print an example inline buildpack config to the build log, which users can then copy directly into their project.toml.)

ipmb commented 1 year ago

I understand the motivation here, but project.toml feels like a big regression in DX/ergonomics.

Previously I could do something like this:

pack build myimage -b heroku/nodejs,heroku/python -B heroku/buildpacks:20

If I'm understanding the inline buildpacks, I now need to add a project.toml like this:

[_]
schema-version = "0.2"
id = "io.buildpacks.my-app"

[[io.buildpacks.group]]
id = "heroku/nodejs"
version = "1.1.2"

[[io.buildpacks.group]]
id = "heroku/python"
version = "0.4.0"

[[io.buildpacks.group]]
id = "me/post-compile"

  [io.buildpacks.group.script]
  api = "0.9"
  inline = "echo hello"

and then I can run:

pack build myimage -B heroku/builder:22

Versions seem required in project.toml so you lose auto-updating. It also requires more knowledge of buildpack internals (schema_version, api, io.buildpacks...).

Maybe this is outside of Heroku's scope, but a thought I had was that a separate "run-script" buildpack could be published which executes a script from a predefined location (or location provided by env var)?

edmorley commented 1 year ago

The buildpack version in project.toml is an optional field, and defaults to "latest", so auto-updating of buildpack versions won't be affected. See: https://buildpacks.io/docs/reference/config/project-descriptor/#iobuildpacks-_table-optional_

I agree that users may get the syntax of the project.toml file incorrect, however, we can and should document it clearly + have clear validation error messages (either upstream in the CNB lifecycle, or else in Pack CLI + the Heroku next gen build system).

Ref having to create a project.toml - in your example you would already have had to create such a file, since that app uses both Python and Node.js, and the CNB auto-detection would have only picked one language. (Just like currently Heroku only auto-detects a single language when no buildpacks are set, and for anything else you have to set explicit buildpacks using heroku buildpacks:set.)

Whilst the project.toml file won't be mandatory, it will be needed for any app that:

In addition, app.json may eventually end up being merged into project.toml, meaning that a project.toml file may be needed for the Heroku CI / Review App use-cases too.

As such, I think having a project.toml will be fairly common for anything but beginner apps. And beginner apps don't tend to use bin/{pre,post}_compile.

In general though, I much prefer having buildpacks/stacks be configured via code rather than heroku buildpacks:set / heroku stack:set, since implicit differing state between apps is a regular source of user confusion ("why does my staging app work but my production app not, I'm git pushing the same source to both" etc).

edmorley commented 1 year ago

a thought I had was that a separate "run-script" buildpack could be published which executes a script from a predefined location

Yeah this is another viable solution too.

Though perhaps a third option would be to ask for a simpler project.toml syntax for "run commands" that don't require the full verbosity of the current "inline buildpack" concept? https://github.com/buildpacks/rfcs#proposal

ipmb commented 1 year ago

The buildpack version in project.toml is an optional field, and defaults to "latest", so auto-updating of buildpack versions won't be affected.

Using the latest pack, I get this error if I remove the versions from the heroku buildpack groups:

$ pack build myimg -B heroku/builder:22
22: Pulling from heroku/builder
Digest: sha256:c9f45adaaf3f91eb2f126b29c8e947f1375bae560a725511393b6a197cfe765d
Status: Image is up to date for heroku/builder:22
22-cnb: Pulling from heroku/heroku
Digest: sha256:0c422f13416d589a199e3fbf9fa17c1a75b1a120fc60425cd6a20216c432f026
Status: Image is up to date for heroku/heroku:22-cnb
ERROR: failed to build: Invalid buildpack defined in project descriptor

If I use "latest" I get:

$ pack build myimg -B heroku/builder:22
22: Pulling from heroku/builder
Digest: sha256:c9f45adaaf3f91eb2f126b29c8e947f1375bae560a725511393b6a197cfe765d
Status: Image is up to date for heroku/builder:22
22-cnb: Pulling from heroku/heroku
Digest: sha256:0c422f13416d589a199e3fbf9fa17c1a75b1a120fc60425cd6a20216c432f026
Status: Image is up to date for heroku/heroku:22-cnb
ERROR: failed to build: downloading buildpack: error reading heroku/nodejs@latest: invalid locator: InvalidLocator

What am I missing?

ipmb commented 1 year ago

Thanks for the thoughtful write-up @edmorley! I see where you're coming from and the value in following a standard. If good examples are provided, the project.toml is bearable, but heroku buildpacks:set feels far more user friendly 🙂

edmorley commented 1 year ago

What am I missing?

~Oh strange! I know there are some heuristics around interpreting a plain URI identifier (ie: to work out if it's a Docker Hub reference, or say a CNB registry reference; xref https://buildpacks.io/docs/app-developer-guide/specify-buildpacks/#uri-examples), perhaps there is a bug around that and the default of latest? I personally haven't tried defining buildpacks in project.toml yet (we're still very early stages in the overall CNB / CNB tooling journey), so maybe the best place to ask for now might be https://github.com/buildpacks/pack/issues or https://slack.cncf.io/ (in one of the #buildpacks / #buildpacks-* channels).~

Edit: This bug was fixed upstream by https://github.com/buildpacks/pack/pull/1873.

but heroku buildpacks:set feels far more user friendly 🙂

I agree it is seemingly more friendly at first glance, but it actually does cause quite a few issues that likely do not surface for experienced/diligent users (such as yourself :-)) but often trip beginners up - such as:

In addition, with CNBs we finally have a much better story for running a build locally that matches production in the form of pack build etc. However, given that heroku buildpacks:set stores state remotely in Heroku's API/database for a specific app, how would pack build know what buildpacks to run, unless that's defined in the codebase?

ipmb commented 1 year ago

In case anyone else comes across this, a uri is required, not an id for it to work without a version:

[[io.buildpacks.group]]
uri = "urn:cnb:builder:heroku/nodejs"
edmorley commented 11 months ago

In case anyone else comes across this, a uri is required, not an id for it to work without a version:

@ipmb That seems like a bug - would you mind reporting it upstream? https://github.com/buildpacks/pack/issues

ipmb commented 11 months ago

✅ https://github.com/buildpacks/pack/issues/1862

edmorley commented 11 months ago

Thank you!

jose-fully-ported commented 8 months ago

As such, I think having a project.toml will be fairly common for anything but beginner apps. And beginner apps don't tend to use bin/{pre,post}_compile.

I've long used bin/{pre,post}_compile files in heroku apps to install binaries I needed in apps at runtime that were annoying to install otherwise. I don't think its something that is "optional" by any means, but maybe Heroku has telemetry on it's usage?

jose-fully-ported commented 8 months ago

Would it be best for someone (me, you, someone) to publishpre-compile and post-compile buildpacks that detects based on the existence of the respective files? I don't know if there is a way to control order here and wrap all other buildpacks other than specifying it in a particular order within project.toml (or --buildpack flag if using pack).

edmorley commented 8 months ago

The issue here is that bin/{pre_post}_compile are an entirely proprietary Heroku classic Python buildpack thing - they are not used by any other Heroku language buildpack, nor are they used by any other non-Heroku Python ecosystem tool.

IMO one of the big wins of the migration from classic buildpacks to CNBs (for all Heroku languages) is that of switching from proprietary features/concepts to open standards and modern best practices. For example:

For some of these transitions (for example runtime.txt), the Python CNB will support both the old and the new method - but will recommend the new method in docs and show a deprecation warning for the old approach in the build logs.

For some other differences between the classic Python buildpack and the CNB, the feature will be dropped and result in an error that explains how to migrate.

I haven't decided yet which approach to use for bin/{pre_post}_compile (hard error with migration advice, or deprecation warning) - however, I don't feel "support it as the preferred/recommended approach" (or replacing it with a custom pre-compile buildpack replacement) is even an option we should be considering as a best practice moving forwards.

Ultimately "run a custom command before/after one of my buildpacks" is not a Python specific feature - and should be something handled by the upstream CNB project (and already is - though there are likely UX improvements that could be made).

If you have suggestions for UX improvements to how the upstream inline buildpack feature work, I'd strongly recommend opening some issues or starting a discussion upstream: https://github.com/buildpacks/community/discussions https://github.com/buildpacks/rfcs

Ultimately end users being able to influence the design of buildpack APIs and tooling is another of the advantages of switching to an open standard - so please do take advantage of that! 😄

ipmb commented 8 months ago

It's not as pretty, but adding a project.toml like this now works and gives me a workaround for post_compile

[_]
schema-version = "0.2"
id = "io.buildpacks.my-app"

[[io.buildpacks.group]]
id = "heroku/nodejs"

[[io.buildpacks.group]]
id = "heroku/python"

[[io.buildpacks.group]]
id = "my-app/post-compile"

  [io.buildpacks.group.script]
  api = "0.9"
  inline = "bin/post_compile"

If your post_compile script is a shell script, this will get you a header that matches the others in the build output:

echo -e "\n\e[1;35m[post_compile]\e[0m"
edmorley commented 8 months ago

Thank you for the example - glad to see that works!

Using an inline table allows for reducing the boilerplate a bit more (depending on personal taste for TOML style):

[[io.buildpacks.group]]
id = "my-app/post-compile"
script = { api = "0.9", inline = "bin/post_compile" }

I'll open a PR against the upstream docs to make them use the inline table approach in the rake package example on: https://buildpacks.io/docs/app-developer-guide/using-inline-buildpacks/

edmorley commented 2 months ago

@ipmb Hi! I happened to see: https://apppack.io/blog/using-bin-post-compile-in-heroku-python-cnb-buildpacks/

In that you say:

I hope that in the future, the Python buildpack regains this functionality similar to how build scripts are handled by the Node.js buildpacks. Part of the joy of working with buildpacks is that "things just work" without needing to understand all the internals. Adding a project.toml file breaks that illusion.

I'm not sure the comparison to Node.js build scripts under the package.json scripts.build key makes sense. That feature is based around an ecosystem convention of (a) having script aliases under scripts in general (and mentioned in the spec for package.json), (b) it being very common for the build step's scripts entry to be called build and common tools (such as create-react-app) populating the build entry by default. It's those aspects that are why it makes a lot of sense for the Node.js buildpack to support this language convention since many apps will just work out of the box on Heroku. Plus if you were to ask an average Node.js developer how to perform a production build of their Node.js app, I suspect many of them would know about {npm,yarn} run build, since it's an existing/understood language concept.

However, the same cannot be said for Python, where there is (a) no built-in way to specify command aliases in general (even for use-cases like an app's custom internal "lint" command etc), (b) no common convention for what to call the "build" step or how to run it. The closest one can get to that is tool-proprietary features like Poetry's run alias feature, however, (a) not all package managers/tools support something like that, (b) for those that do, each tool uses a different way of specifying the list of aliases, (c) there isn't a standardised command alias naming convention for the step that should be run "for production builds, after package install" etc, (d) the scaffolding tools and example templates for popular frameworks don't configure/use these aliases (understandably, since there is no single way to support all the tools).

Perhaps in the future pyproject.toml might support a standardised app scripts/commands alias section with a convention for a specific command alias for post-install commands (if you're feeling adventurous, you could start a PEP?), at which point I would be happy to support that too.

In the meantime, the CNB inline buildpack feature seems like a pretty good standard to migrate people to, given that with CNBs we now actually have a standard (and not just some adhoc "run" classic buildpacks). (I'd also question the blog post calling it "internals" - buildpacks are part of the public API to users, and project.toml and/or the inline buildpack feature are public features of CNBs.)

However, I definitely want the need to migrate away from {pre,post}_compile to be discoverable, so will add some warning/error messages. (Doing this is in my queue after the a few other higher priority things, like Heroku-24, multi-arch support, migrating to Buildpack API 0.10/latest libcnb.rs etc).

ipmb commented 2 months ago

I hear what you're saying... Node definitely has more of a standard here, but I don't think Python is so far off that you couldn't do something similar.

The Node buildpack already supports non-standard Heroku-specific keys to run scripts (heroku-postbuild, heroku-prebuild`, etc). So there is precedence in not just using the standard, but defining something that is used specifically by Heroku buildpacks.

There may be a little debate on it still, but pyproject.toml is the package.json of the Python world. It has a standard way for arbitrary tools to define configuration within it.

My take is that it would be much more ergonomic for Python developers to define the scripts using pyproject.toml (something they are probably already familiar with), than having to understand some of the internals of buildpacks to perform a common task. One of the huge benefits of buildpacks is that developers never have to leave their ecosystem. They use buildpacks because they don't want to learn how to use Docker. This breaks that model. For somebody who doesn't know buildpacks, it's confusing, non-intuitive, and error-prone.

Something like this in pyproject.toml would be superior from the end-user perspective and is very similar to how the heroku-* values work for Node:

[tool.heroku-buildpack]
postbuild = "./bin/post-compile"

Yes, there is no python run for that, but it is a very tiny feature for the buildpack to parse the TOML and execute the command if it exists.

I can see this is probably going to be an agree-to-disagree situation, just trying to clarify my stance here :)