pressly / goose

A database migration tool. Supports SQL migrations and Go functions.
http://pressly.github.io/goose/
Other
6.48k stars 502 forks source link

feature: environment variable interpolation #347

Closed shellscape closed 8 months ago

shellscape commented 2 years ago

Ahoy hoy. We've got a unique situation that cropped up in which we have to add a bunch of triggers. These triggers are tied to AWS lambdas, and as such require some configuration based on the account and environment/stage they're deployed to. Utilizing the goose migrations came to mind as a viable pipeline for adding the triggers, but we'd need to be able to use environment variables within the SQL in the migration.sql files themselves. Not sure if this would be a worthwhile add, given that many probably use migration.go files, but we're a Node shop using .sql. It sure would be useful.

mfridman commented 2 years ago

I'm intrigued more than anything, what does it mean to set env variables in .sql files?

If I understand correctly, goose would parse some .sql file, maybe a special annotation -- +goose ENV=PRODUCTION and then do something with this env variable?

shellscape commented 2 years ago

An annotation might work, but we're really interested in the environment variables set on the CLI. This came up because we have a need to create triggers with environment specific lambda function names. Our migrations are run with specific environment variables set based on the target deployment environment (dev/stage/prod/etc), and we were looking for the least overhead with setting up that trigger for each deployment environment. Envar interpolation in the migrations was the ideal, lowest overhead approach in our estimation (and we looked at a lot of approaches). Here's the gist:

DEPLOY_ENV=stage goose -allow-missing=true -dir infra/migrations up

In our migration file:

CREATE TRIGGER new_user_trigger
  AFTER INSERT ON users
  FOR EACH ROW
  EXECUTE PROCEDURE respond_with_lambda("${{DEPLOY_ENV}}-trigger-new-user", "us-east-2");

Now, ${{DEPLOY_ENV}} is just a silly notation I threw in there for demonstrative purposes. But that's the idea. Because our migrations are run for specific environments, and our individual databases don't talk to anything outside of its environment (e.g. stage db doesn't talk to prod assets and vice versa) this would be a straightforward means to create db resources on a per-env basis.

shellscape commented 2 years ago

I've dipped my toes into golang and have a working fork that uses https://github.com/cbroglie/mustache to replace environment variables within SQL that goose runs. Would you accept a PR for this feature?

jonseymour commented 1 year ago

I also need something like this.

FWIW: something that would work for me, because I am only interested in postgres, would be to have a way of passing the equivalent of psql -v options (or \set commands) from the goose command line so that references of the form :{MY_ENV_VAR} within the script would resolve properly. Obviously, this isn't applicable for drivers other than the postgres driver and so a more generic solution per the one above might work.

More generally, the ability to execute some kind of preamble within and prior to the migration would also be sufficient to allow me to set up the variables without needing any kind of variable interpolation by goose itself.

mfridman commented 1 year ago

I was hesitant to add this feature before refactoring and extending the SQL parser, but that was done in v3.9.0, so I can see how this feature could be useful.

I'm not sure we need a full-fledged mustache library, and a simple parameter expansion package with a few rules would suffice for the majority of use cases.

Here's a few that come to mind:

Ideally, we support a very simple and limited set and it's generic enough to work with any .sql file.

For CLI users, we'll probably also want to add -e flag to pass ENV variables, similar to the docker CLI.

Edit, related previous issue: https://github.com/pressly/goose/issues/329

shellscape commented 1 year ago

I would suggest something akin to https://github.com/joho/godotenv as it's become industry standard to use .env files. The -e flag is more for one-offs and overrides.

mfridman commented 1 year ago

Ye that's fair, ignore the -e flag idea.

SophisticaSean commented 1 year ago

Would also love to see this feature!

rdpascua commented 1 year ago

Reading something from .env, .env.local would be a great feature!

smoke commented 10 months ago

I needed both so implemented the following PRs:

Both features are now released as https://github.com/smoke/goose/releases/tag/v3.16.0-envsubst.1, until the PRs are merged and released.

mfridman commented 9 months ago

Alright, coming back to this issue. It seems there are at least 2 distinct requests:

  1. Enable goose to read from .env files
  2. Add variable interpolation support in .sql files

@shellscape @jonseymour Based on your comments, do you need both, or would 2. be sufficient?

shellscape commented 9 months ago

I can get by with #2.

mfridman commented 8 months ago

Alright, I re-opened https://github.com/pressly/goose/issues/329 so we can separate the discussion because there are at least 2 discussions here:

  1. Adding variable replacement, i.e., ${var} (this issue)
  2. Reading env variables from external files (#329 issue)

I've done some preliminary work to evaluate the various libraries out there and have settled on one of these four (in order):

  1. https://pkg.go.dev/mvdan.cc/sh/v3/expand (well maintained,high quality code, but slightly confusing API)
  2. https://github.com/buildkite/interpolate
  3. https://pkg.go.dev/github.com/moby/buildkit@v0.12.4/frontend/dockerfile/shell (well maintained, but specific to Docker needs)
  4. https://github.com/drone/envsubst (not well maintained, Drone was acquired and maintenance is questionable)
mfridman commented 8 months ago

@shellscape, @jonseymour maybe one more question. For backwards compatibility, and to insulate users from unexpected behaviour who aren't paying attention, would an opt-in approach suffice?

  1. A +goose annotation, e.g., -- +goose ENVREPLACE (ON|OFF) per file
  2. A flag, goose -envreplace=on ..., to enable env replacement for all files

Not sure about the "env replace" naming, open to suggestions. @smoke used ENVSUBST which wasn't bad either. Maybe ENVSUB is short and simple.

mfridman commented 8 months ago

This feature was recently implemented (kudos to @smoke) and will eventually be released (likely v3.18.0 and above).

The best place to see more context is either the CHANGELOG or README.

Quick summary

To enable this functionality, developers can use the -- +goose ENVSUB ON annotation before the SQL queries where they want substitution to be applied. This feature remains active until the end of the file, or until the -- +goose ENVSUB OFF annotation is encountered, and these annotations can be used multiple times within a file.

-- +goose ENVSUB ON
-- +goose Up
SELECT * FROM ${VAR};

By default, this feature is disabled for backward compatibility with existing scripts. For cases like PL/pgSQL functions or other statements where substitution is not desired, developers can explicitly wrap the annotations around the relevant parts.

Otherwise the RETURNS void AS $$ will be escaped to RETURNS void AS $, since $$ has special meaning in most variable interpolation libraries.

-- +goose StatementBegin
CREATE OR REPLACE FUNCTION test_func()
RETURNS void AS $$
-- +goose ENVSUB ON
BEGIN
    RAISE NOTICE '${VAR}';
END;
-- +goose ENVSUB OFF
$$ LANGUAGE plpgsql;
-- +goose StatementEnd

See the following for a list of supported expansions. I forked this repository to have a bit more control of the underlying library. Based on user feedback, we can revisit whether this library is sufficient or we need to support additional expressions.