simplesurance / baur

An incremental task runner for mono repositories.

How to pass data from build script to `Task.Output.DockerImage.RegistryUpload` #289

Open · kostyay opened this issue 3 years ago

kostyay commented 3 years ago

Use case: I have a build script that creates a docker image using `docker build`. The image tag and repository are decided by the build script based on parameters it evaluates.

Build script:

    # do stuff
    # DOCKER_IMAGE_REPOSITORY: env variable containing the image repository into which the image will be uploaded
    export DOCKER_IMAGE_REPOSITORY=mycomp/image
    # DOCKER_IMAGE_TAG: env variable containing the tag for the image
    export DOCKER_IMAGE_TAG=v1.0.0

The docker output is configured in the following way:

        [[Task.Output.DockerImage.RegistryUpload]]

        # Registry address in the format <HOST>:[<PORT>]. If it's empty the docker agent's default is used.
        registry = ""
        repository = "{{ env DOCKER_IMAGE_REPOSITORY }}"
        tag = "{{ env DOCKER_IMAGE_TAG }}"

I need to somehow pass parameters into the repository and tag values, as they are not constant and are generated by the build script. How can I do that? I tried setting env variables in the build script, but they are not passed down into the RegistryUpload step.

Thanks

fho commented 3 years ago

Passing data that is generated during command execution to the upload steps is unfortunately not possible currently. Only environment variables that were defined when the baur command was run are accessible, and the variables are also resolved before the command is executed.

I like the idea a lot; being able to pass data from the command step to the upload step would be very valuable.

kostyay commented 3 years ago

I want to implement this as it's very handy for us. Do you have any preferred way for this to be implemented? I was thinking the build script could set environment variables with a prefix, and baur would parse them and make them available via a helper function like `env`. For example, in build.sh:

    export STEP_VAR_V1=f00
    export STEP_VAR_V2=f111

Inside the config you could then do:

    {{ stepOutput "V1" }}

wdyt?

fho commented 3 years ago

> I want to implement this as it's very handy for us.

That would be great :-)

> Do you have any preferred way for this to be implemented? I was thinking the build script could set environment variables with a prefix, and baur would parse them and make them available via a helper function like `env` [..]

If I understand it right, the idea is that the executed command sets environment variables, baur reads and stores them, and later templates the upload output config with them. Afaik it is not possible to read the environment variables that the process set: environment variables are available to the process and its child processes. Baur executes the command as a new process; when that process terminates, the environment variables it set are gone and baur cannot retrieve them. Or am I wrong here?
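To illustrate what I mean, a quick standalone Go sketch (not baur code, just a demonstration): a variable exported by the executed command only exists in the child process, so after the command terminates the parent cannot read it.

    // Demonstrates that env variables set by an executed command do not
    // propagate back to the process that started it.
    package main

    import (
        "fmt"
        "os"
        "os/exec"
    )

    func main() {
        // The "build script" exports a variable and can use it itself.
        script := `export DOCKER_IMAGE_TAG=v1.0.0; echo "inside child: $DOCKER_IMAGE_TAG"`
        cmd := exec.Command("sh", "-c", script)
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        if err := cmd.Run(); err != nil {
            panic(err)
        }

        // After the child exited, its environment is gone; the variable is unset here.
        val, ok := os.LookupEnv("DOCKER_IMAGE_TAG")
        fmt.Printf("in parent after child exited: value=%q, set=%v\n", val, ok)
    }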

Another way to pass data would be that the command writes the data to a file. The name of the file to which the data must be written is defined by baur and could be made available to the command as an environment variable. When the command terminates, baur reads and parses the file and makes the information available to the upload step. The file could be in JSON format; baur unmarshals it and the data structure can probably be made accessible via gotemplates.

We would also have to change how baur currently replaces variables (does templating). Currently, configuration files are read, parsed, and variables are replaced before the command runs. At that point we do not yet have the information from the command execution to template the upload sections, so we would have to template some variables or sections as we do now and others after the command has run. I think it should work to simply template the whole output sections later; then we do not have to distinguish which variables we template in which phase.
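Roughly, the flow could look like this (all names here are placeholders, not actual baur code): baur picks a result-file path, passes it to the command via an environment variable, and templates the output section only after the command has run and the file was parsed.

    // Sketch: pass a result-file path to the command, read it back as JSON
    // after the command terminated, and template the output section with it.
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "os"
        "os/exec"
        "path/filepath"
        "text/template"
    )

    func main() {
        // Hypothetical unique path chosen by baur before running the command.
        resultPath := filepath.Join(os.TempDir(), "baur-task-result.json")

        // Stand-in for the task command: it writes its results to the file.
        cmd := exec.Command("sh", "-c",
            `echo '{"repository": "mycomp/image", "tag": "v1.0.0"}' > "$BUILD_RESULTS_PATH"`)
        cmd.Env = append(os.Environ(), "BUILD_RESULTS_PATH="+resultPath)
        if err := cmd.Run(); err != nil {
            panic(err)
        }

        // Read and unmarshal the result file after the command terminated.
        raw, err := os.ReadFile(resultPath)
        if err != nil {
            panic(err)
        }
        var results map[string]interface{}
        if err := json.Unmarshal(raw, &results); err != nil {
            panic(err)
        }

        // Template a (simplified) output section only now, with the parsed data.
        outputSection := "repository = \"{{ .repository }}\"\ntag = \"{{ .tag }}\""
        tmpl := template.Must(template.New("output").Parse(outputSection))
        var rendered bytes.Buffer
        if err := tmpl.Execute(&rendered, results); err != nil {
            panic(err)
        }
        fmt.Println(rendered.String())
    }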

It also came to my mind that what you want to do could be realized in a different way: baur could support custom uploaders instead of only S3 and docker registries. A custom uploader would be a command that baur runs to upload an artifact. When baur invokes the custom uploader it passes information like the filepath/image-id to upload, the appname, the taskname, etc. The command does whatever it wants and prints the URI where the output was uploaded to (maybe also some more information) or an error message. The URI that the command prints would be stored in the baur database for the task run. This would allow storing artifacts wherever and however the user wants. In your particular case you could have a custom uploader that reads information from a file that you previously created when the task command was run and then does a docker push.
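Such an uploader invocation could roughly look like this (the command name, arguments, and stdout convention are just placeholders, not an existing interface):

    // Sketch: run a user-defined uploader command and capture the URI it
    // prints, which baur would then record for the task run.
    package main

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    // runCustomUploader invokes the uploader with artifact details as
    // arguments (it could just as well use environment variables) and
    // returns the URI printed on stdout.
    func runCustomUploader(uploaderCmd, appName, taskName, artifact string) (string, error) {
        out, err := exec.Command(uploaderCmd, appName, taskName, artifact).Output()
        if err != nil {
            return "", fmt.Errorf("custom uploader failed: %w", err)
        }
        // Treat the last non-empty stdout line as the URI of the uploaded artifact.
        lines := strings.Split(strings.TrimSpace(string(out)), "\n")
        return lines[len(lines)-1], nil
    }

    func main() {
        // "./upload.sh" is a placeholder for a user-provided script that
        // e.g. runs "docker push" and prints the resulting image URI.
        uri, err := runCustomUploader("./upload.sh", "myapp", "build", "sha256:abc123")
        if err != nil {
            panic(err)
        }
        fmt.Println("URI to store in the baur database:", uri)
    }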

What do you think?

kostyay commented 3 years ago

> I want to implement this as it's very handy for us.

> That would be great :-)

> Do you have any preferred way for this to be implemented? I was thinking the build script could set environment variables with a prefix, and baur would parse them and make them available via a helper function like `env` [..]

> If I understand it right, the idea is that the executed command sets environment variables, baur reads and stores them, and later templates the upload output config with them. Afaik it is not possible to read the environment variables that the process set: environment variables are available to the process and its child processes. Baur executes the command as a new process; when that process terminates, the environment variables it set are gone and baur cannot retrieve them. Or am I wrong here?

I just validated this; you are correct, so this use case is not possible.

> Another way to pass data would be that the command writes the data to a file. The name of the file to which the data must be written is defined by baur and could be made available to the command as an environment variable. When the command terminates, baur reads and parses the file and makes the information available to the upload step. The file could be in JSON format; baur unmarshals it and the data structure can probably be made accessible via gotemplates.
>
> We would also have to change how baur currently replaces variables (does templating). Currently, configuration files are read, parsed, and variables are replaced before the command runs. At that point we do not yet have the information from the command execution to template the upload sections, so we would have to template some variables or sections as we do now and others after the command has run. I think it should work to simply template the whole output sections later; then we do not have to distinguish which variables we template in which phase.

This is a good idea. Before executing a command, baur can generate a temporary file path and make it available to the build command either via a template variable or an env var. The build step will write JSON output to this file and baur will unmarshal it. The unmarshalled data will then be available to the upload steps via template variables.

Example flow:

  1. Generate a build output temp file path, available via `{{ .buildResultsJsonPath }}` or the `BUILD_RESULTS_PATH` env variable.
  2. The build script can either accept the output path as a CLI argument (`myscript.py {{ .buildResultsJsonPath }}`) or use the env var.
  3. After execution, baur will read the file and try to parse it as JSON.
  4. The JSON should have only one level of nesting (key: value) and will be unmarshalled into a `map[string]string{}`.
  5. There will be a template function in gotemplate with which you can resolve data from the output (see the sketch below).
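A rough sketch of what the template function in step 5 could look like (the `stepOutput` name and the data are only placeholders, nothing like this exists in baur yet): the parsed one-level JSON becomes a `map[string]string` and a template function resolves keys from it.

    // Sketch: expose the parsed result file to the upload-step config via a
    // template function, similar to the existing "env" helper.
    package main

    import (
        "os"
        "text/template"
    )

    func main() {
        // Data as it could look after unmarshalling the result file.
        results := map[string]string{
            "repository": "mycomp/image",
            "tag":        "v1.0.0",
        }

        funcs := template.FuncMap{
            // hypothetical helper that looks up a key from the result file
            "stepOutput": func(key string) string { return results[key] },
        }

        outputSection := "repository = \"{{ stepOutput \"repository\" }}\"\ntag = \"{{ stepOutput \"tag\" }}\"\n"
        tmpl := template.Must(template.New("output").Funcs(funcs).Parse(outputSection))
        if err := tmpl.Execute(os.Stdout, nil); err != nil {
            panic(err)
        }
    }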

> It also came to my mind that what you want to do could be realized in a different way: baur could support custom uploaders instead of only S3 and docker registries. A custom uploader would be a command that baur runs to upload an artifact. When baur invokes the custom uploader it passes information like the filepath/image-id to upload, the appname, the taskname, etc. The command does whatever it wants and prints the URI where the output was uploaded to (maybe also some more information) or an error message. The URI that the command prints would be stored in the baur database for the task run. This would allow storing artifacts wherever and however the user wants. In your particular case you could have a custom uploader that reads information from a file that you previously created when the task command was run and then does a docker push.

> What do you think?

This is a good idea, and I think it will be useful in general: a generic uploader that is just a script execution and returns some output that is used as the artifact for the step. We are personally currently interested in the step-output-as-JSON approach you proposed, so we can start with that. The generic uploader is also a good idea, but less important for us at this point, as we are uploading docker images as the output of our build.

fho commented 3 years ago

sounds great :+1:, some comments:

> Example flow:
>
> 1. Generate a build output temp file path, available via `{{ .buildResultsJsonPath }}` or the `BUILD_RESULTS_PATH` env variable.

It should be sufficient to only generate a unique filename. The file can be created by the executed command itself.

> 2. The build script can either accept the output path as a CLI argument (`myscript.py {{ .buildResultsJsonPath }}`) or use the env var.

It should be easier for the first implementation to only support specifying the output file via the environment variable. If we later still want to support `{{ .buildResultsJsonPath }}`, we can do it in a follow-up PR.

Replacing the variables for the command is currently done after the config was parsed. We would have to generate the filename when the configs are loaded, which is the wrong stage. Alternatively, we would have to completely refactor when variables are replaced. :-)

> 4. The JSON should have only one level of nesting (key: value) and will be unmarshalled into a `map[string]string{}`.

Are you sure that it is not possible to query nested `map[string]interface{}` types via gotemplate statements? If it were possible, we would not need the one-level nesting limitation.

kostyay commented 3 years ago

Good points, they simplify the implementation.

> Are you sure that it is not possible to query nested `map[string]interface{}` types via gotemplate statements? If it were possible, we would not need the one-level nesting limitation.

You are correct, gotemplate supports multi-level `map[string]interface{}`, so that will also work.
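A tiny example to confirm it (data made up): `text/template` walks nested `map[string]interface{}` values with the usual dot notation, so a multi-level result file would not need the one-level restriction.

    // Shows that gotemplate can access nested map[string]interface{} data.
    package main

    import (
        "encoding/json"
        "os"
        "text/template"
    )

    func main() {
        raw := []byte(`{"docker": {"repository": "mycomp/image", "tag": "v1.0.0"}}`)

        var results map[string]interface{}
        if err := json.Unmarshal(raw, &results); err != nil {
            panic(err)
        }

        // Nested keys are reachable via dot notation.
        tmpl := template.Must(template.New("out").Parse(
            "repository = \"{{ .docker.repository }}\"\ntag = \"{{ .docker.tag }}\"\n"))
        if err := tmpl.Execute(os.Stdout, results); err != nil {
            panic(err)
        }
    }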

I will look at the code and see how hard it is to implement something like this.