splor-mg / dados-sigplan-planejamento

PPAG Planejamento dataset

Configure Docker caching for GitHub Actions workflow runs #34

Closed fjuniorr closed 1 year ago

fjuniorr commented 1 year ago

The GitHub Free account used by the SPLOR organization limits GitHub Actions to 500 MB of storage and 2,000 minutes of job execution time per month.

If each job takes 10 minutes (the PPAG jobs have been taking 6 minutes), we can run 200 jobs per month. That covers 6 datasets with daily updates (6 × 30 × 10 = 1,800 minutes, which fits within the 2,000-minute limit).
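One way to keep each run inside the 10-minute figure assumed above is to schedule the job once a day and cap it with a job timeout. This is only a minimal sketch: the workflow name, job name, cron time, and placeholder step are assumptions for illustration, not taken from the repository's actual workflow.

```yaml
name: update-dataset  # hypothetical workflow name

on:
  schedule:
    - cron: "0 6 * * *"  # once a day at 06:00 UTC (assumed time)

jobs:
  build:  # hypothetical job name
    runs-on: ubuntu-latest
    # Cancel the job if it runs past the 10 minutes assumed in the estimate:
    # 6 datasets x 30 runs x 10 min = 1800 min, under the 2000 min limit.
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v3
      - run: echo "build and publish steps go here"  # placeholder step
```

With a timeout in place, a stuck build costs at most 10 minutes of the monthly allowance instead of running up to the default job limit.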

fjuniorr commented 1 year ago

The cache worked! Here is ChatGPT's explanation of each line:

  • Set up Docker Buildx:

    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2

    This step sets up Docker Buildx in the GitHub Actions environment. Docker Buildx is a CLI plugin that extends the default Docker build command with the full support of the features provided by Moby BuildKit builder toolkit. It provides the same user experience as docker build and adds some additional capabilities like building against multiple nodes concurrently, support for new Dockerfile features (like multi-stage builds), or the ability to create output in different formats (like local directory or docker tar).

  • Login to DockerHub:

    - name: Login to DockerHub
      uses: docker/login-action@v2
      with:
        username: ${{ secrets.DOCKERHUB_USERNAME }}
        password: ${{ secrets.DOCKERHUB_TOKEN }}

    This step logs into DockerHub using the provided username and password (which are stored as GitHub secrets). This is necessary because you will later be pushing an image to DockerHub, and therefore authentication is required. Secrets are encrypted and can only be accessed by the workflows running in the same repository, providing a secure way to store sensitive information like your DockerHub credentials.

  • Build and push Docker image:

    - name: Build and push Docker image
      uses: docker/build-push-action@v2
      with:
        context: .
        push: true
        tags: fjuniorr/ppag-planejamento:latest
        cache-from: type=registry,ref=fjuniorr/ppag-planejamento:latest
        cache-to: type=inline

    This step builds a Docker image and pushes it to DockerHub.

    • The context: . instruction sets the build context to the current directory. In Docker, the build context is the set of files and directories that are available to the Dockerfile during the build process.

    • The push: true instruction pushes the built image to DockerHub.

    • The tags: fjuniorr/ppag-planejamento:latest instruction tags the built image with the tag fjuniorr/ppag-planejamento:latest. Tags are a way to identify specific versions of an image.

    • The cache-from: type=registry,ref=fjuniorr/ppag-planejamento:latest instruction tells Docker to use the specified image (fjuniorr/ppag-planejamento:latest) as a cache source when building the image. This can speed up the build process by reusing intermediate layers from the cached image instead of re-running all the Dockerfile steps.

    • The cache-to: type=inline instruction specifies that the cache should be stored inline in the Docker image itself. This can speed up subsequent builds by allowing them to reuse cached layers.

In summary, this workflow sets up Docker Buildx, logs into DockerHub, builds a Docker image using the current directory as the build context, caches the build to speed up future builds, tags the image, and pushes the image to DockerHub.
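Put together, the three steps explained above might sit in a workflow file roughly like the sketch below. The workflow name, trigger, job name, runner, and the checkout step are assumptions added so the fragment is complete. The three Docker steps are the ones quoted in this comment.

```yaml
name: docker-build  # hypothetical workflow name

on:
  push:
    branches: [main]  # assumed trigger

jobs:
  docker:  # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      # Assumed step: "context: ." needs the repository files on the runner
      - name: Check out the repository
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Login to DockerHub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v2
        with:
          context: .
          push: true
          tags: fjuniorr/ppag-planejamento:latest
          # Reuse layers from the previously pushed image
          cache-from: type=registry,ref=fjuniorr/ppag-planejamento:latest
          # Embed cache metadata in the pushed image itself
          cache-to: type=inline
```

Because the cache is stored inline in the pushed image, nothing counts against the 500 MB GitHub Actions storage limit. The trade-off is that the first build, or any build after the image is deleted from DockerHub, runs without a cache.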

And the links to the actions used:

I am still unsure how this works when multiple repositories push the same image to DockerHub.

fjuniorr commented 1 year ago

I am still unsure how this works when multiple repositories push the same image to DockerHub.

I changed the workflow so that each repository pushes its own image to DockerHub, in commit https://github.com/splor-mg/ppag-planejamento/commit/1510f3f1640de96b7016c14a57142c002e4d06e5. The explanation of what would happen:

Explain to me again what will happen if I have two repositories that are running this exact same workflow

When you have two different repositories running the same workflow, each repository will have its own set of GitHub Actions that run independently of the other. Even though the workflows are the same, the context (repository) in which they run is different.

Let's go through the process for both repositories:

  1. Set up Docker Buildx: Each repository will set up its own instance of Docker Buildx in their respective GitHub Actions runners. They do not share this setup or its resultant data.

  2. Login to DockerHub: Each repository will log in to DockerHub with its own set of credentials. These could be the same or different, depending on the secrets (DOCKERHUB_USERNAME and DOCKERHUB_TOKEN) you have set up in each repository.

  3. Build and push Docker image: Here is where the sharing can occur. Both repositories will attempt to build the Docker image and push it to the DockerHub registry with the tag fjuniorr/ppag-planejamento:latest. This is where potential conflicts could arise, particularly if both workflows are running concurrently. Whichever workflow pushes the image last will effectively "win," as it will overwrite the image from the other workflow.

    • The cache-from: type=registry,ref=fjuniorr/ppag-planejamento:latest directive tells Docker to pull the fjuniorr/ppag-planejamento:latest image from the DockerHub registry and use its layers as a cache for the build process. If both repositories are using this same image as a cache source, they will effectively share this cache, but only in the sense that they are both pulling the same image from DockerHub to use as a cache source.

    • The cache-to: type=inline directive tells Docker to store the cache inline in the Docker image itself. This cache will then be pushed to DockerHub along with the image. When the other workflow pulls this image to use as a cache source, it will pull the cache along with it. However, the cache from each repository's workflow run is not shared directly with each other, but indirectly through the image on DockerHub.

In summary, while each repository will run its workflow independently, they can interact through the Docker image and its associated cache stored on DockerHub. However, they do not share their individual workflow data directly with each other. And please remember, if both workflows are running at the same time, there could be conflicts or overwriting issues with the Docker image on DockerHub.

Per the current configuration in your workflow, every time the workflow runs in either repository, it will build a new Docker image and push it to DockerHub with the tag fjuniorr/ppag-planejamento:latest. The :latest tag is a mutable tag, and the image associated with it will be overwritten every time a new image is pushed with the same tag.

Therefore, if you're running the same workflow in different repositories, they can indeed overwrite each other's Docker image on DockerHub if they're both using the same DockerHub account and image tag.

If you want to avoid this, you could consider using unique tags for each repository. One common practice is to include the commit SHA or a timestamp in the image tag, which would look something like this: fjuniorr/ppag-planejamento:${{ github.sha }} or fjuniorr/ppag-planejamento:$(date +%s). This would ensure that each workflow push creates a uniquely tagged image, rather than overwriting the latest tag.

Just remember that with this approach, you'll need a strategy for cleaning up old images on DockerHub if you're running these workflows frequently, as DockerHub has storage limits for accounts.

-- chatGPT
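A sketch of the unique-tag suggestion above, for reference. One caveat about the quoted answer: as far as I know, a shell substitution such as $(date +%s) is not expanded inside a `with:` input, because only `${{ ... }}` expressions are evaluated there. The timestamp therefore has to be computed in an earlier `run` step. The step id `meta` and the output name `stamp` are hypothetical names chosen for this example.

```yaml
# Fragment of a job's steps list; assumes the login and Buildx steps ran earlier.
- name: Compute a timestamp tag
  id: meta  # hypothetical step id
  run: echo "stamp=$(date +%s)" >> "$GITHUB_OUTPUT"

- name: Build and push Docker image
  uses: docker/build-push-action@v2
  with:
    context: .
    push: true
    # Keep the mutable latest tag as the cache source and add
    # immutable tags that identify this particular build.
    tags: |
      fjuniorr/ppag-planejamento:latest
      fjuniorr/ppag-planejamento:${{ github.sha }}
      fjuniorr/ppag-planejamento:${{ steps.meta.outputs.stamp }}
    cache-from: type=registry,ref=fjuniorr/ppag-planejamento:latest
    cache-to: type=inline
```

In practice one of the two unique tags is probably enough. The commit SHA has the advantage of tying each image to the exact source that produced it.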