pulumi / pulumi-azure-native

Azure Native Provider
Apache License 2.0

Non deterministic ContainerGroup building #2580

Open reubano opened 1 year ago

reubano commented 1 year ago

What happened?

I'm creating an Azure ContainerGroup with an associated StorageAccount and FileShares. I've found that the pulumi up command often times out. However, running pulumi refresh and then pulumi up again works.

When it fails, I get one of the following errors:

{
  "message": "pulling image \"nginx@sha256:1bb5c4b86cb7c1e9f0209611dc2135d8a2c1c3a6436163970c99193787d067ea\";Successfully pulled image \"nginx@sha256:1bb5c4b86cb7c1e9f0209611dc2135d8a2c1c3a6436163970c99193787d067ea\";Error: Failed to start container nginxcontainerdev, Error response: to create containerd task: failed to create shim task: failed to create container ad39cc812d771fde7d472850c8d513a90c8ea33da52720d11880b43a2a7f9d33: guest RPC failure: failed to create container: failed to run runc create/exec call for container ad39cc812d771fde7d472850c8d513a90c8ea33da52720d11880b43a2a7f9d33 with exit status 1: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting \"/run/gcs/c/03e3f9834193e26e56ddeaa500f4d66e7b1c74b82a2d5be71830c1b33a418cc7/sandboxMounts/tmp/atlas/azureFileVolume/caas-62e0e817f230407d95887c5a5076786e/nginx-config/mnt\" to rootfs at \"/etc/nginx/nginx.conf\" caused: mount through procfd: not a directory: unknown;pulling image \"docker.io/bitnami/prometheus@sha256:4653c5d14a0904ae18a50f2907c7976f1160ed1be61fc322045061d51542419b\";Successfully pulled image \"docker.io/bitnami/prometheus@sha256:4653c5d14a0904ae18a50f2907c7976f1160ed1be61fc322045061d51542419b\";Started container;Killing container with id db071db65f81c6355713890a4b76949a1ce88895e46d7f17c82dfa48fd1113f7.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Subscription deployment didn't reach a successful provisioning state after '00:30:00'."
}

or

{
  "message": "Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Successfully mounted Azure File Volume.;Subscription deployment didn't reach a successful provisioning state after '00:30:00'."
}
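Both messages are a single string of container-group events joined by semicolons, and the real failure is easy to miss among the repeated "Successfully mounted Azure File Volume." entries. The sketch below pulls out the failing entries, assuming only that events are separated by `;`. The helper name `failing_events` and the keyword list are hypothetical choices for illustration, not an official list of Azure Container Instances event types.

```python
import json

# Hypothetical keywords that mark an event as a failure (an assumption,
# not an official list of Azure Container Instances event types).
FAILURE_MARKERS = ("Error", "Failed", "didn't reach")


def failing_events(raw_json: str) -> list[str]:
    """Split an ACI-style event message on ';' and keep only failure entries."""
    message = json.loads(raw_json)["message"]
    events = [event.strip() for event in message.split(";") if event.strip()]
    return [e for e in events if any(marker in e for marker in FAILURE_MARKERS)]


sample = json.dumps({
    "message": "Started container;Successfully mounted Azure File Volume.;"
               "Subscription deployment didn't reach a successful provisioning "
               "state after '00:30:00'."
})
for event in failing_events(sample):
    print(event)
```

Run against the first message above, this should surface the "Error: Failed to start container nginxcontainerdev ... not a directory" entry, which points at the /etc/nginx/nginx.conf mount.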

Also, the publicIP export is sometimes not present in the output.

Expected Behavior

No errors, or at least an indication of why rerunning pulumi up makes it work.

Steps to reproduce

Pulumi.yaml

name: nginx-py
runtime:
  name: python
  options:
    virtualenv: venv
description: A minimal Azure Native Python Pulumi program
config:
  location: centralus

Pulumi.dev.yaml

config:
  nginx-py:env: Dev
  nginx-py:resourceGroup:
    name: xxx
    tags:
      CostCenter: xxx
      Environment: xxx
  nginx-py:vouch:
    clientID: xxx
    clientSecret:
      secure: xxx
    tenantID: xxx

__main__.py

from pulumi import Output, Config, ResourceOptions, export

from pulumi_azure_native import storage
from pulumi_azure_native.storage import StorageAccount, FileShare
from pulumi_azure_native.containerinstance import ContainerGroup, VolumeMountArgs, ContainerArgs
from pulumi_azure_native.resources import ResourceGroup

config = Config()
location = config.require("location")
env = config.require("env")
lowered_env = env.lower()
resource_group_config = config.require_object("resourceGroup")
vouch_config = config.require_object("vouch")

prometheus_subdomain = f"prometheusweb-{lowered_env}"
prometheus_base = "centralus.cloudapp.azure.com"
prometheus_domain = f"{prometheus_subdomain}.{prometheus_base}"

resource_group = ResourceGroup(
    f"xxx{env}",
    opts=ResourceOptions(protect=True),
    location=location,
    resource_group_name=resource_group_config["name"],
    tags=resource_group_config["tags"],
)

storage_account = StorageAccount(
    f"pulumiStorage{env}",
    minimum_tls_version="TLS1_2",
    account_name=f"pulumistorageacct{lowered_env}",
    allow_blob_public_access=False,
    resource_group_name=resource_group.name,
    sku=storage.SkuArgs(name=storage.SkuName.STANDARD_LRS),
    kind=storage.Kind.STORAGE_V2,
)

def create_file_share(name, quota=1, protect=False):
    return FileShare(
        f"{name}fileshare{lowered_env}",
        opts=ResourceOptions(protect=protect),
        account_name=storage_account.name,
        resource_group_name=resource_group.name,
        share_quota=quota
    )

nginx_config_fileshare = create_file_share("nginxconfig")
nginx_templates_fileshare = create_file_share("nginxtemplates")
vouch_config_fileshare = create_file_share("vouchconfig")
prom_data_fileshare = create_file_share("prometheusdata", 5, protect=True)
prom_config_fileshare = create_file_share("prometheusconfig")
am_data_fileshare = create_file_share("alertmanagerdata", 5, protect=True)
am_config_fileshare = create_file_share("alertmanagerconfig")

primary_storage_account_key = Output.secret(
    Output.all(resource_group.name, storage_account.name).apply(
        lambda args: storage.list_storage_account_keys(
            resource_group_name=args[0], account_name=args[1]
        )
    ).apply(lambda keys: keys.keys[0].value)
)

def get_file_share_config(name, read_only=False):
    return {
        "share_name": name,
        "storage_account_name": storage_account.name,
        "read_only": read_only,
        "storage_account_key": primary_storage_account_key
    }

# https://hub.docker.com/_/nginx
nginx_container = ContainerArgs(
    name=f"nginxcontainer{lowered_env}",
    image="nginx",
    resources={"requests": {"memory_in_gb": 1.5, "cpu": 1}},
    ports=[{"port": 80}, {"port": 443}],
    volume_mounts=[
        VolumeMountArgs(mount_path="/etc/nginx/templates", name="nginx-templates", read_only=False),
        # File-level mount: this is the mount named in the "not a directory" error above
        VolumeMountArgs(mount_path="/etc/nginx/nginx.conf", name="nginx-config", read_only=True),
    ]
)

# https://github.com/vouch/vouch-proxy#running-from-docker
vouch_container = ContainerArgs(
    name=f"vouchcontainer{lowered_env}",
    image="quay.io/vouch/vouch-proxy:latest",
    resources={"requests": {"memory_in_gb": 1.5, "cpu": 1}},
    ports=[{"port": 9091}],
    volume_mounts=[
        VolumeMountArgs(mount_path="/config/config.yml", name="vouch-config", read_only=False),
    ]
)

# https://hub.docker.com/r/bitnami/prometheus
prom_container = ContainerArgs(
    name=f"promcontainer{lowered_env}",
    image="docker.io/bitnami/prometheus:latest",
    resources={"requests": {"memory_in_gb": 1.5, "cpu": 1}},
    ports=[{"port": 9090}],
    volume_mounts=[
        VolumeMountArgs(
            mount_path="/opt/bitnami/prometheus/data",
            name="prometheus-data",
            read_only=False,
        ),
        VolumeMountArgs(
            mount_path="/opt/bitnami/prometheus/conf/prometheus.yml",
            name="prometheus-config",
            read_only=False,
        ),
    ]
)

# https://hub.docker.com/r/bitnami/alertmanager
am_container = ContainerArgs(
    name=f"amcontainer{lowered_env}",
    image="docker.io/bitnami/alertmanager:latest",
    resources={"requests": {"memory_in_gb": 1.5, "cpu": 1}},
    ports=[{"port": 9093}],
    volume_mounts=[
        VolumeMountArgs(
            mount_path="/opt/bitnami/alertmanager/data",
            name="alertmanager-data",
            read_only=False,
        ),
        VolumeMountArgs(
            mount_path="/opt/bitnami/alertmanager/conf/config.yml",
            name="alertmanager-config",
            read_only=False,
        ),
    ]
)

container_volumes = [
    {"name": "vouch-config", "azure_file": get_file_share_config(vouch_config_fileshare.name)},
    {"name": "prometheus-data", "azure_file": get_file_share_config(prom_data_fileshare.name)},
    {"name": "prometheus-config", "azure_file": get_file_share_config(prom_config_fileshare.name)},
    {"name": "alertmanager-data", "azure_file": get_file_share_config(am_data_fileshare.name)},
    {"name": "alertmanager-config", "azure_file": get_file_share_config(am_config_fileshare.name)},
    {"name": "nginx-config", "azure_file": get_file_share_config(nginx_config_fileshare.name)},
    {"name": "nginx-templates", "azure_file": get_file_share_config(nginx_templates_fileshare.name)},
]

container_group = ContainerGroup(
    f"prometheusContainerGroup{env}",
    containers=[
        nginx_container,
        vouch_container,
        prom_container,
        am_container,
    ],
    ip_address={"ports": [{"port": 80}], "type": "Public"},
    os_type="Linux",
    resource_group_name=resource_group.name,
    container_group_name=f"prometheuscontainergroup{lowered_env}",
    location=resource_group.location,
    volumes=container_volumes
)

export("publicIP", container_group.ip_address.ip)
export("primaryStorageKey", primary_storage_account_key)

Output of pulumi about

CLI
Version      3.75.0
Go Version   go1.20.6
Go Compiler  gc

Plugins
NAME          VERSION
azure-native  1.103.0
docker        4.3.0
python        unknown

Host
OS       darwin
Version  12.6.1
Arch     arm64

This project is written in python: executable='.../venv/bin/python3' version='3.11.4'

Current Stack: xxx/nginx-py/dev

TYPE                                           URN
pulumi:pulumi:Stack                            urn:pulumi:dev::nginx-py::pulumi:pulumi:Stack::nginx-py-dev
pulumi:providers:azure-native                  urn:pulumi:dev::nginx-py::pulumi:providers:azure-native::default_1_103_0
azure-native:resources:ResourceGroup           urn:pulumi:dev::nginx-py::azure-native:resources:ResourceGroup::xxx
azure-native:storage:StorageAccount            urn:pulumi:dev::nginx-py::azure-native:storage:StorageAccount::pulumiStorageDev
pulumi:providers:azure-native                  urn:pulumi:dev::nginx-py::pulumi:providers:azure-native::default
azure-native:storage:FileShare                 urn:pulumi:dev::nginx-py::azure-native:storage:FileShare::nginxtemplatesfilesharedev
azure-native:storage:FileShare                 urn:pulumi:dev::nginx-py::azure-native:storage:FileShare::nginxconfigfilesharedev
azure-native:storage:FileShare                 urn:pulumi:dev::nginx-py::azure-native:storage:FileShare::vouchconfigfilesharedev
azure-native:storage:FileShare                 urn:pulumi:dev::nginx-py::azure-native:storage:FileShare::prometheusdatafilesharedev
azure-native:storage:FileShare                 urn:pulumi:dev::nginx-py::azure-native:storage:FileShare::alertmanagerdatafilesharedev
azure-native:storage:FileShare                 urn:pulumi:dev::nginx-py::azure-native:storage:FileShare::prometheusconfigfilesharedev
azure-native:storage:FileShare                 urn:pulumi:dev::nginx-py::azure-native:storage:FileShare::alertmanagerconfigfilesharedev
azure-native:containerinstance:ContainerGroup  urn:pulumi:dev::nginx-py::azure-native:containerinstance:ContainerGroup::prometheusContainerGroupDev

Found no pending operations associated with xxx/dev

Backend
Name           pulumi.com
URL            https://app.pulumi.com/xxx
User           xxx
Organizations  xxx, xxx

Dependencies:
NAME                 VERSION
pip                  23.1.2
pulumi-azure-native  1.103.0
pulumi-docker        4.3.0
setuptools           68.0.0
wheel                0.40.0

Pulumi locates its logs in /var/folders/m1/xxx/T/ by default

Additional context

No response


reubano commented 1 year ago

Some additional info: I've figured out that, unlike Docker, Azure Container Instances can't mount a volume onto a single file; they can only mount directories. That means you can't do something like VolumeMountArgs(mount_path="/etc/nginx/nginx.conf", name="nginx-config", read_only=True). Instead, you would have to use VolumeMountArgs(mount_path="/etc/nginx", name="nginx", read_only=True) and then either upload nginx.conf to the file share via the az CLI or use a secret volume to store the base64-encoded file contents.
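The secret-volume route can be sketched without any Azure calls. An Azure Container Instances secret volume takes a mapping of file name to base64-encoded contents, and each key appears as a file inside the mounted directory. The helper below only builds the plain dictionaries that would be passed as a volume and a volume mount; `build_secret_volume`, the sample nginx.conf contents and the /etc/nginx target are illustrative assumptions, not a tested setup.

```python
import base64


def build_secret_volume(volume_name, files, mount_dir, read_only=True):
    """Return (volume, volume_mount) dicts for a directory-style secret mount.

    `files` maps a file name (e.g. "nginx.conf") to its text contents. Each
    entry becomes one file inside `mount_dir`, because ACI mounts volumes
    as directories, not as single files.
    """
    secret = {
        name: base64.b64encode(contents.encode("utf-8")).decode("ascii")
        for name, contents in files.items()
    }
    volume = {"name": volume_name, "secret": secret}
    volume_mount = {"name": volume_name, "mount_path": mount_dir, "read_only": read_only}
    return volume, volume_mount


# Hypothetical minimal config, for illustration only.
nginx_conf = "events {}\nhttp { server { listen 80; } }\n"
volume, mount = build_secret_volume("nginx-config", {"nginx.conf": nginx_conf}, "/etc/nginx")
print(mount["mount_path"], sorted(volume["secret"]))
```

In the program from the report, `volume` would stand in for the nginx-config entry of `container_volumes`, and `mount` for the file-level `VolumeMountArgs` on the nginx container (plain snake_case dicts are accepted there, as the program already does for `resources` and `ports`). Note that mounting over /etc/nginx hides everything the nginx image ships in that directory, such as mime.types and conf.d/, so a separate directory plus nginx's `-c` option may be the safer choice.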

A useful improvement would be for Pulumi to check whether you are trying to mount a file and fail immediately with an error, instead of waiting 30 minutes for Azure to time out.
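Until such a check exists in the provider, a similar guard can live in the program itself. This is a heuristic sketch that assumes a mount path whose last component has a file extension (for example nginx.conf or config.yml) is meant to be a file. `looks_like_file` and `check_mount_paths` are hypothetical helpers, and the heuristic will misfire on directories whose names contain a dot.

```python
import posixpath


def looks_like_file(mount_path):
    """Heuristic: treat a final path component with an extension as a file."""
    name = posixpath.basename(mount_path.rstrip("/"))
    stem, ext = posixpath.splitext(name)
    # splitext(".ssh") returns (".ssh", ""), so hidden directories are not flagged.
    return bool(stem) and bool(ext)


def check_mount_paths(mounts):
    """Raise before deployment if any (container, mount_path) pair looks like a file.

    ACI only mounts volumes as directories, so a file path here would otherwise
    surface much later as a provisioning timeout.
    """
    bad = [f"{container}: {path}" for container, path in mounts if looks_like_file(path)]
    if bad:
        raise ValueError("Volume mounts must be directories, not files:\n" + "\n".join(bad))


mounts = [
    ("nginxcontainerdev", "/etc/nginx/templates"),
    ("nginxcontainerdev", "/etc/nginx/nginx.conf"),
    ("promcontainerdev", "/opt/bitnami/prometheus/data"),
]
try:
    check_mount_paths(mounts)
except ValueError as err:
    print(err)
```

In the reported program, the pairs could be collected from the `ContainerArgs` objects before `ContainerGroup` is created, assuming their `name`, `volume_mounts` and `mount_path` properties are readable as plain values at that point (they are plain strings in this program).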

thomas11 commented 1 year ago

Hi @reubano, thank you for the update and the detailed initial report! Do I understand correctly that the problem does not occur when you mount a directory?

We'll take your idea of a local check into account, thank you!