Closed 0x2b3bfa0 closed 2 years ago
Odd, since that command runs just fine for me during instance provisioning. Perhaps there is a mismatch between cloud providers: it is most definitely run on GCP, where it was introduced to prevent machine creation from timing out, because this download was taking 7-15 minutes.
I can grep the syslog for the transient service and see that it has no issues:
$ journalctl -f -u run-r7125ca7a1551459492c19434aafdcc4a.service
-- Logs begin at Wed 2022-07-20 05:16:23 UTC. --
Jul 20 05:18:27 cml-3w7qrqs284 systemd[1]: Starting /usr/bin/bash -c curl https://amazon-ecr-credential-helper-releases.s3.us-east-2.amazonaws.com/0.5.0/linux-amd64/docker-credential-ecr-login --output /usr/bin/docker-credential-ecr-login && chmod 755 /usr/bin/docker-credential-ecr-login...
Jul 20 05:18:27 cml-3w7qrqs284 bash[11859]: % Total % Received % Xferd Average Speed Time Time Time Current
Jul 20 05:18:27 cml-3w7qrqs284 bash[11859]: Dload Upload Total Spent Left Speed
Jul 20 05:18:27 cml-3w7qrqs284 systemd[1]: Started /usr/bin/bash -c curl https://amazon-ecr-credential-helper-releases.s3.us-east-2.amazonaws.com/0.5.0/linux-amd64/docker-credential-ecr-login --output /usr/bin/docker-credential-ecr-login && chmod 755 /usr/bin/docker-credential-ecr-login.
Jul 20 05:18:28 cml-3w7qrqs284 bash[11859]: [237B blob data]
Jul 20 05:18:28 cml-3w7qrqs284 systemd[1]: run-r7125ca7a1551459492c19434aafdcc4a.service: Succeeded.
It does appear that there is a system mismatch:
AWS: 18.04
https://github.com/iterative/terraform-provider-iterative/blob/28ce78188618771e3338ef4373b295c6a8e85f2b/iterative/aws/provider.go#L86
Azure: 18.04
https://github.com/iterative/terraform-provider-iterative/blob/28ce78188618771e3338ef4373b295c6a8e85f2b/iterative/azure/provider.go#L45
GCP: 20.04
https://github.com/iterative/terraform-provider-iterative/blob/28ce78188618771e3338ef4373b295c6a8e85f2b/iterative/gcp/provider.go#L59
k8s: using our container based on 20.04
https://github.com/iterative/cml/blob/2acfde589f4b435c5b9adf3010ef773e71a060af/Dockerfile#L1
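To confirm the mismatch on a running instance, something like the following could be used. It is only a sketch: the `os_version_id` function name is made up for illustration, and it assumes the image ships a standard /etc/os-release file with a VERSION_ID field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print VERSION_ID (e.g. 18.04) from an os-release file.
# Usage: os_version_id [FILE]   (FILE defaults to /etc/os-release)
os_version_id() {
  local file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak out.
  ( . "$file" && printf '%s\n' "$VERSION_ID" )
}

# Example on an instance (not run here):
# echo "Ubuntu $(os_version_id), $(systemctl --version | head -n 1)"
```

Printing the first line of `systemctl --version` next to the release makes it easy to compare the AWS/Azure and GCP images side by side.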
Ubuntu ends main updates for 18.04 in less than a year now, if I am reading their chart correctly. Perhaps it's time for an update?
> Perhaps it's time for an update?
Perhaps yes.
Regardless, the `--same-dir` option is futile for this use case, and `--service-type=exec` can be safely omitted so it falls back to `simple` with a similar effect. Can we remove those options?
Agreed, `--same-dir` is not required.
Updated the example above to use `credHelpers` instead of `credsStore`.
I'll give this a test as well.
It seems `{"credHelpers": {"ACCOUNT.dkr.ecr.REGION.amazonaws.com": "ecr-login"}}` was the missing part.
variables:
  AWS_DEFAULT_REGION: "us-west-1"
  AWS_REGISTRY: "342840881361.dkr.ecr.us-west-1.amazonaws.com"
stages:
  - deploy
  - train
deploy_job:
  stage: deploy
  when: always
  image: iterativeai/cml
  script:
    - cml-runner
      --cloud aws
      --cloud-region us-west-1
      --cloud-type t2.micro
      --labels=cml-runner
train_job:
  stage: train
  when: on_success
  needs: [deploy_job]
  image: 342840881361.dkr.ecr.us-west-1.amazonaws.com/temp2:latest
  tags:
    - cml-runner
  script:
    - apt update && apt install -y awscli
    - echo "hello"
This worked without error (the temp container is the latest cml image -> ghcr.io/iterative/cml:latest).
I'll note that it appears that GitLab's hosted runner cannot use private registries?
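For reference, here is a minimal sketch of how the credHelpers setting mentioned above could be written on a runner machine. The `write_cred_helpers` function name, its arguments, and the default target directory are assumptions for illustration, not part of the provisioning script; `ACCOUNT` and `REGION` remain placeholders. It overwrites the target file, so an existing config would need merging instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Docker client config that routes one ECR
# registry to the docker-credential-ecr-login helper via credHelpers.
# Usage: write_cred_helpers REGISTRY [CONFIG_DIR]
write_cred_helpers() {
  local registry="$1"
  local config_dir="${2:-$HOME/.docker}"
  mkdir -p "$config_dir"
  # Note: this replaces any existing config.json in CONFIG_DIR.
  printf '{"credHelpers": {"%s": "ecr-login"}}\n' "$registry" \
    > "$config_dir/config.json"
}

# Example (placeholders, not a real registry):
# write_cred_helpers "ACCOUNT.dkr.ecr.REGION.amazonaws.com"
```

Docker resolves the `ecr-login` value to a binary named `docker-credential-ecr-login` on the PATH, which is the file the provisioning step downloads in the log above.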
The following commands in the provisioning script fail due to some unrecognized options on systemd 237, used by our default images:
https://github.com/iterative/terraform-provider-iterative/blob/28ce78188618771e3338ef4373b295c6a8e85f2b/environment/setup.sh#L21-L23
The `--same-dir` option was introduced in systemd 240 (https://github.com/systemd/systemd/pull/10887) and the `exec` type was also added later.
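If the options were to be kept rather than dropped, one possible approach is to detect support from the help text instead of hard-coding a version. This is only a sketch: `has_option` and `optional_flags` are hypothetical helpers, not part of setup.sh, and they assume that `systemd-run --help` lists every flag the installed version accepts. The `exec` service type cannot be detected this way, so it is simply left out and the default type applies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the help text on stdin mentions option $1.
has_option() {
  grep -q -e "$1"
}

# Hypothetical helper: print the optional systemd-run flags that the help
# text on stdin advertises; prints nothing when --same-dir is unknown.
optional_flags() {
  if has_option '--same-dir'; then
    echo '--same-dir'
  fi
}

# Example on a real machine (not run here):
# flags="$(systemd-run --help | optional_flags)"
# systemd-run $flags /usr/bin/bash -c 'echo hello'
```

On an image where the flag is missing, `$flags` expands to nothing and the command runs with systemd's defaults.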