jonnyry closed this 1 month ago.
You can recreate this bug without having to deploy a fresh TRE by running:
    cd /root/.porter.cache
    rm -rf *
    docker logout <TRE_ID>acr.azurecr.io
The docker logout removes the credentials Docker holds for connecting to the container registry, resetting it to the state it was in when first deployed.
I remember looking at this at the time. It's weird that you've seen it, as it was a while ago, and I don't think I've come across the issue; our E2E PR tests would also fail. So I'm confused why you're seeing this now, and not in the tests.
Looking at the code, this needs running once on RP start-up, and it is done here: https://github.com/microsoft/AzureTRE/blob/ddbbffe70fc6a8fe5d0b430afc4c18116f7ff993/core/terraform/resource_processor/vmss_porter/cloud-config.yaml#L91
Looking at your logs, I think your actual error is: Error message: parameter "tre_id" is required. Is this a custom bundle? If so, I think you are missing passing tre_id somewhere.
I've seen this recently too.
@jonnyry I think I've seen this before when deploying a workspace; as you say, subsequent deploys work, and that's been our "workaround".
Yes, also our workaround :-) I just thought I'd get it logged, as I've seen it several times now.
> Looking at the code, this needs running once on RP start-up, and it is done here:
I notice that az acr login is run on the VM itself rather than inside the resource processor Docker container. Is the az "session" shared inside the Docker container?
    - az acr login --name ${docker_registry_server}
    - docker run -d -p 8080:8080 -v /var/run/docker.sock:/var/run/docker.sock
      --restart always --env-file .env
      --name resource_processor1
      --log-driver local
      ${docker_registry_server}/${resource_processor_vmss_porter_image_repository}:${resource_processor_vmss_porter_image_tag}
> Looking at your logs, I think your actual error is: Error message: parameter "tre_id" is required. Is this a custom bundle? If so, I think you are missing passing tre_id somewhere.
The logs in the issue description are for a custom bundle; however, it also happens for standard bundles. This is from a test I ran just now, after resetting the cache and Docker credentials inside the resource processor container:
1) Main step for 28b8b4b2-8840-4eac-89d9-ab6294ac1aa2
28b8b4b2-8840-4eac-89d9-ab6294ac1aa2: Error message: parameter "address_spaces" is required ; Command executed: porter install "28b8b4b2-8840-4eac-89d9-ab6294ac1aa2" --reference XXXXX.azurecr.io/tre-workspace-airlock-import-review:v0.12.16 --force --credential-set arm_auth --credential-set aad_auth
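The porter install command in the log above carries no --param flags at all. A minimal Python sketch of how that can happen, assuming that install parameters are built only from the names porter explain reports: build_install_command is a hypothetical name, not the resource processor's actual function, and the credential-set names are copied from the log.

```python
import shlex
from typing import List, Mapping


def build_install_command(installation: str, reference: str,
                          parameter_keys: List[str], values: Mapping[str, str]) -> str:
    """Hypothetical builder for a 'porter install' command.

    Only names listed in parameter_keys (e.g. as reported by 'porter explain')
    are forwarded as --param flags, so an empty list silently yields a command
    with no parameters, and Porter then rejects required ones.
    """
    params = [f"--param {shlex.quote(f'{key}={values[key]}')}"
              for key in parameter_keys if key in values]
    parts = [
        "porter install", shlex.quote(installation),
        "--reference", shlex.quote(reference),
        *params,
        "--force",
        "--credential-set arm_auth",
        "--credential-set aad_auth",
    ]
    return " ".join(parts)
```

Called with an empty key list, the sketch produces the same shape of command as the log: every value the API supplied is dropped, and the first required parameter Porter checks is reported as missing.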
It looks like az login and az acr login are called when running a constructed porter command (install, etc.), but not when calling porter explain, which runs before that code.
Docker is passed through to the container, so the credentials should pass through too. I'm sure I tested it.
But yes, if it is an issue, the fix is to add the login commands to the explain command.
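A minimal Python sketch of that fix: chain the login commands in front of porter explain with &&, so explain only runs once both logins have succeeded. with_registry_login and porter_explain_command are hypothetical helpers, and az login --identity assumes the resource processor authenticates with a managed identity; the real commands.py may log in differently (e.g. with a service principal).

```python
import shlex
from typing import List


def porter_explain_command(reference: str) -> str:
    """'porter explain' for a bundle reference, with JSON output so parameter names can be parsed."""
    return f"porter explain --reference {shlex.quote(reference)} --output json"


def with_registry_login(command: str, registry_server: str) -> str:
    """Prefix a shell command with 'az login' and 'az acr login'.

    Joining with '&&' means the wrapped command only runs if both logins succeed.
    Assumption: managed-identity login ('az login --identity'); a
    service-principal login would need different arguments.
    """
    steps: List[str] = [
        "az login --identity",
        f"az acr login --name {shlex.quote(registry_server)}",
        command,
    ]
    return " && ".join(steps)
```

The same wrapper could be applied to every porter invocation, so explain and install share one login path instead of relying on credentials left over from an earlier run.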
OK, I just checked the credentials on the VM and inside the resource processor container... the two are not the same, at least on my instance :-D
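One way to make that comparison repeatable is a small Python sketch that lists the registries with stored credentials in a Docker CLI config file. The Docker CLI reads config.json from the directory named by $DOCKER_CONFIG, else from ~/.docker; az acr login stores its token through the Docker CLI, so mounting docker.sock shares the daemon but not these client-side credentials. The function names are hypothetical, and checking /root/.docker inside the container assumes it runs as root.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional, Set


def docker_config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Docker CLI config file: $DOCKER_CONFIG/config.json, else $HOME/.docker/config.json."""
    env = os.environ if env is None else env
    config_dir = env.get("DOCKER_CONFIG")
    if config_dir:
        return Path(config_dir) / "config.json"
    home = env.get("HOME") or str(Path.home())
    return Path(home) / ".docker" / "config.json"


def registries_with_credentials(config: Mapping) -> Set[str]:
    """Registries with an 'auths' entry or a per-registry 'credHelpers' entry.

    Presence only shows that a login happened; the stored token may have expired.
    """
    return set(config.get("auths") or {}) | set(config.get("credHelpers") or {})


def load_registries(path: Path) -> Set[str]:
    """Read a config.json if it exists; a missing file means no stored credentials."""
    if not path.is_file():
        return set()
    return registries_with_credentials(json.loads(path.read_text()))
```

Running load_registries(docker_config_path()) once on the VM and once inside the container (e.g. via docker exec resource_processor1, assuming Python is available there) and diffing the two sets would show which registry logins exist on only one side.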
I deployed a new TRE from current main, and the first workspace I attempted to create failed: the resource processor could not deploy it (I've noticed this on fresh deploys a couple of times now).
All subsequent workspaces are created successfully.
Here are the logs for the failed run:
I'm pretty sure the get_porter_parameter_keys function is failing the first time around, specifically on the porter explain line: https://github.com/microsoft/AzureTRE/blob/ddbbffe70fc6a8fe5d0b430afc4c18116f7ff993/resource_processor/resources/commands.py#L107
It looks like az acr login has not been called, so the registry server denies the request. However, porter install is still called despite the parameters not having been built, and that is where az acr login is called, which is why subsequent runs work.
Looking at the commit history, I can see that az login / az acr login were previously called before running porter explain:
Commit: https://github.com/microsoft/AzureTRE/commit/c382f3daa041f337455ec47fef24eedad5ce55e6
Should the az login and az acr login calls have remained before the porter explain call?
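The second problem described above, porter install running even though parameter discovery failed, could be avoided by failing fast. A minimal Python sketch, assuming porter explain --output json returns an object with a "parameters" list whose entries have a "name" field; get_parameter_keys and ParameterDiscoveryError are hypothetical names, and subprocess.run stands in for whatever process runner the resource processor actually uses.

```python
import json
import subprocess
from typing import List


class ParameterDiscoveryError(RuntimeError):
    """Raised when 'porter explain' fails, so 'porter install' is never attempted without parameters."""


def get_parameter_keys(explain_command: str) -> List[str]:
    """Run a (login-prefixed) 'porter explain --output json' command and return its parameter names."""
    # shell=True because the command may chain 'az login && az acr login && porter explain'.
    result = subprocess.run(explain_command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        # Stop here instead of continuing with an empty parameter list.
        raise ParameterDiscoveryError(
            f"porter explain failed (exit {result.returncode}): {result.stderr.strip()}"
        )
    # Assumption: top-level "parameters" list of objects, each with a "name".
    explained = json.loads(result.stdout)
    return [parameter["name"] for parameter in explained.get("parameters") or []]
```

With a guard like this, a registry denial on the first run would surface as an explicit explain failure, rather than as a misleading parameter "tre_id" is required error from install.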