Using the 4.0.0 release deployer with PrivateNetworking=true in a hub-and-spoke model, and with the vnets/subnets/ACR/storage configured per private-coa.md, workflows that use public container images run successfully.
When using an image from a private ACR (access verified by testing permissions, running az aks check-acr, and pulling with MSI from a debug pod launched with the same AADPodIdentity), TES reports the following:
Task 75fe9035_de878743a063451ca80bec15efadb1a3 failed. ExitCode: , BatchJobInfo: {"MoreThanOneActiveJobOrTaskFound":false,"ActiveJobWithMissingAutoPool":false,"AttemptNumber":1,"NodeAllocationFailed":false,"NodeErrorCode":null,"NodeErrorDetails":null,"JobState":0,"NodeState":0,"TaskState":3,"TaskExitCode":null,"TaskExecutionResult":1,"TaskStartTime":"2023-03-23T00:11:14.226296Z","TaskEndTime":"2023-03-23T00:11:14.701518Z","TaskFailureInformation":{"Category":0,"Code":"ContainerInvalidSettings","Details":[{"Name":"ContainerSettings","Value":"--rm -v /var/run/docker.sock:/var/run/docker.sock -v $AZ_BATCH_NODE_ROOT_DIR:$AZ_BATCH_NODE_ROOT_DIR "},{"Name":"Message","Value":"Duplicate mount point: /mnt/batch/tasks"}],"Message":"At least one value of specified task container settings is invalid"},"TaskContainerState":null,"TaskContainerError":null,"Pool":{"AutoPoolSpecification":null,"PoolId":"TES-FCBVLNGS-F2s_v2-U2WWYO7JPSYF4B5CWHOIPBPBQ5R5IWD3-H3SJ4XXN"}}
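For anyone triaging this, here is a minimal Python sketch that pulls the failure details out of the BatchJobInfo payload and lists which `-v` option in ContainerSettings resolves to the mount point named in the error. The payload is a trimmed excerpt of the output above. The helper names (`summarize_failure`, `volume_options`, `colliding_volumes`) are hypothetical, and the value of `AZ_BATCH_NODE_ROOT_DIR` is an assumption inferred from the error message, not read from a node. Azure Batch's container documentation describes the directories under `AZ_BATCH_NODE_ROOT_DIR` as being mapped into container tasks automatically, so one possible reading is that the explicit `-v` collides with that mapping, but I have not confirmed this.

```python
import json
import shlex
from string import Template

# Trimmed excerpt of the BatchJobInfo payload reported by TES above.
BATCH_JOB_INFO = json.dumps({
    "TaskState": 3,
    "TaskFailureInformation": {
        "Category": 0,
        "Code": "ContainerInvalidSettings",
        "Details": [
            {"Name": "ContainerSettings",
             "Value": "--rm -v /var/run/docker.sock:/var/run/docker.sock "
                      "-v $AZ_BATCH_NODE_ROOT_DIR:$AZ_BATCH_NODE_ROOT_DIR "},
            {"Name": "Message",
             "Value": "Duplicate mount point: /mnt/batch/tasks"},
        ],
        "Message": "At least one value of specified task container settings is invalid",
    },
})

# ASSUMPTION: inferred from the error message, not read from a Batch node.
ASSUMED_ENV = {"AZ_BATCH_NODE_ROOT_DIR": "/mnt/batch/tasks"}


def summarize_failure(batch_job_info):
    """Return the failure code plus the Name/Value details as a flat dict."""
    info = json.loads(batch_job_info)
    failure = info.get("TaskFailureInformation") or {}
    details = {d["Name"]: d["Value"] for d in failure.get("Details", [])}
    return {
        "code": failure.get("Code"),
        "message": details.get("Message"),
        "container_settings": details.get("ContainerSettings", ""),
    }


def volume_options(container_settings):
    """Return the argument of every -v option, without expanding variables."""
    tokens = shlex.split(container_settings)
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-v"]


def colliding_volumes(container_settings, message, env):
    """Return the -v specs whose container-side path equals the reported mount point."""
    prefix = "Duplicate mount point: "
    if not message or not message.startswith(prefix):
        return []
    mount_point = message[len(prefix):].strip()
    hits = []
    for spec in volume_options(container_settings):
        resolved = Template(spec).safe_substitute(env)
        parts = resolved.split(":")
        target = parts[1] if len(parts) > 1 else parts[0]
        if target == mount_point:
            hits.append(spec)
    return hits


summary = summarize_failure(BATCH_JOB_INFO)
colliding = colliding_volumes(
    summary["container_settings"], summary["message"], ASSUMED_ENV
)
print(summary["code"], "|", summary["message"])
print("Explicit -v options matching the reported mount point:", colliding)
```

Under the assumed environment value, only the `$AZ_BATCH_NODE_ROOT_DIR:$AZ_BATCH_NODE_ROOT_DIR` option matches the reported mount point; the docker.sock mount does not.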
Steps to Reproduce
Follow private-coa.md, copy the MCR ubuntu 22.04 image to the private ACR, and launch a workflow that uses that ACR image as the docker setting for its tasks.
Expected behavior
The container should run the same way as it does when the image is not private. I'm not sure why a duplicate mount is injected.
Deployment details
OS: Ubuntu
Version 22.04
Networking: private VNet, hub and spoke; all related private endpoints are in the cromwell40pe subnet without endpoint policies. Roles: Network Contributor for the spoke networking group, AcrPull and Network Contributor for the ACR, Contributor on Batch.
Note that the AKS cluster can launch pods using the same private ACR, and a pod with the same AADPodBinding as TES can pull images without issue. With other batch-scheduling NGS tooling (Nextflow), the same infrastructure runs those workflows without issue.
Thanks for your help and attention!