Closed Puumanamana closed 1 year ago
Google LS does not support fusion. Regarding Batch, have you provided the private registry credentials via Tower?
Yes, the TOWER_ACCESS_TOKEN environment variable is set (and it works since it runs without fusion enabled)
Can you please include the container name as it is specified in the config?
I updated the post to include it
Are you using Google Artifact Registry or Container Registry?
Artifact registries
Can you please re-enter the credentials on tower.nf?
Just did (in my personal credentials, I deleted the container registry credentials for the private artifact registry and re-added it). Same error for now.
I think I've found the problem. We may release a patch by Monday
Great, thank you!
Unfortunately, I'm still unable to replicate the issue. Can you please try to rerun it at your convenience?
Still not working, I tried a few things:
I'll let you know if I find anything else.
Also, I don't know if it makes sense to do that, but I also tried using the local
executor with fusion enabled (and a GS URI as work directory), and I had the same error.
Here's the .nextflow.log for that if it helps:
Mar-17 20:58:11.382 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file /home/cedric/sandbox/troubleshoot/fusion/test.txt to work dir: gs://nf-tower-public/scratch/stage-bef7c30d-a2da-467b-9072-5f7d75582448/a1/7afc82dfaf1939258ad565a586d949/test.txt
Mar-17 20:58:11.662 [Actor Thread 3] DEBUG i.s.wave.plugin.config.WaveConfig - Wave strategy not specified - using default: [container, dockerfile, conda]
Mar-17 20:58:11.666 [Actor Thread 3] DEBUG io.seqera.wave.plugin.WaveClient - Wave server endpoint: https://wave.seqera.io
Mar-17 20:58:11.702 [Actor Thread 3] DEBUG io.seqera.wave.plugin.WaveClient - Wave request container config: https://fusionfs.seqera.io/releases/v2.1-amd64.json
Mar-17 20:58:11.892 [Actor Thread 3] DEBUG io.seqera.wave.plugin.WaveClient - Wave container config response: [200] {
"layers": [
{
"location": "https://fusionfs.seqera.io/releases/pkg/2/1/6/fusion-amd64.tar.gz",
"gzipDigest": "sha256:782f50229060010f4f8e8bb6c52822f3fc95dafef0ca742128998a307a1db0d3",
"gzipSize": 13522690,
"tarDigest": "sha256:382627a7a78ba495481489b036a99798e3c6245433c29685b0efdcc4b39740f1",
"skipHashing": true
}
]
}
Mar-17 20:58:11.939 [Actor Thread 3] DEBUG io.seqera.wave.plugin.WaveClient - Wave request: https://wave.seqera.io/container-token; attempt=1 - request: SubmitContainerTokenRequest(towerAccessToken:eyJ0aWQiOiA2OTgyfS45NTJmZjEwMjhmNjg3NTJkMWJjZmIxNTYyMDg4NmU2ZmQ3YTQ2Yjdl, towerRefreshToken:null, towerWorkspaceId:44413759927279, towerEndpoint:https://api.tower.nf, containerImage:us-docker.pkg.dev/rome-pipeline-engine/nxf-container-repo/l1em:master_fix-se, containerFile:null, containerConfig:ContainerConfig(entrypoint:null, cmd:null, env:null, workingDir:null, layers:[ContainerLayer[location=https://fusionfs.seqera.io/releases/pkg/2/1/6/fusion-amd64.tar.gz; tarDigest=sha256:382627a7a78ba495481489b036a99798e3c6245433c29685b0efdcc4b39740f1; gzipDigest=sha256:782f50229060010f4f8e8bb6c52822f3fc95dafef0ca742128998a307a1db0d3; gzipSize=13522690]]), condaFile:null, containerPlatform:null, buildRepository:null, cacheRepository:null, timestamp:2023-03-17T20:58:11.933374743Z, fingerprint:c6f62794090f039a74d43721d0a5ac6e)
Mar-17 20:58:12.519 [Actor Thread 3] DEBUG io.seqera.wave.plugin.WaveClient - Wave response: statusCode=200; body={"containerToken":"52d8b11047ee","targetImage":"wave.seqera.io/wt/52d8b11047ee/rome-pipeline-engine/nxf-container-repo/l1em:master_fix-se","expiration":"2023-03-18T18:58:12.464668808Z"}
Mar-17 20:58:12.976 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: docker run -i -e "FUSION_WORK=/fusion/gs/nf-tower-public/scratch/75/15b877bc26e72b9f97b182a2243050" -e "FUSION_TAGS=[.command.*|.exitcode|.fusion.*](nextflow.io/metadata=true),[*](nextflow.io/temporary=true)" --rm --privileged wave.seqera.io/wt/52d8b11047ee/rome-pipeline-engine/nxf-container-repo/l1em:master_fix-se /usr/bin/fusion bash '/fusion/gs/nf-tower-public/scratch/75/15b877bc26e72b9f97b182a2243050/.command.run'
Mar-17 20:58:12.978 [Task submitter] INFO nextflow.Session - [75/15b877] Submitted process > P1
Mar-17 20:58:18.228 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: P1; status: COMPLETED; exit: 125; error: -; workDir: gs://nf-tower-public/scratch/75/15b877bc26e72b9f97b182a2243050]
Mar-17 20:58:18.235 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=P1; work-dir=gs://nf-tower-public/scratch/75/15b877bc26e72b9f97b182a2243050
error [nextflow.exception.ProcessFailedException]: Process `P1` terminated with an error exit status (125)
Mar-17 20:58:18.299 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'null' -- Cause: java.nio.file.NoSuchFileException: gs://nf-tower-public/scratch/75/15b877bc26e72b9f97b182a2243050/.command.out
Mar-17 20:58:18.302 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'P1'
Caused by:
Process `P1` terminated with an error exit status (125)
Command executed:
echo finished > log1.out
Command exit status:
125
Command output:
(empty)
Command error:
Unable to find image 'wave.seqera.io/wt/52d8b11047ee/rome-pipeline-engine/nxf-container-repo/l1em:master_fix-se' locally
docker: Error response from daemon: received unexpected HTTP status: 500 Internal Server Error.
See 'docker run --help'.
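For readers interpreting the log above: exit status 125 is reserved by `docker run` for failures of Docker itself, which matches the image pull error shown here, where the task command never started. The snippet below is a minimal sketch of that convention; the helper name `classify_docker_run_exit` is hypothetical and not part of Nextflow or Docker.

```python
def classify_docker_run_exit(code: int) -> str:
    """Map a `docker run` exit status to its documented meaning.

    Hypothetical helper for illustration; the 125/126/127 convention
    itself comes from the Docker CLI documentation.
    """
    if code == 125:
        # Docker could not run the container at all, e.g. an image pull failure
        return "docker run itself failed"
    if code == 126:
        return "the container command could not be invoked"
    if code == 127:
        return "the container command was not found"
    # Any other value is the exit status of the command inside the container
    return "exit status of the container command"


# The task in the log above finished with exit status 125
print(classify_docker_run_exit(125))
```

Under this reading, the failure is in pulling the Wave image rather than in the `echo finished > log1.out` command.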
I think we made some progress. You may want to give it another try.
Awesome, it works now!
Excellent! It was a problem with a URI redirect using a relative path, issued by Google Artifact Registry.
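To illustrate the class of problem described above: a redirect's `Location` header may hold a relative path, which a client has to resolve against the URL it originally requested before following it. The snippet below is only a sketch of that resolution step using Python's standard library. The URLs and the helper name `resolve_redirect` are hypothetical, and this is not the actual Wave fix.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url: str, location: str) -> str:
    """Resolve a redirect Location value against the URL that was requested."""
    return urljoin(request_url, location)


# Hypothetical URLs, for illustration only
request_url = "https://registry.example.com/v2/my-repo/blobs/abc123"

# An absolute Location is used as-is
absolute = resolve_redirect(request_url, "https://storage.example.com/blobs/abc123")

# A relative Location keeps the scheme and host of the original request
relative = resolve_redirect(request_url, "/downloads/abc123?token=t")

print(absolute)
print(relative)
```

A client that treats every `Location` value as an absolute URL would fail on the second case, which is one way a proxy could end up returning a 500 error.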
Bug report
I've encountered issues using the fusion filesystem on google-batch (and google-lifesciences) for private GCP artifact repositories. I understand the support is relatively recent (23.02.1-edge), but I'm still putting it out there in case it can help prevent bugs later. The fusion filesystem (along with wave containers) seems to work fine when using public container images, but fails on private ones. With the same config, switching off fusion (`fusion.enabled = false`) makes the run successful. In case it matters, setting `scratch=true` or `false` (as recommended in the docs) didn't affect the issue.
After seeing error code 14 for GLS, I tried enabling the Service Control API without success.
Expected behavior and actual behavior
Expected: No error
Steps to reproduce the problem
Program output
google-batch:
google-lifesciences:
Also, if it helps, here's the google batch log:
Environment
Shell (`$SHELL --version`): zsh 5.8 (x86_64-ubuntu-linux-gnu)