quiltdata / nf-quilt

GNU General Public License v3.0

overlay plugin #223

Closed drernie closed 1 week ago

drernie commented 3 weeks ago

Instead of intercepting events, can we package after the fact the way nf-prov does? Will this allow us to work with any datastore, by leveraging the built-in plugins?

drernie commented 3 weeks ago

overlay ticket

drernie commented 3 weeks ago

Hmm. Quilt-rs local packages ignore versionId (obviously). For this version, we may need to just push to S3 like we normally do. I suspect nf-* filesystems don't support versionId.

Yeah, we could create unversioned packages as a starting hack, but it is pretty lame...

drernie commented 3 weeks ago

Actually the brute force approach is to RE-push the files using QuiltCore-Java. Inefficient but safe. Let's start there, if we can.

drernie commented 3 weeks ago

The only tricky part should be inferring the bucket and package name. Do we need to do the same trick of parsing the Params?
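A minimal sketch of what that inference could look like, assuming (and this is only an assumption) that the first path segment is the bucket and the next two segments form the package name, as in the `s3://quilt-example/test/overlay` example used in this thread. `infer_package` is a hypothetical helper, not part of nf-quilt or QuiltCore-Java:

```python
from typing import Tuple


def infer_package(published: str) -> Tuple[str, str, str]:
    """Split an S3-style location into (bucket, package_name, logical_path).

    Assumed layout (hypothetical convention): bucket/prefix/suffix/rest...
    Accepts either 's3://bucket/...' or a bare '/bucket/...' path.
    """
    # Drop the scheme if present, then any leading or trailing slashes.
    if published.startswith("s3://"):
        published = published[len("s3://"):]
    parts = [p for p in published.strip("/").split("/") if p]
    if len(parts) < 3:
        raise ValueError(f"cannot infer bucket and package from {published!r}")
    bucket = parts[0]
    package_name = f"{parts[1]}/{parts[2]}"
    logical_path = "/".join(parts[3:])  # empty when the path ends at the package
    return bucket, package_name, logical_path
```

With this convention, `infer_package("s3://quilt-example/test/overlay/output/hurdat")` would yield the bucket `quilt-example`, the package `test/overlay`, and the logical path `output/hurdat`, without needing to parse the Params.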

drernie commented 3 weeks ago

Ah. First failure mode: it "cleverly" defaults to the input URI and pushes the output package there:

https://nightly.quilttest.com/b/quilt-example/packages/examples/hurdat/tree/aa8ffd82eb1edabc92c4786b1a6c23a0520b33e81e4af7d28ba338efc3458083/

More worryingly, it does NOT seem to be writing to the S3 URI: s3://quilt-example/test/overlay

[Screenshot 2024-08-21 at 13 54 21]

drernie commented 3 weeks ago

Okay, S3-only works (without any nf-input):

./launch.sh run ./main.nf -profile standard --input "s3://quilt-example/test/hurdat" --outdir "s3://quilt-example/test/overlay"

Oddly, it still shows a warning despite no explicit plugin: WARN: onFilePublish.not.QuiltPath: /quilt-example/test/overlay/output/hurdat

drernie commented 3 weeks ago

Can we get it to publish by synthesizing fake Quilt paths from published files?

Aug-21 14:04:44.920 [PublishDir-1] DEBUG nextflow.processor.PublishDir - Failed to publish file: /Users/ernest/GitHub/nf-quilt/work/87/69ae8c338f953a50914e4bc3b0105f/output/hurdat; to: s3://quilt-example/test/overlay/output/hurdat [copy] -- attempt: 1; reason: the path: s3://quilt-example/test/overlay/output/hurdat/nf-quilt does not exist
Aug-21 14:04:45.308 [main] INFO  nextflow.util.ThreadPoolHelper - Waiting for file transfers to complete (1 files)
Aug-21 14:04:50.356 [PublishDir-1] DEBUG nextflow.processor.PublishDir - Failed to publish file: /Users/ernest/GitHub/nf-quilt/work/87/69ae8c338f953a50914e4bc3b0105f/output/hurdat; to: s3://quilt-example/test/overlay/output/hurdat [copy] -- attempt: 2; reason: the path: s3://quilt-example/test/overlay/output/hurdat/output/quilt-example#package=examples%2fhurdat/nf-quilt does not exist
Aug-21 14:04:50.490 [PublishDir-1] DEBUG n.cloud.aws.nio.S3FileSystemProvider - S3 upload directory from=/Users/ernest/GitHub/nf-quilt/work/87/69ae8c338f953a50914e4bc3b0105f/output/hurdat to=s3://quilt-example/test/overlay/output/hurdat
Aug-21 14:04:51.187 [PublishDir-1] DEBUG nextflow.quilt.QuiltObserver - onFilePublish.Path[/quilt-example/test/overlay/output/hurdat]
Aug-21 14:04:51.187 [PublishDir-1] WARN  nextflow.quilt.QuiltObserver - onFilePublish.not.QuiltPath: /quilt-example/test/overlay/output/hurdat
Aug-21 14:04:51.188 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'PublishDir' shutdown completed (hard=false)
Aug-21 14:04:51.193 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=1ms; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
Aug-21 14:04:51.302 [main] DEBUG nextflow.quilt.QuiltObserver - onFlowComplete.workflowOutputs[1]: [/Users/ernest/GitHub/nf-quilt/work/87/69ae8c338f953a50914e4bc3b0105f/output/hurdat:/quilt-example/test/overlay/output/hurdat]
Aug-21 14:04:51.302 [main] DEBUG nextflow.quilt.QuiltObserver - onFlowComplete.publishedURIs[0]: [:]

YES! Well, it publishes a package, just not the actual contents: https://nightly.quilttest.com/b/quilt-example/packages/test/overlay/tree/6b63263a0f3c723b3d470e4c45677aa00b7c1433aba19e037f9b31392e7c8f98/

drernie commented 3 weeks ago

Actually, the original plugin seems to do the same thing 🤦 #227 https://nightly.quilttest.com/b/quilt-example/packages/test/hurdat/tree/da52035abd458c99d553de167c79735b580d9038737a066d4f4358796407eadc/

drernie commented 3 weeks ago

Huh, conditional writes may mitigate the 'brute force' solution.

drernie commented 2 weeks ago

Might be due to #231. Nope, pure S3 inputs give the same error:

Aug-28 11:02:01.834 [main] DEBUG nextflow.quilt.QuiltObserver - onFlowComplete.workflowOutputs[3]: [
/Users/ernest/GitHub/nf-quilt/work/1d/1247e5757ba50f1e4071f4667a5131/inputs/COPY_THIS.md:/udp-spec/nf-quilt/s3-overlay/inputs/COPY_THIS.md,
/Users/ernest/GitHub/nf-quilt/work/1d/1247e5757ba50f1e4071f4667a5131/inputs/a_folder/THING_TWO.md:/udp-spec/nf-quilt/s3-overlay/inputs/a_folder/THING_TWO.md, 
/Users/ernest/GitHub/nf-quilt/work/1d/1247e5757ba50f1e4071f4667a5131/inputs/a_folder/THING_ONE.md:/udp-spec/nf-quilt/s3-overlay/inputs/a_folder/THING_ONE.md
]
Aug-28 11:02:01.834 [main] DEBUG nextflow.quilt.QuiltObserver - onFlowComplete.publishedURIs[1]: [udp-spec/nf-quilt/s3-overlay:quilt+s3://udp-spec#package=nf-quilt%2fs3-overlay&path=inputs/a_folder/THING_ONE.md]

drernie commented 2 weeks ago

The Quilt Package is properly inferred from the path. And we have the list of files. Can I just copy those into the working directory, alongside the auto-generated files?
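For reference, here is a rough sketch of how a `quilt+s3://` URI like the one in the `publishedURIs` log line breaks down into bucket, package, and path. It uses only the Python standard library; `parse_quilt_uri` is a hypothetical name for illustration and is not the plugin's actual parser:

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit


def parse_quilt_uri(uri: str) -> Dict[str, str]:
    """Break a quilt+s3 URI into its bucket, package, and (optional) path."""
    split = urlsplit(uri)
    if split.scheme != "quilt+s3":
        raise ValueError(f"not a quilt+s3 URI: {uri!r}")
    # The package and path live in the fragment, encoded like a query string.
    # parse_qs also percent-decodes values, so '%2f' becomes '/'.
    fragment = parse_qs(split.fragment)
    return {
        "bucket": split.netloc,
        "package": fragment.get("package", [""])[0],
        "path": fragment.get("path", [""])[0],
    }
```

Applied to the URI logged above, this would return the bucket `udp-spec`, the package `nf-quilt/s3-overlay`, and the path `inputs/a_folder/THING_ONE.md`.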

drernie commented 2 weeks ago

Almost! It copies the individual files, but not to the correct paths.

drernie commented 2 weeks ago

Solution: pass the '&path=' value as the key in a dictionary of "overlay" files.
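A minimal sketch of that idea, assuming the overlay dictionary maps each logical path (the `&path=` value) to the local work-directory file that should supply it. `build_overlay` and `stage_overlay` are hypothetical helpers written for illustration; they are not the plugin's actual code:

```python
import shutil
from pathlib import Path
from typing import Dict


def build_overlay(outputs: Dict[str, str], package_root: str) -> Dict[str, str]:
    """Key each local file by its logical path inside the package.

    outputs: local file -> published path, as in the workflowOutputs log.
    package_root: 'bucket/prefix/suffix', as in the publishedURIs key.
    """
    root = package_root.strip("/") + "/"
    overlay = {}
    for local, published in outputs.items():
        relative = published.lstrip("/")
        if relative.startswith(root):  # skip files outside this package
            overlay[relative[len(root):]] = local
    return overlay


def stage_overlay(overlay: Dict[str, str], dest: str) -> None:
    """Copy each overlay file into dest, preserving its logical path."""
    for logical_path, local in overlay.items():
        target = Path(dest) / logical_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local, target)
```

Keying by the logical path is what keeps `inputs/a_folder/THING_ONE.md` under `inputs/a_folder/` instead of flattening it into the top of the working directory.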