wosmanitx closed 6 months ago
Okay, the awkward truth is this "always" works because Quilt automatically downloads the entire package ahead of time.
Let me rename this to: "Download only path specified in input URI", as the goal is to NOT download more than necessary. I believe the desired result is "only download files which match this path prefix"
Compare:
Interesting. Folder URIs always have a trailing slash, but nothing has a leading slash.
We could be smart: match exactly if the path has no trailing slash, but match by prefix if it does.
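A minimal sketch of that matching rule, assuming package entries are plain relative keys with no leading slash. The helper name `path_matches` is hypothetical and not part of nf-quilt:

```python
def path_matches(entry: str, selector: str) -> bool:
    """Decide whether a package entry should be downloaded for a #path selector.

    Hypothetical helper for illustration only.
    """
    if selector == "":
        # No path given: fall back to the whole package.
        return True
    if selector.endswith("/"):
        # Folder selector: keep everything under that prefix.
        return entry.startswith(selector)
    # File selector: only the exact entry.
    return entry == selector


entries = ["raw/a.gmt", "raw/b.gmt", "raw2/c.gmt", "README.md"]
print([e for e in entries if path_matches(e, "raw/")])
print([e for e in entries if path_matches(e, "README.md")])
```

Because the trailing slash is part of the prefix, `raw/` does not accidentally pull in `raw2/c.gmt`.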
I could pair this with #72 to create a package with only a single file, as a test.
Similarly, #path in an output package makes it easy to store the results of a run in its own subdirectory.
I wonder if I was wrong, and this actually never works correctly. Need to test...
May-22 17:05:38.562 [main] DEBUG nextflow.quilt.nio.QuiltPath - Creating QuiltPath: interline-proteomics-analysis?Application=Enceladus&Author=Bianca&Comments=This+a+Package+imported+via+nextflow+quilt+plugin&Date=2023-03-16&Group=Bioinformatics&Program=SLC15A4#package=EDL%2fMSigDB_v7-5@cb598c1eb34c51559050a145efbedec696910caf074bd35cee180f33663d6946&path=raw%2fc2.cp.reactome.v7.5.symbols.gmt
May-22 17:05:38.564 [main] DEBUG nextflow.quilt.nio.QuiltFileSystem - QuiltFileSystem.getPath`[./]: []
May-22 17:05:38.564 [main] DEBUG nextflow.quilt.jep.QuiltParser - forURI[quilt+s3] for quilt+s3://./
May-22 17:05:38.565 [main] DEBUG nextflow.quilt.nio.QuiltPath - Creating QuiltPath: .
May-22 17:05:38.575 [main] DEBUG n.quilt.nio.QuiltFileSystemProvider - <A>BasicFileAttributes QuiltFileSystemProvider.readAttributes()
May-22 17:05:38.576 [main] DEBUG nextflow.quilt.nio.QuiltFileSystem - QuiltFileAttributes QuiltFileSystem.readAttributes(.)
May-22 17:05:38.576 [main] DEBUG nextflow.quilt.nio.QuiltPath - isAbsolute[null]
May-22 17:05:38.578 [main] DEBUG nextflow.Session - Session aborted -- Cause: Cannot invoke "nextflow.quilt.jep.QuiltPackage.packageDest()" because the return value of "nextflow.quilt.nio.QuiltPath.pkg()" is null
May-22 17:05:38.606 [main] ERROR nextflow.cli.Launcher - @unknown
java.lang.NullPointerException: Cannot invoke "nextflow.quilt.jep.QuiltPackage.packageDest()" because the return value of "nextflow.quilt.nio.QuiltPath.pkg()" is null
at nextflow.quilt.nio.QuiltPath.localPath(QuiltPath.groovy:75)
@wosmanitx Is this actually (still) a problem, or is it currently working for you?
resolved
Re-opening. Have a reproducible failure.
Sigh. Unit test succeeds. Integration test fails. Is CHECK_INPUT doing something new? Tried running an older version, but that failed:
N E X T F L O W ~ version 23.04.3
ERROR ~ Unable to parse config file: '/Users/ernest/GitHub/nf-quilt/nextflow.config'
Compile failed for sources FixedSetSources[name='/groovy/script/Script775389F485D1318E2BBF21EE907E77EB/_nf_config_30789506']. Cause: BUG! exception in phase 'semantic analysis' in source unit '/groovy/script/Script775389F485D1318E2BBF21EE907E77EB/_nf_config_30789506' Unsupported class file major version 65
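Side note on that last message: class file major version 65 corresponds to Java 21, so this looks like the older Nextflow being launched on a newer JDK than its Groovy runtime can read (an assumption, not confirmed here). A quick sketch of the mapping, with a hypothetical helper name:

```python
def java_release(major: int) -> str:
    """Map a class-file major version to a Java release label.

    Hypothetical helper. Uses the rule release = major - 44,
    which holds from Java 5 (major version 49) onward.
    """
    if major < 49:
        raise ValueError(f"major version {major} predates Java 5")
    return f"Java {major - 44}"


print(java_release(65))  # Java 21
```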
Jan-24 14:10:30.916 [Actor Thread 7] DEBUG nextflow.quilt.nio.QuiltFileSystem - No attributes yet for: /var/folders/tz/8q322ht10qzf9pswh01zv6880000gp/T/QuiltPackage11603948986612361183/QuiltPackage.quilt_example_examples_smart_report/README.md
Jan-24 14:10:30.918 [Actor Thread 7] DEBUG nextflow.util.CacheHelper - Unable to get file attributes file: quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md -- Cause: java.nio.file.NoSuchFileException: quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md
Jan-24 14:10:30.922 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file s3://quilt-example/examples/smart-report/README.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-a752a1d5-2cf0-4fe4-8e45-537b2649578b/ba/4ea9cf52fa961a34bf0f9a2941ec06/README.md
Jan-24 14:10:30.922 [FileTransfer-2] DEBUG nextflow.file.FilePorter - Copying foreign file quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-a752a1d5-2cf0-4fe4-8e45-537b2649578b/75/c2862e9e5eafee01370edef3769628/quilt-example#package=examples%2fsmart-report&path=README.md
Jan-24 14:10:30.924 [FileTransfer-2] DEBUG nextflow.quilt.nio.QuiltFileSystem - No attributes yet for: /var/folders/tz/8q322ht10qzf9pswh01zv6880000gp/T/QuiltPackage11603948986612361183/QuiltPackage.quilt_example_examples_smart_report/README.md
Jan-24 14:10:30.929 [Actor Thread 7] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=CHECK_INPUT (3); work-dir=null
error [nextflow.exception.ProcessStageException]: Can't stage file quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md -- file does not exist
Ah! Maybe this is because I am not always auto-loading the package. I do that explicitly in the unit test, after all.
UPDATE: yes, that file is now downloaded before the "cp" -- but the "cp" still fails.
Okay, this is weird. Is the filename just escaped wrongly?
work_dir % ls -a
...
.command.sh
.exitcode
quilt-example#package=examples%2fhurdat2&path=README.md
work_dir % cat .command.sh
#!/bin/bash -ue
cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/
work_dir % sh .command.sh
cp: quilt-example#package=examples%2fhurdat2&path=README.md: No such file or directory
work_dir %
Or is something more subtle happening?
Okay, the structural issue is that Nextflow implicitly (and understandably) assumes that the part after the "/" is the filename. But we have a complex URI at the end, which is NOT the simple 'README.md' we expect.
So we need to supplement (since we can't replace):
cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/
with:
cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/README.md
Will this work in general? Heck if I know, but it is worth a shot...
Nope. The problem is that the filename assumption is deeply hardcoded in Nextflow, and it copies those files all over the place. :-(
That implies we can try running the code and ask for "quilt-example#package=examples%2fhurdat2\&path=README.md" but boy is that ugly. Still should check if it works, though...
Doh. So, the real issue is simply that path 'quilt+s3://quilt-example#package=examples/hurdat2&path=README.md' sets $input to quilt-example#package=examples/hurdat2&path=README.md, which you can fix via:
if [ "$input" != "README.md" ]; then
cp -f $input README.md
fi
Of course it would be nice to avoid that, but I'm not sure how easy it is to munge path. Will look...
Ah. This must be a "filename" method on QuiltPath that is doing something naive (and different than we did in Python). Let me see if I can isolate that...
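To make the difference concrete, here is a rough sketch in Python (not the actual QuiltPath code, which is Groovy). Both function names are hypothetical. The "naive" version just takes everything after the last "/", which reproduces the staged name seen in the work dir for the percent-encoded URI. The second version pulls the base name out of the `path=` parameter in the URI fragment:

```python
from posixpath import basename
from urllib.parse import parse_qs, urlsplit


def naive_filename(uri: str) -> str:
    """Everything after the last '/' (hypothetical stand-in for the naive behavior)."""
    return uri.rsplit("/", 1)[-1]


def quilt_filename(uri: str) -> str:
    """Base name of the fragment's path= parameter, falling back to the naive name."""
    fragment = urlsplit(uri).fragment
    # parse_qs also percent-decodes values, so %2F becomes '/'.
    paths = parse_qs(fragment).get("path")
    if paths:
        return basename(paths[0].rstrip("/"))
    return naive_filename(uri)


uri = "quilt+s3://quilt-example#package=examples%2fhurdat2&path=README.md"
print(naive_filename(uri))
print(quilt_filename(uri))
```

For the URI above, the first call gives the long `quilt-example#package=...` name and the second gives `README.md`. A URI with no `path=` parameter falls back to the naive name.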
Released 0.7.7 -- so make path-input passes. At least for me:
Jan-29 15:05:36.133 [FileTransfer-2] DEBUG nextflow.file.FilePorter - Copying foreign file quilt+s3://quilt-example#package=examples%2fhurdat2&path=README.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-f7373f44-164a-4c11-aaea-a6ac94dbdd44/0d/34908bc4a4b5ad963327d73c8f3625/README.md
Jan-29 15:05:36.133 [FileTransfer-2] INFO nextflow.quilt.jep.QuiltPackage - installing examples/hurdat2 from quilt-example...
But not for the customer. Odd.
Weird. It looks like it is installing, but it is not completing and/or returning an error. And anyway, the customer does not even start installing, as far as I can tell, so this may be a totally different issue...
Jan-29 15:45:50.736 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file quilt+s3://nf-core-gallery#package=core%2fhic&path=README_NF_QUILT.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-f8e57909-5165-465d-a4ce-94253b04243d/a1/d272420394c9647df596cb984fcc3a/README_NF_QUILT.md
Jan-29 15:45:50.736 [FileTransfer-1] INFO nextflow.quilt.jep.QuiltPackage - installing core/hic from nf-core-gallery...
Jan-29 15:45:50.824 [FileTransfer-2] DEBUG n.cloud.aws.nio.S3FileSystemProvider - S3 download file from=s3://nf-core-gallery/nf-core/hic/README_NF_QUILT.md to=/Users/ernest/GitHub/nf-quilt/work/stage-f8e57909-5165-465d-a4ce-94253b04243d/79/4b7b3d2efb6e7985ac38d01c1014d6/README_NF_QUILT.md
Jan-29 15:45:50.824 [FileTransfer-2] DEBUG nextflow.cloud.aws.nio.S3Client - Creating S3 transfer manager pool - chunk-size=104857600; max-treads=10;
Jan-29 15:45:51.173 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-29 15:45:51.174 [Task submitter] INFO nextflow.Session - [7e/3ddda6] Submitted process > CHECK_INPUT (1)
Jan-29 15:45:51.212 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: CHECK_INPUT (1); status: COMPLETED; exit: 0; error: -; workDir: /Users/ernest/GitHub/nf-quilt/work/7e/3ddda636fab9cf700d189a697d4fc3]
Jan-29 15:45:51.610 [FileTransfer-1] ERROR nextflow.quilt.jep.QuiltPackage - failed to install core/hic
Jan-29 15:45:51.616 [Actor Thread 5] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=CHECK_INPUT (2); work-dir=null
error [nextflow.exception.ProcessStageException]: Can't stage file quilt+s3://nf-core-gallery#package=core%2fhic&path=README_NF_QUILT.md -- file does not exist
Jan-29 15:45:51.626 [Actor Thread 5] ERROR nextflow.processor.TaskProcessor - Error executing process > 'CHECK_INPUT (2)'
Current Hypothesis: TransferAware is a new feature, not supported in 23.10, so nf-quilt does not auto-install the package. Will force install in 0.7.9.
NOTE: seems to install in Tower, but errors out with (hopefully irrelevant):
Jan-30 04:24:05.336 [Actor Thread 3] DEBUG i.s.wave.plugin.config.WaveConfig - Wave strategy not specified - using default: [container, dockerfile, conda, spack]
Works!
such as:
quilt+s3://interline-quiltdemo#package=WDR5/EXP22000894@ed6ebf851478cf665ed435e0d718e78c9d519fd461717f0669c9538527f095f7&path=cf_out%2FO43353-432-523__O43353-432-523_relaxed_rank_1_model_2.pdb