quiltdata / nf-quilt

Support path in input URI #71

Closed wosmanitx closed 6 months ago

wosmanitx commented 1 year ago

such as:

quilt+s3://interline-quiltdemo#package=WDR5/EXP22000894@ed6ebf851478cf665ed435e0d718e78c9d519fd461717f0669c9538527f095f7&path=cf_out%2FO43353-432-523__O43353-432-523_relaxed_rank_1_model_2.pdb

drernie commented 1 year ago

Okay, the awkward truth is this "always" works because Quilt automatically downloads the entire package ahead of time.

Let me rename this to "Download only path specified in input URI", as the goal is to NOT download more than necessary. I believe the desired result is "only download files which match this path prefix".
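
To make the pieces of such a URI concrete, here is a minimal Python sketch that splits one into bucket, package name, optional hash, and path. The function name and the returned keys are hypothetical, chosen for illustration; this is not nf-quilt's actual parser, and it assumes the fragment is a plain key=value list with percent-encoded values.

```python
from urllib.parse import urlsplit, unquote


def parse_quilt_uri(uri):
    """Split a quilt+s3 URI into its parts (illustrative sketch only)."""
    parts = urlsplit(uri)
    if parts.scheme != "quilt+s3":
        raise ValueError(f"not a quilt+s3 URI: {uri}")

    # The fragment looks like: package=<name>[@<hash>]&path=<path>
    fields = {}
    for pair in parts.fragment.split("&"):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = unquote(value)  # %2F / %2f -> "/"

    name, _, top_hash = fields.get("package", "").partition("@")
    return {
        "bucket": parts.netloc,
        "package": name,
        "hash": top_hash or None,
        "path": fields.get("path"),
    }


info = parse_quilt_uri(
    "quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md"
)
print(info["bucket"], info["package"], info["path"])
```

The "path" value is the only part a narrower download would need to look at; the bucket and package identify what to fetch.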

drernie commented 1 year ago

Compare:

Interesting. Folder URIs always have a trailing slash, but nothing has a leading slash.

We could be smart, and match exactly if there is no trailing slash, but match by prefix if there is one.
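
A minimal Python sketch of that rule, assuming the package contents are available as a flat list of logical keys (the function name and the sample keys are made up for illustration):

```python
def select_keys(logical_keys, path):
    """Pick the package entries selected by a path= value.

    Assumed rule: a trailing slash means "folder", so match by prefix;
    anything else must match one entry exactly.
    """
    if not path:
        return list(logical_keys)  # no path given: keep the whole package
    if path.endswith("/"):
        return [key for key in logical_keys if key.startswith(path)]
    return [key for key in logical_keys if key == path]


keys = ["README.md", "raw/a.gmt", "raw/b.gmt", "raw_notes.txt"]
print(select_keys(keys, "raw/"))       # folder: prefix match
print(select_keys(keys, "README.md"))  # file: exact match
```

Because nothing has a leading slash, a plain string prefix test is enough here, and the trailing slash keeps "raw/" from also matching "raw_notes.txt".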

drernie commented 1 year ago

I could pair this with #72 to create a package with only a single file, as a test.

drernie commented 1 year ago

Similarly, #path in an output package makes it easy to store the results of a run in its own subdirectory.

drernie commented 1 year ago

I wonder if I was wrong, and this actually never works correctly. Need to test...

drernie commented 1 year ago

May-22 17:05:38.562 [main] DEBUG nextflow.quilt.nio.QuiltPath - Creating QuiltPath: interline-proteomics-analysis?Application=Enceladus&Author=Bianca&Comments=This+a+Package+imported+via+nextflow+quilt+plugin&Date=2023-03-16&Group=Bioinformatics&Program=SLC15A4#package=EDL%2fMSigDB_v7-5@cb598c1eb34c51559050a145efbedec696910caf074bd35cee180f33663d6946&path=raw%2fc2.cp.reactome.v7.5.symbols.gmt
May-22 17:05:38.564 [main] DEBUG nextflow.quilt.nio.QuiltFileSystem - QuiltFileSystem.getPath`[./]: []
May-22 17:05:38.564 [main] DEBUG nextflow.quilt.jep.QuiltParser - forURI[quilt+s3] for quilt+s3://./
May-22 17:05:38.565 [main] DEBUG nextflow.quilt.nio.QuiltPath - Creating QuiltPath: .
May-22 17:05:38.575 [main] DEBUG n.quilt.nio.QuiltFileSystemProvider - <A>BasicFileAttributes QuiltFileSystemProvider.readAttributes()
May-22 17:05:38.576 [main] DEBUG nextflow.quilt.nio.QuiltFileSystem - QuiltFileAttributes QuiltFileSystem.readAttributes(.)
May-22 17:05:38.576 [main] DEBUG nextflow.quilt.nio.QuiltPath - isAbsolute[null]
May-22 17:05:38.578 [main] DEBUG nextflow.Session - Session aborted -- Cause: Cannot invoke "nextflow.quilt.jep.QuiltPackage.packageDest()" because the return value of "nextflow.quilt.nio.QuiltPath.pkg()" is null
May-22 17:05:38.606 [main] ERROR nextflow.cli.Launcher - @unknown
java.lang.NullPointerException: Cannot invoke "nextflow.quilt.jep.QuiltPackage.packageDest()" because the return value of "nextflow.quilt.nio.QuiltPath.pkg()" is null
    at nextflow.quilt.nio.QuiltPath.localPath(QuiltPath.groovy:75)

drernie commented 1 year ago

@wosmanitx Is this actually (still) a problem, or is it currently working for you?

wosmanitx commented 1 year ago

resolved

drernie commented 6 months ago

Re-opening. Have a reproducible failure.

drernie commented 6 months ago

Sigh. The unit test succeeds, but the integration test fails. Is CHECK_INPUT doing something new? I tried running an older version, but that failed:

N E X T F L O W  ~  version 23.04.3
ERROR ~ Unable to parse config file: '/Users/ernest/GitHub/nf-quilt/nextflow.config'

  Compile failed for sources FixedSetSources[name='/groovy/script/Script775389F485D1318E2BBF21EE907E77EB/_nf_config_30789506']. Cause: BUG! exception in phase 'semantic analysis' in source unit '/groovy/script/Script775389F485D1318E2BBF21EE907E77EB/_nf_config_30789506' Unsupported class file major version 65
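
For context, a class-file major version N corresponds to Java release N - 44, so version 65 means class files built for Java 21. That suggests (the log itself does not confirm it) that the local JVM was newer than this older Nextflow release could handle. A small Python helper, with a hypothetical name, makes the mapping explicit:

```python
def java_release_for_major(major):
    """Map a class-file major version to the Java release that emits it."""
    if major < 45:
        raise ValueError(f"not a valid class-file major version: {major}")
    # 45 -> Java 1.1, 52 -> Java 8, 61 -> Java 17, 65 -> Java 21
    return major - 44


print(java_release_for_major(65))
```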

drernie commented 6 months ago

Jan-24 14:10:30.916 [Actor Thread 7] DEBUG nextflow.quilt.nio.QuiltFileSystem - No attributes yet for: /var/folders/tz/8q322ht10qzf9pswh01zv6880000gp/T/QuiltPackage11603948986612361183/QuiltPackage.quilt_example_examples_smart_report/README.md
Jan-24 14:10:30.918 [Actor Thread 7] DEBUG nextflow.util.CacheHelper - Unable to get file attributes file: quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md -- Cause: java.nio.file.NoSuchFileException: quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md
Jan-24 14:10:30.922 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file s3://quilt-example/examples/smart-report/README.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-a752a1d5-2cf0-4fe4-8e45-537b2649578b/ba/4ea9cf52fa961a34bf0f9a2941ec06/README.md
Jan-24 14:10:30.922 [FileTransfer-2] DEBUG nextflow.file.FilePorter - Copying foreign file quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-a752a1d5-2cf0-4fe4-8e45-537b2649578b/75/c2862e9e5eafee01370edef3769628/quilt-example#package=examples%2fsmart-report&path=README.md
Jan-24 14:10:30.924 [FileTransfer-2] DEBUG nextflow.quilt.nio.QuiltFileSystem - No attributes yet for: /var/folders/tz/8q322ht10qzf9pswh01zv6880000gp/T/QuiltPackage11603948986612361183/QuiltPackage.quilt_example_examples_smart_report/README.md
Jan-24 14:10:30.929 [Actor Thread 7] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=CHECK_INPUT (3); work-dir=null
  error [nextflow.exception.ProcessStageException]: Can't stage file quilt+s3://quilt-example#package=examples%2fsmart-report&path=README.md -- file does not exist

drernie commented 6 months ago

Ah! Maybe this is because I am not always auto-loading the package. I do that explicitly in the unit test, after all.

UPDATE: yes, that file is now downloaded before the "cp" -- but the "cp" still fails.

drernie commented 6 months ago

Okay, this is weird. Is the filename just escaped wrongly?

work_dir % ls -a
...
.command.sh
.exitcode
quilt-example#package=examples%2fhurdat2&path=README.md
work_dir % cat .command.sh
#!/bin/bash -ue
cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/
work_dir % sh .command.sh
cp: quilt-example#package=examples%2fhurdat2&path=README.md: No such file or directory
work_dir % 

Or is something more subtle happening?

drernie commented 6 months ago

Okay, the structural issue is that Nextflow implicitly (and understandably) assumes that the part after the last "/" is the filename. But what we have after it is a complex URI fragment, NOT the simple 'README.md' we expect.

So we need to supplement (since we can't replace):

cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/

with:

cp quilt-example#package=examples%2fhurdat2\&path=README.md ../../tmp/README.md

Will this work in general? Heck if I know, but it is worth a shot...

drernie commented 6 months ago

Nope. The problem is that the filename assumption is deeply hardcoded in Nextflow, and it copies those files all over the place. :-(

That implies we can try running the code and ask for "quilt-example#package=examples%2fhurdat2\&path=README.md" but boy is that ugly. Still should check if it works, though...

drernie commented 6 months ago

Doh. So, the real issue is simply that the input declaration path 'quilt+s3://quilt-example#package=examples/hurdat2&path=README.md' sets $input to quilt-example#package=examples/hurdat2&path=README.md rather than README.md. You can work around that in the process script via:

    if [ "$input" != "README.md" ]; then
        cp -f $input README.md
    fi

Of course it would be nice to avoid that, but I'm not sure how easy it is to munge the path. Will look...

drernie commented 6 months ago

Ah. This must be a "filename" method on QuiltPath that is doing something naive (and different from what we did in Python). Let me see if I can isolate that...
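
To illustrate the difference, here is a Python sketch contrasting a naive "everything after the last slash" rule with one that prefers the last segment of the path= value. Both function names are hypothetical, and this is only an assumption about the kind of logic involved, not the actual QuiltPath code or the eventual fix.

```python
from urllib.parse import unquote


def naive_file_name(uri):
    """Everything after the last "/" (the generic assumption)."""
    return uri.rsplit("/", 1)[-1]


def path_aware_file_name(uri):
    """Use the last segment of the fragment's path= value when present."""
    _, _, fragment = uri.partition("#")
    for pair in fragment.split("&"):
        key, _, value = pair.partition("=")
        if key == "path" and value:
            # Decode %2F, drop a folder's trailing slash, keep the last segment.
            return unquote(value).rstrip("/").rsplit("/", 1)[-1]
    return naive_file_name(uri)  # no path= given: fall back to the naive rule


uri = "quilt+s3://quilt-example#package=examples%2fhurdat2&path=README.md"
print(naive_file_name(uri))
print(path_aware_file_name(uri))
```

For the escaped URI above, the naive rule yields the whole "quilt-example#package=...&path=README.md" string, which is the odd filename seen in the work directory, while the path-aware rule yields README.md.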

drernie commented 6 months ago

Released 0.7.7 -- so "make path-input" now passes. At least for me:

Jan-29 15:05:36.133 [FileTransfer-2] DEBUG nextflow.file.FilePorter - Copying foreign file quilt+s3://quilt-example#package=examples%2fhurdat2&path=README.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-f7373f44-164a-4c11-aaea-a6ac94dbdd44/0d/34908bc4a4b5ad963327d73c8f3625/README.md
Jan-29 15:05:36.133 [FileTransfer-2] INFO  nextflow.quilt.jep.QuiltPackage - installing examples/hurdat2 from quilt-example...

But not for the customer. Odd.

drernie commented 6 months ago

Weird. It looks like it is installing, but it is not completing and/or returning an error. And anyway, the customer does not even start installing, as far as I can tell, so this may be a totally different issue...

Jan-29 15:45:50.736 [FileTransfer-1] DEBUG nextflow.file.FilePorter - Copying foreign file quilt+s3://nf-core-gallery#package=core%2fhic&path=README_NF_QUILT.md to work dir: /Users/ernest/GitHub/nf-quilt/work/stage-f8e57909-5165-465d-a4ce-94253b04243d/a1/d272420394c9647df596cb984fcc3a/README_NF_QUILT.md
Jan-29 15:45:50.736 [FileTransfer-1] INFO  nextflow.quilt.jep.QuiltPackage - installing core/hic from nf-core-gallery...
Jan-29 15:45:50.824 [FileTransfer-2] DEBUG n.cloud.aws.nio.S3FileSystemProvider - S3 download file from=s3://nf-core-gallery/nf-core/hic/README_NF_QUILT.md to=/Users/ernest/GitHub/nf-quilt/work/stage-f8e57909-5165-465d-a4ce-94253b04243d/79/4b7b3d2efb6e7985ac38d01c1014d6/README_NF_QUILT.md
Jan-29 15:45:50.824 [FileTransfer-2] DEBUG nextflow.cloud.aws.nio.S3Client - Creating S3 transfer manager pool - chunk-size=104857600; max-treads=10;
Jan-29 15:45:51.173 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jan-29 15:45:51.174 [Task submitter] INFO  nextflow.Session - [7e/3ddda6] Submitted process > CHECK_INPUT (1)
Jan-29 15:45:51.212 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: CHECK_INPUT (1); status: COMPLETED; exit: 0; error: -; workDir: /Users/ernest/GitHub/nf-quilt/work/7e/3ddda636fab9cf700d189a697d4fc3]
Jan-29 15:45:51.610 [FileTransfer-1] ERROR nextflow.quilt.jep.QuiltPackage - failed to install core/hic
Jan-29 15:45:51.616 [Actor Thread 5] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=CHECK_INPUT (2); work-dir=null
  error [nextflow.exception.ProcessStageException]: Can't stage file quilt+s3://nf-core-gallery#package=core%2fhic&path=README_NF_QUILT.md -- file does not exist
Jan-29 15:45:51.626 [Actor Thread 5] ERROR nextflow.processor.TaskProcessor - Error executing process > 'CHECK_INPUT (2)'

drernie commented 6 months ago

Current hypothesis: TransferAware is a new feature, not supported in 23.10, so nf-quilt does not auto-install the package. Will force install in 0.7.9.

drernie commented 6 months ago

NOTE: seems to install in Tower, but errors out with (hopefully irrelevant):

Jan-30 04:24:05.336 [Actor Thread 3] DEBUG i.s.wave.plugin.config.WaveConfig - Wave strategy not specified - using default: [container, dockerfile, conda, spack]

drernie commented 6 months ago

Works!