rabix / bunny

[Legacy] Executor for CWL workflows. Executes sbg:draft-2 and CWL 1.0
http://rabix.io
Apache License 2.0

Linking input files #367

Open milos-ljubinkovic opened 7 years ago

milos-ljubinkovic commented 7 years ago

A single configuration value could be added that changes the default behaviour for staging input files and the initial working directory from copy to link. This default could still be overridden per item: with "sbg:stageInput": "copy" for input files, and with "writable": true for InitialWorkDirRequirement entries.

mayacoda commented 7 years ago

@milos-ljubinkovic, could this configuration be set at the execution level? It could even have three levels of copy vs. link, for example:

  1. default would be to copy everything
  2. --link-staged flag would be to link everything unless "sbg:stageInput": "copy" is present.
  3. --link-staged-force flag would link everything regardless of "sbg:stageInput"
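
As a rough illustration of the three levels above, here is a minimal Python sketch of the per-file decision. It is not Bunny's implementation: `StagingMode` and `resolve_action` are hypothetical names, and the sketch takes the list literally, so the default level copies every file. How the default should treat an input that explicitly asks for a link is not specified in this thread.

```python
from enum import Enum
from typing import Optional


class StagingMode(Enum):
    """Hypothetical names for the three proposed levels."""
    COPY = "copy"                      # level 1: default, copy everything
    LINK = "link-staged"               # level 2: link unless the input asks for a copy
    LINK_FORCE = "link-staged-force"   # level 3: link regardless of the hint


def resolve_action(mode: StagingMode, stage_input: Optional[str]) -> str:
    """Return "copy" or "link" for a single input file.

    stage_input is the value of the input's "sbg:stageInput" hint,
    or None when the hint is absent.
    """
    if mode is StagingMode.COPY:
        return "copy"
    if mode is StagingMode.LINK_FORCE:
        return "link"
    # StagingMode.LINK: honour an explicit per-input "copy" override.
    return "copy" if stage_input == "copy" else "link"
```

For example, `resolve_action(StagingMode.LINK, "copy")` gives `"copy"`, while the same hint under `StagingMode.LINK_FORCE` gives `"link"`.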

vladimirkovacevic commented 7 years ago

@milos-ljubinkovic, @mayacoda, in my experience copy is used very rarely, so I would rather make link the default.

buchanae commented 7 years ago

I wanted to note the edge cases we ran into while adding this to Funnel. Symlinks can sometimes result in errors like "too many levels of symlinks" and might not interact well with Docker mounts. We went with hard links, but the edge case there is that linking across filesystem boundaries (e.g. local to NFS) doesn't work, so in that case we fall back to copying.

So, linking is great, just wanted to share our experience, as a heads up.
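
To make the hard-link-with-fallback idea concrete, here is a minimal Python sketch. It is not Funnel's or Bunny's actual code: `stage_file` is a hypothetical helper, and it only treats the cross-filesystem error (`EXDEV`) as a reason to copy. Any other failure is re-raised.

```python
import errno
import os
import shutil


def stage_file(src: str, dst: str) -> str:
    """Stage src at dst via a hard link, copying if the link crosses filesystems.

    Returns "link" or "copy" so the caller can log what happened.
    """
    try:
        os.link(src, dst)  # hard link: no data is duplicated
        return "link"
    except OSError as exc:
        # EXDEV means src and dst are on different filesystems
        # (e.g. local disk to NFS), where hard links cannot work.
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)  # fall back to a real copy, keeping metadata
        return "copy"
```

Note that a hard-linked file shares its data with the original, so a tool that edits the staged file in place also edits the source input.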

milos-ljubinkovic commented 7 years ago

I'm going to add this to the next release, similar to how Maya defined it, except that it will also be configurable through the core.properties file. Using hard links for now.

We should also probably add a warning when link-staged-force is used, since it can be destructive if the workflow edits some of its inputs. For the same reason, this option will be disabled by default.