Closed dakoner closed 2 years ago
Thanks @dakoner for the question.
Did you take a look at `Dir(remote_path).stage(local_path)`? Does that achieve what you want?
You should be able to use it like this:
```python
from redun import Dir, File, script, task

@task()
def run_prog(input: File, out_s3_path: str) -> Dir:
    return script(
        f"""
        prog local_file --output local_dir
        """,
        inputs=[input.stage("local_file")],
        outputs=Dir(out_s3_path).stage("local_dir"),
    )
```
Thanks, that solved the problem! I had only seen Dir() used directly inside a task (https://github.com/insitro/redun/blob/main/examples/06_bioinfo_batch/workflow.py#L580). I tried Dir() in outputs=[] and it worked perfectly.
I have a batch job that runs a script which writes a directory tree to the local filesystem, and I want to stage that output tree using the outputs=[] option to script(). I don't know the directory tree's contents in advance (they can change based on the script's arguments). I basically want the equivalent of cp -r, but it looks like outputs are staged with cp.
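To illustrate the distinction (a stdlib sketch only, not redun's actual staging code; all file names here are made up): staging a single file is like shutil.copy (cp), while mirroring an unknown directory tree is like shutil.copytree (cp -r):

```python
import shutil
import tempfile
from pathlib import Path

# Build a small directory tree whose contents aren't known in advance
# (hypothetical structure, for illustration only).
src = Path(tempfile.mkdtemp())
(src / "sub").mkdir()
(src / "a.txt").write_text("top-level file")
(src / "sub" / "b.txt").write_text("nested file")

# Single-file staging is like `cp`: one known path in, one path out.
dst_file = Path(tempfile.mkdtemp()) / "a.txt"
shutil.copy(src / "a.txt", dst_file)

# Recursive staging is like `cp -r`: the whole tree is mirrored,
# including entries we didn't enumerate by name.
dst_tree = Path(tempfile.mkdtemp()) / "mirror"
shutil.copytree(src, dst_tree)

print(sorted(p.relative_to(dst_tree).as_posix() for p in dst_tree.rglob("*")))
# → ['a.txt', 'sub', 'sub/b.txt']
```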
Is there a way to defer computing the outputs list until after the script has run? Then I could run something like glob.glob(output_dir + "/**", recursive=True) to get the full list of output files and have them mirrored (preserving the path structure under the directory tree).
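The glob approach above can be sketched in plain Python (a minimal sketch with a made-up directory layout, independent of redun): enumerate the tree after the fact, then keep file paths relative to the output directory so they can be mirrored to a remote prefix with their structure intact:

```python
import glob
import os
import tempfile

# Create a hypothetical output directory tree, standing in for whatever
# the script happened to write (names are illustrative only).
output_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(output_dir, "nested"))
for rel in ["result.csv", os.path.join("nested", "log.txt")]:
    with open(os.path.join(output_dir, rel), "w") as f:
        f.write("data")

# After the script has run, enumerate everything under the tree.
paths = glob.glob(output_dir + "/**", recursive=True)

# Keep only files, relativized to the output directory, so each one can
# be mapped to the same relative key under the remote prefix.
rel_files = sorted(
    os.path.relpath(p, output_dir) for p in paths if os.path.isfile(p)
)
print(rel_files)
```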
Otherwise, I'd end up putting this at the end of the script: aws s3 cp --recursive output_dir s3://my-output-bucket/final-data/