treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

[New feature] Digdag users want to share files between tasks on server mode. #785

Open hiroyuki-sato opened 6 years ago

hiroyuki-sato commented 6 years ago

Digdag users want to share files between tasks on server mode.

Use cases

In the current server mode, one task can't access a file downloaded by another task (see also #735).

I think an upload_s3 option may solve these use cases. It would upload pg>, td>, and redshift> results to S3 instead of writing them locally.

For example:

+step1:
  pg>: XXX.sql
  upload_s3: my-bucket/file.csv
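Until something like upload_s3 exists, a downstream task has to fetch the object itself, because each task gets its own workspace on server mode. A minimal sketch of the consuming side, assuming the bucket/key above, the aws CLI on the server, and S3 credentials configured as digdag secrets:

+wait_for_file:
  # s3_wait> is an existing operator: it blocks until the object appears.
  s3_wait>: my-bucket/file.csv

+step2:
  # The file must be downloaded again before this task can read it.
  sh>: aws s3 cp s3://my-bucket/file.csv file.csv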
David-Development commented 6 years ago

How can the user share other kinds of files between tasks, such as text or binary files?

Use case (machine learning application):

Is it possible to implement such a use case in Digdag? Is there a way to define which files are "output" and which are only temporary files that don't belong to the output?
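In the meantime, one workaround is to run the work and the publishing inside a single task so that everything shares one workspace, and to upload only the files considered "output". A sketch, assuming a hypothetical train.py and a placeholder bucket:

+train:
  # One sh> task, one workspace: train.py and the upload see the same
  # files. Only model.bin is published; tmp_cache/ is treated as
  # temporary and discarded with the workspace.
  sh>: |
    python train.py --out model.bin --cache-dir tmp_cache/
    aws s3 cp model.bin s3://my-bucket/models/model.bin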

hiroyuki-sato commented 6 years ago

I think sharing data between tasks is a feature Digdag users expect in a future release.

Where do you store the input/output data? Which operators do you use?

My upload_s3 idea is to share data between tasks via S3. Does your scenario require local storage?

Those slides may help.

machine-learning example.

Those examples use the Treasure Data data store, because Digdag and Hivemall are maintained by Arm Treasure Data.

In another case, some users use EFS (Amazon Elastic File System) with the sh> operator to avoid the isolated working area.
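For illustration, that EFS approach might look like the following sketch, assuming an EFS volume is already mounted at /mnt/efs on every machine running digdag server (the scripts and paths are placeholders):

+produce:
  # Writes outside the per-task workspace, onto the shared mount.
  sh>: ./extract.sh > /mnt/efs/shared/result.csv

+consume:
  # Another task, possibly on another node, reads the same path.
  sh>: ./load.sh /mnt/efs/shared/result.csv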

skokado commented 4 years ago

I agree. My idea is that it should be possible to generate the workspace per session; currently it is generated only per task.

As a workaround, I made a shell script that performs multiple tasks, roughly as sketched below.
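A rough sketch of that workaround (the step scripts are hypothetical): because everything runs as one sh> task, every command shares one workspace and intermediate files stay visible.

+all_in_one:
  # One task, one workspace: data.csv and out.csv survive between the
  # commands below, which is not the case across separate tasks.
  sh>: |
    ./step1_download.sh   # writes data.csv
    ./step2_transform.sh  # reads data.csv, writes out.csv
    ./step3_upload.sh     # publishes out.csv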