Open hiroyuki-sato opened 6 years ago
How can the user share other files between tasks? (such as text/binary files)
Use Case: (Machine Learning Application):
Is it possible to implement such a use case in DigDag? Is there a way to define which files are "output" and which are only temporary files that don't belong to the output?
As a Digdag user, I think sharing data between tasks is a feature people expect in a future release.
Where is the input/output data store? What operator do you use?
My `upload_s3` idea is to share task data between tasks using S3. Does your scenario require local storage?
Those slides may help.
machine-learning example.
Those examples use the Treasure Data data store, because Digdag and Hivemall are maintained by Arm Treasure Data.
In another case, some users use EFS (Amazon Elastic File System) with the `sh>` operator to avoid the isolated working area.
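To illustrate the shared-filesystem approach mentioned above, here is a minimal Python sketch, assuming every task can see the same mounted directory (an EFS mount, for instance). The function names, the `model.bin` file name, and the per-session directory layout are hypothetical and not part of Digdag; a temporary directory stands in for the mount.

```python
import tempfile
from pathlib import Path


def session_dir(shared_root: Path, session_id: str) -> Path:
    """Return (and create if needed) a per-session directory under the shared mount."""
    path = Path(shared_root) / session_id
    path.mkdir(parents=True, exist_ok=True)
    return path


def produce(shared_root: Path, session_id: str) -> Path:
    """Earlier task: write an output file into the shared session directory."""
    out = session_dir(shared_root, session_id) / "model.bin"  # hypothetical file name
    out.write_bytes(b"\x00\x01trained-model")
    return out


def consume(shared_root: Path, session_id: str) -> bytes:
    """Later task: read the file that the earlier task wrote."""
    return (session_dir(shared_root, session_id) / "model.bin").read_bytes()


if __name__ == "__main__":
    # A temporary directory stands in for the shared mount point (assumption).
    with tempfile.TemporaryDirectory() as root:
        produce(Path(root), "session-0001")
        print(consume(Path(root), "session-0001"))
```

Keying the directory by a session identifier keeps concurrent sessions from overwriting each other's files; temporary files can simply live outside that directory.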
I agree too. My idea is that the workspace could optionally be generated per session. Currently, it is generated only per task.
As a workaround, I made a shell script that runs multiple tasks.
Digdag users want to share files between tasks in server mode.
Use cases
Download a file from a database and execute Embulk with the `sh>` operator. Twitter (Japanese)
`sh>: command && digdag run wf01.dig && digdag run wf02.dig` as a workaround. twitter
In the current server mode, another task can't access the downloaded file. (See also #735).
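The chaining workaround above keeps every step inside one task, so the steps share a working directory. A rough Python sketch of the same idea follows, assuming the steps can be expressed as command lines; `run_steps` is a hypothetical helper, and the commands shown are placeholders for the real download and processing steps.

```python
import subprocess
import sys
import tempfile


def run_steps(steps, workdir):
    """Run each command in order in the same directory; stop at the first failure."""
    for cmd in steps:
        # check=True raises CalledProcessError on a non-zero exit,
        # so later steps are skipped, similar to chaining with `&&`.
        subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    # Placeholder commands (assumptions): a "download" step that writes a file
    # and a "process" step that reads it from the same working directory.
    download = [sys.executable, "-c", "open('dump.csv', 'w').write('id,name\\n1,a\\n')"]
    process = [sys.executable, "-c", "print(open('dump.csv').read())"]
    with tempfile.TemporaryDirectory() as workdir:
        run_steps([download, process], workdir)
```

The trade-off is the same as with the shell one-liner: the steps are no longer separate tasks, so they cannot be retried or scheduled individually.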
I think an `upload_s3` option may solve those use cases. It uploads `pg>`, `td>`, and `redshift>` results to S3 instead of writing them locally. For example,