In the current version, a data instance has a data type with two possible values: blob and directory. After some discussion, it seems to make sense to put the data type into data objects as well, i.e. the task graph would already contain information about the data type.
This is a proposal for how to handle this in the Python API. It is relevant for Python tasks and for tasks.execute and the Program class.
Currently, the user indicates that an output is a directory by setting content_type to 'dir', e.g.:
Output("mydata", content_type="dir")
Here we propose to introduce an OutputDir class to indicate a directory output, and to reserve Output for the blob data type. Both can be used in remote Python tasks, tasks.execute, and Program.
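To make the Output/OutputDir split concrete, here is a minimal sketch of one possible shape. The classes below are hypothetical stand-ins written for illustration, not the existing API classes; the data_type attribute and the values "blob" and "directory" are assumptions based on the data types named above.

```python
# Hypothetical sketch, not the real API classes.
class Output:
    """Describes a blob output of a task."""

    data_type = "blob"  # assumed attribute name

    def __init__(self, label, content_type=None):
        self.label = label
        # content_type no longer needs to carry the value 'dir'
        self.content_type = content_type


class OutputDir(Output):
    """Describes a directory output of a task."""

    data_type = "directory"


# The data type is now visible from the class that the user picked.
outputs = [Output("log"), OutputDir("mydata")]
data_types = [o.data_type for o in outputs]
```

With this shape, the data type is fixed when the task is declared, so it can be stored in the task graph without inspecting content_type.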
To make it symmetric, we can also introduce Input/InputDir for blob/directory inputs. Strictly speaking, this is not necessary, as the data type may be obtained from the provided object (and in the case of Program, the decision on the data type may be postponed). However, the idea is to provide an additional level of "type" checking:
Input("mydata", dataobj=d) # Fail if 'd' is data object of directory type
InputDir("mydata", dataobj=d) # Fail if 'd' is data object of blob type
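The check described above could be sketched as follows. This is only an illustration under stated assumptions: DataObject is a hypothetical stand-in for a data object in the task graph, the data_type attribute name is assumed, and raising TypeError is one possible choice of failure mode, not something the proposal specifies.

```python
# Hypothetical sketch, not the real API classes.
class DataObject:
    """Stand-in for a data object that already knows its data type."""

    def __init__(self, data_type):
        self.data_type = data_type  # "blob" or "directory"


class Input:
    expected_type = "blob"

    def __init__(self, label, dataobj=None):
        # Without a data object there is nothing to check yet
        # (e.g. in Program, where the decision may be postponed).
        if dataobj is not None and dataobj.data_type != self.expected_type:
            raise TypeError(
                "{} '{}' expects a {} data object, got {}".format(
                    type(self).__name__, label,
                    self.expected_type, dataobj.data_type))
        self.label = label
        self.dataobj = dataobj


class InputDir(Input):
    expected_type = "directory"


d = DataObject("directory")
ok = InputDir("mydata", dataobj=d)   # passes: 'd' is a directory
try:
    Input("mydata", dataobj=d)       # fails: 'd' is not a blob
    error = None
except TypeError as exc:
    error = str(exc)
```

Performing the check in the constructor reports a mismatch when the task is being built, rather than later when it runs.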
In the case of an implicit input, the right data type is derived from the provided data object:
tasks.execute(["du", "-h", d]) # This will work for 'd' being directory or blob
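One way the implicit case could be resolved is sketched below. The helper resolve_args and the DataObject class are hypothetical names introduced for this illustration; the actual handling inside tasks.execute may differ.

```python
# Hypothetical sketch, not the real implementation of tasks.execute.
class DataObject:
    """Stand-in for a data object that already knows its data type."""

    def __init__(self, data_type):
        self.data_type = data_type  # "blob" or "directory"


def resolve_args(args):
    """Pair each data object in 'args' with the input kind it implies.

    Plain strings are passed through unchanged; a data object becomes
    ("InputDir", obj) for a directory and ("Input", obj) for a blob.
    """
    resolved = []
    for arg in args:
        if isinstance(arg, DataObject):
            kind = "InputDir" if arg.data_type == "directory" else "Input"
            resolved.append((kind, arg))
        else:
            resolved.append(arg)
    return resolved


blob = DataObject("blob")
directory = DataObject("directory")
from_blob = resolve_args(["du", "-h", blob])
from_dir = resolve_args(["du", "-h", directory])
```

Because the kind is read from the object itself, the same call works for both data types and the user does not have to state it.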
Python API for data type in data objects
Alternatives
Input and Output: