Open anna-geller opened 1 week ago
I have a lot of similar cases. It would be super helpful to have this feature implemented.
Update: This will be complex to implement (not a quick win). The new task cannot remove the outputs of other tasks unless we change the executor implementation to allow it.
Additionally, for each output to remove, we'll need to:
Feature description
Context
There are many use cases in which a user may want to pass an output to downstream tasks without keeping it afterward due to:
For example, imagine that you extract a large dataset from a given source and then load it into a destination such as BigQuery. Once the data has been successfully stored in BigQuery, there's no need to keep it in internal storage. We currently support a Purge task, but it deletes all outputs; you can't cherry-pick specific ones. Sometimes, however, you want to keep the other outputs and purge only a single large or sensitive one.
It would be great to add a task that can delete only specific outputs by ID.
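To make the requested behavior concrete, here is a minimal Python sketch of the difference between purging everything and purging selected outputs by ID. It is only an illustration of the semantics, not the proposed task syntax and not how the executor works: `OutputStore`, `purge_all`, `purge_selected`, and the output IDs are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class OutputStore:
    """Hypothetical in-memory stand-in for an execution's internal storage."""

    outputs: dict[str, bytes] = field(default_factory=dict)

    def purge_all(self) -> list[str]:
        """Delete every output (the all-or-nothing behavior described above)."""
        removed = sorted(self.outputs)
        self.outputs.clear()
        return removed

    def purge_selected(self, output_ids: list[str]) -> list[str]:
        """Delete only the named outputs; IDs that do not exist are skipped."""
        removed = [oid for oid in output_ids if oid in self.outputs]
        for oid in removed:
            del self.outputs[oid]
        return removed


# Hypothetical execution with three task outputs.
store = OutputStore(
    {"extract": b"large dataset", "load": b"job id", "report": b"summary"}
)

# Drop only the large extract; "missing" is not a known ID and is ignored.
removed = store.purge_selected(["extract", "missing"])
remaining = sorted(store.outputs)
```

In this sketch unknown IDs are silently skipped. Whether the real task should fail or warn on an unknown output ID is an open design choice.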
Proposed syntax
Example usage in a flow