Closed wendigo closed 1 week ago
What's the motivation for this? Does "files will be deleted by the pruning process" mean they are no longer deleted as soon as the client doesn't need them anymore? If so, this can cause the spool to run out of space for certain workloads if the pruning process is not tuned carefully.
What's the motivation for this? Does "files will be deleted by the pruning process" mean they are no longer deleted as soon as the client doesn't need them anymore? If so, this can cause the spool to run out of space for certain workloads if the pruning process is not tuned carefully.
IIUC from offline conversation the motivation is to support deployment cases when explicit deletion of data is not supported. Could be due to technical implications (filesystem implementation etc); or due to process built around Trino, if we want to be sure that data returned by query is available for multiple third parties for prolonged time.
What's the motivation for this? Does "files will be deleted by the pruning process" mean they are no longer deleted as soon as the client doesn't need them anymore? If so, this can cause the spool to run out of space for certain workloads if the pruning process is not tuned carefully.
IIUC from offline conversation the motivation is to support deployment cases when explicit deletion of data is not supported. Could be due to technical implications (filesystem implementation etc); or due to process built around Trino, if we want to be sure that data returned by query is available for multiple third parties for prolonged time.
@wendigo maybe put motivation in commit message/ PR desc to make it more obvious for others/future self.
I don’t think we should be adding yet another configuration knob. I’d like to understand more why this is needed, what problem it’s solving, and think about whether there may be a better solution.
Is this related to using spooling for result set caching?
@electrum yes, exactly
@martint @electrum @losipiuk I've just checked that having a bucket retention policy makes basically deletes free, while explicit delete makes it incur an additional API call cost.
This allows for storage-wide expiration policies to be implemented that doesn't require explicit API call to remove an expired segment.
Doing this at the protocol level doesn't seem the right place. If the underlying spooling implementation doesn't support on-demand deletion, or the implementation wants to have another policy, this should be done within the spool implementation. It should be transparent to the protocol, which should always indicate when it's done using a segment.
This allows for storage-wide expiration policies to be implemented that doesn't require explicit API call to remove an expired segment.
Description
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. ( ) Release notes are required, with the following suggested text: