Closed windiana42 closed 6 months ago
This is an alternative proposal for #166
@nicolasmueller @NMAC427 what do you think about the following configuration options for cache validation. In my opinion everything that one could wish for should be choosable this way:
cache_validation
: See [](#section-cache_validation). *Optional*
table_store
: See [](#section-table_store). *Required*
...
(section-cache_validation)=
### Cache Validation options
mode
: Choose a mode of cache invalidation.
Supported values:
- `normal`: Normal cache invalidation.
- `assert_no_fresh_input`: Same as `ignore_fresh_input` and additionally fail if tasks having
a cache function would still be executed (change in version or lazy query).
- `ignore_fresh_input`: Ignore the output of cache functions that help determine the availability of fresh input.
With `disable_cache_function=False`, it still calls cache functions, so cache invalidation works interchangeably
between between `ignore_fresh_input` and `normal`.
- `force_fresh_input`: Consider all cache function outputs as different and thus make source tasks cache invalid.
- `force_caches_invalid`: Disable caching and thus force all tasks as cache invalid.
This option implies `force_fresh_input`.
(default: `normal`)
disable_cache_function
: When set to `True`, cache functions are not called. This is not compatible with `mode=normal`. The difference to
`ignore_fresh_input` is that in case mode is set back to `normal`, the cache becomes invalid if disable_cache_function
was set to `True` during last run.
(default: `False`)
ignore_task_version
: When set to `True`, tasks that specify an explicit version for cache invalidation will always be considered cache invalid.
This might be useful for instances with short execution time during rapid development cycles when manually bumping version numbers becomes cumbersome.
(default: `False`)
This is what I propose for Flow.run():
def run(
self,
*components: Task | TaskGetItem | Stage,
config: ConfigContext = None,
orchestration_engine: OrchestrationEngine = None,
fail_fast: bool | None = None,
cache_validation_mode: CacheValidationMode | None = None,
disable_cache_function: bool | None = None,
ignore_task_version: bool | None = None,
**kwargs,
) -> Result:
everything that one could wish for should be choosable this way
Close but not completely true. I coupled the execution of the auto_version determination run also to the disable_cache_function flag. It behaves very similar in the sense that disable_cache_function=True executes less but prevents cache valid run on next invocation with mode=NORMAL.
Tests got massively slower by this PR. I think the combinatorial tests are still valuable. So I suggest to keep them for now. However, a future PR might simply deactivate them or move them to nightly tests only.
Checklist
docs/source/changelog.md
entry