pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Making `ignore_fresh_input` more explicit (or the docs) #107

MarcAntoineSchmidtQC closed 1 year ago

MarcAntoineSchmidtQC commented 1 year ago

I might be misunderstanding the docs, but the description below seems to indicate that when this parameter is True, the cache is ignored and the table is always updated.

ignore_fresh_input – When set to True, the task’s cache function gets ignored when determining the cache validity of a task.

However, the name "ignore fresh input" suggests that we do not want to refresh the tables, even when the inputs have changed.

Could you clarify which one it is?

NMAC427 commented 1 year ago

Setting ignore_fresh_input to True results in the behaviour described by the documentation. However, I agree that it is an unfortunate name. I think I will rename it to ignore_cache_function to make it clearer.
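
To make the documented wording concrete, here is a minimal sketch of one reading of "the task's cache function gets ignored when determining the cache validity of a task". It is a toy model, not pipedag's implementation: `CachedRun`, `is_cache_valid`, `input_hash`, and `cache_fn` are hypothetical names made up for illustration, and the flag is called `ignore_cache_function` after the rename proposed above. The assumption is that a task's cache is valid when its inputs are unchanged and, unless the flag is set, its cache function returns the same value as on the last run.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedRun:
    """Hypothetical record of a task's previous run (not a pipedag class)."""
    input_hash: str      # hash of the task's inputs at the last run
    cache_fn_value: str  # value the cache function returned at the last run


def is_cache_valid(
    previous: Optional[CachedRun],
    input_hash: str,
    cache_fn: Optional[Callable[[], str]],
    ignore_cache_function: bool = False,
) -> bool:
    """Toy cache-validity check under the assumptions stated above."""
    if previous is None:
        return False  # the task never ran, so there is nothing to reuse
    if previous.input_hash != input_hash:
        return False  # changed inputs always invalidate the cache
    if cache_fn is None or ignore_cache_function:
        return True   # the cache function plays no part in the decision
    # Otherwise the cache function must report the same state as last time.
    return cache_fn() == previous.cache_fn_value


previous = CachedRun(input_hash="h1", cache_fn_value="v1")
source_changed = lambda: "v2"  # pretend the external source has new data

# The cache function reports fresh data, so the task would be rerun.
print(is_cache_valid(previous, "h1", source_changed))
# With the flag set, the cache function's output is not consulted.
print(is_cache_valid(previous, "h1", source_changed, ignore_cache_function=True))
```

Under this reading, the flag does not force an update. It only removes the cache function from the validity check, so a task is still rerun when its inputs change.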