Closed begelundmuller closed 2 weeks ago
I don't have a clear idea about how we might solve this, but here are some thoughts (probably too hacky):
1. Add a `filestore_to_self` executor for DuckDB and ClickHouse for ingesting from the `file`/`local_file` connector
2. If `invalidate_on_change` is set to true in the `InputProps`, we could compute a hash of the ingested files and store it in the `ModelResult`
3. In `ModelManager.Exists`, we could re-compute the hash and return false if it changed

We can also consider the following alternative solutions if the use case is only for `mapping` files and similar situations:
2. `invalidate_always`, which always returns true for `ModelManager.Exists` so that the model is always updated on reconcile. Since these mappings are expected to be small, it should be okay to always refresh them.

1 has a limitation that dependencies will not be refreshed. 2 has a limitation that dependencies will always be refreshed.
If we want to solve this for all cases then I think the solution listed above is the only way. Two considerations though:
`ModelManager` so that the hash can be recomputed.

> We can also consider the following alternative solutions if the use case is only for `mapping` files and similar situations:
- Unfortunately they're often referenced in security policies and dimension/measure expressions, which get hit a ton (e.g. when listing dashboards, we check the security policy of each dashboard in the project), so I'd be too worried about the number of disk reads and CSV parses here.
- `Reconcile` at all. `Reconcile` is only called on controller restart or when the resource spec changes.

> If we want to solve this for all cases then I think the solution listed above is the only way. Two considerations though:
About 1. here: for ClickHouse, the `file`/`local_file` connector should still load the local path on the runtime and upload/write it into a ClickHouse table, because the goal for the local file connector is using small data files that are checked into Git.
Sure I think this can be handled separately. I will raise an issue.
We have gotten several reports about file sources not being updated when the local file is changed. This is especially problematic for `mapping.csv` files used for access management in security policies.

We need to be careful only to do this for small local files, to avoid accidentally scanning large/remote files (even for computing a hash). Maybe it should be an optional parameter for the `file:` connector?
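
To make the hash idea from this thread concrete, here is a rough sketch of a size-guarded hash check. It is written in Python for brevity and is not the runtime's actual code. Every name in it (`hash_local_files`, `result_still_valid`, `MAX_HASH_BYTES`) is hypothetical, and the 10 MiB cap is an arbitrary assumption rather than a proposed default.

```python
import hashlib
import os
from typing import Iterable, Optional

# Hypothetical cap: files larger than this are never read for hashing.
MAX_HASH_BYTES = 10 * 1024 * 1024


def hash_local_files(paths: Iterable[str], max_bytes: int = MAX_HASH_BYTES) -> Optional[str]:
    """Return a SHA-256 hex digest over the files, or None if any file exceeds max_bytes."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        if os.path.getsize(path) > max_bytes:
            return None  # too large: skip change detection rather than scan the file
        # Mix in the path so that renames also change the digest.
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()


def result_still_valid(stored_hash: Optional[str], paths: Iterable[str]) -> bool:
    """Stand-in for the proposed existence check: False means the files changed."""
    if stored_hash is None:
        return True  # no hash was recorded, so there is nothing to compare against
    return hash_local_files(paths) == stored_hash
```

In this sketch the digest would be computed once at ingestion time and stored next to the model result. A later check recomputes it and reports the result as stale when the two differ. A file that has grown past the cap since ingestion is also reported as stale, because its digest can no longer be computed.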