Open KamilPiechowiak opened 5 months ago
Steps to reproduce
This code persists the input, and I am not sure it should. Note that `persistence_mode` is set to `UDF_CACHING`:
```python
import pathway as pw


class InSchema(pw.Schema):
    a: int
    b: int


t = pw.io.csv.read("a.csv", persistent_id="abc", schema=InSchema, mode="static")
persistence_backend = pw.persistence.Backend.filesystem("./xyz")
persistence_config = pw.persistence.Config.simple_config(
    persistence_backend,
    persistence_mode=pw.PersistenceMode.UDF_CACHING,
)
pw.debug.compute_and_print_update_stream(t, persistence_config=persistence_config)
```
If you run the code twice, you'll see that the values are read from persistence on the second run.
Relevant log output
```
First run:
            | a | b | __time__      | __diff__
^31NXFBM... | 1 | 3 | 1718180081298 | 1
^TC3B0CF... | 2 | 4 | 1718180081298 | 1
^VH8R9JC... | 3 | 5 | 1718180081298 | 1

Second run:
            | a | b | __time__ | __diff__
^31NXFBM... | 1 | 3 | 0        | 1
^TC3B0CF... | 2 | 4 | 0        | 1
^VH8R9JC... | 3 | 5 | 0        | 1
```
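What did you expect to happen?
Either `UDF_CACHING` mode does not persist the input even if `persistent_id` is set, or an error is raised saying that `persistent_id` is set in `UDF_CACHING` mode.
Version
0.12.0
Docker Versions (if used)
No response
OS
Linux
On which CPU architecture did you run Pathway?
None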
In general, `persistence_mode` is not documented well enough. I agree that it is confusing that enabling UDF caching also enables the rest of the persistence mechanisms.
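To make the second option from the expected behaviour concrete (raising an error), here is a rough standalone sketch of such a check. It is not Pathway code and does not reflect how Pathway is implemented: `Mode`, `Source` and `check_sources` are hypothetical names invented for this illustration, and `FULL_PERSISTENCE` is only a placeholder for "any mode other than UDF caching".

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, Optional


class Mode(Enum):
    # Hypothetical stand-in for a persistence mode; not Pathway's real enum.
    FULL_PERSISTENCE = auto()
    UDF_CACHING = auto()


@dataclass
class Source:
    # Hypothetical description of an input connector.
    name: str
    persistent_id: Optional[str] = None


def check_sources(mode: Mode, sources: Iterable[Source]) -> None:
    """Raise if any source sets persistent_id while only UDF caching is requested."""
    if mode is not Mode.UDF_CACHING:
        return  # input persistence is expected in other modes
    offending = [s.name for s in sources if s.persistent_id is not None]
    if offending:
        raise ValueError(
            "persistent_id is set in UDF_CACHING mode for: " + ", ".join(offending)
        )
```

With a check like this, the example from the issue (a CSV source with `persistent_id="abc"` combined with `UDF_CACHING`) would fail early instead of silently persisting the input. The other option, ignoring `persistent_id` in that mode, would avoid the error but could hide a misconfiguration.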