A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
We don't use transactional features for normal pipeline tasks since we lock complete stages. Thus we can use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED for mssql and SQL_ATTR_TXN_ISOLATION=RU for DB2. However, metadata processing does need working transactions.
For DB2, we don't need Repeatable Read (RR), so switching to Read Stability (RS) might fix the deadlocks we see in CI. https://www.ibm.com/docs/en/db2/11.1?topic=issues-isolation-levels
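A minimal sketch of picking the isolation level per purpose, assuming a hypothetical helper `isolation_statement` and hypothetical dialect keys `"mssql"` and `"db2"`. For DB2 it swaps in the SQL `SET CURRENT ISOLATION` special register (where uncommitted read is spelled `UR`) instead of the CLI attribute `SQL_ATTR_TXN_ISOLATION`; returning `None` means the connection keeps its default level.

```python
from typing import Optional

# Hypothetical mapping (not the library's API): statement that sets the session
# isolation level for each (dialect, purpose) pair.
#   "task":     normal pipeline tasks; stages are fully locked, so dirty reads are fine.
#   "metadata": metadata processing; needs real transactions (RS instead of RR on DB2).
ISOLATION_STATEMENTS = {
    ("mssql", "task"): "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED",
    ("db2", "task"): "SET CURRENT ISOLATION = UR",  # UR = uncommitted read in DB2 SQL
    ("db2", "metadata"): "SET CURRENT ISOLATION = RS",  # Read Stability, not Repeatable Read
}

PURPOSES = ("task", "metadata")


def isolation_statement(dialect: str, purpose: str) -> Optional[str]:
    """Return the SET statement for (dialect, purpose), or None to keep the default level."""
    if purpose not in PURPOSES:
        raise ValueError(f"unknown purpose {purpose!r}; expected one of {PURPOSES}")
    return ISOLATION_STATEMENTS.get((dialect.lower(), purpose))


if __name__ == "__main__":
    # Print the statement (or None) chosen for every combination.
    for dialect in ("mssql", "db2"):
        for purpose in PURPOSES:
            print(dialect, purpose, isolation_statement(dialect, purpose))
```

The returned statement would be executed once on each new connection before task or metadata work starts. Keeping the choice in one table makes it easy to compare RS and RR while chasing the CI deadlocks.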
(To resolve aliases, we read the SYSIBM catalog tables. Concurrently copying cached tables, however, modifies those very catalog tables. RR locks every row a query scans, whereas RS locks only the qualifying rows, so RR might be too strict to allow this concurrency even though no two threads ever touch the same table.)
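Besides changing the session level, DB2 accepts an isolation clause on an individual SELECT, which would let only the catalog read run under RS. A minimal sketch, assuming the alias lookup goes through the `SYSCAT.TABLES` catalog view (`TYPE = 'A'` marks aliases, `BASE_TABSCHEMA`/`BASE_TABNAME` name the target) and a qmark-style DB-API driver; `alias_lookup_query` is a hypothetical name.

```python
# Hypothetical helper: build a DB2 query resolving an alias to its base table.
# SYSCAT.TABLES is the catalog view over SYSIBM.SYSTABLES; TYPE = 'A' marks aliases.
VALID_ISOLATION = ("UR", "CS", "RS", "RR")


def alias_lookup_query(isolation: str = "RS") -> str:
    """Return a parameterized (qmark) query; bind (schema, alias_name) when executing."""
    level = isolation.upper()
    if level not in VALID_ISOLATION:
        raise ValueError(f"unsupported DB2 isolation clause {isolation!r}")
    # The trailing WITH clause overrides the session isolation for this statement only.
    # With the default RS, the catalog read locks only qualifying rows, not every scanned row.
    return (
        "SELECT BASE_TABSCHEMA, BASE_TABNAME "
        "FROM SYSCAT.TABLES "
        "WHERE TABSCHEMA = ? AND TABNAME = ? AND TYPE = 'A' "
        f"WITH {level}"
    )


if __name__ == "__main__":
    print(alias_lookup_query())
```

With a DB-API cursor this would run as `cursor.execute(alias_lookup_query(), (schema, alias_name))`. DB2 stores unquoted identifiers in upper case, so the bound names would typically need upper-casing.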