pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Set isolation level for mssql and DB2 #63

Open windiana42 opened 1 year ago

windiana42 commented 1 year ago

We don't use transactional features for normal pipeline tasks, since we lock complete stages. Thus we can use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED for mssql and SQL_ATTR_TXN_ISOLATION=SQL_TXN_READ_UNCOMMITTED (Uncommitted Read, UR) for DB2. For processing metadata, however, we do need transactions to work.
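A minimal sketch of the split described above: task connections drop to read-uncommitted, while metadata connections keep the database default so their transactions behave normally. The helper name `isolation_statement`, the dialect keys `"mssql"`/`"db2"` and the purpose labels `"task"`/`"metadata"` are hypothetical. For DB2 it uses the `SET CURRENT ISOLATION = UR` special register as an SQL-level alternative to the CLI attribute mentioned above; this is an assumption, not necessarily how pipedag would do it.

```python
from typing import Optional

# Hypothetical mapping: statement to run right after opening a *task* connection.
READ_UNCOMMITTED_SQL = {
    # T-SQL: stays in effect for the rest of the session
    "mssql": "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED",
    # DB2 LUW special register; UR = Uncommitted Read
    # (SQL-level alternative to SQL_ATTR_TXN_ISOLATION, assumed here)
    "db2": "SET CURRENT ISOLATION = UR",
}


def isolation_statement(dialect: str, purpose: str) -> Optional[str]:
    """Return the isolation statement for a new connection, or None to keep the default.

    Hypothetical helper: 'task' connections rely on stage locks instead of
    transactional isolation, 'metadata' connections need real transactions.
    """
    if purpose not in ("task", "metadata"):
        raise ValueError(f"unknown connection purpose: {purpose!r}")
    if purpose == "metadata":
        # Metadata processing needs transactions: keep the database default level.
        return None
    # Unknown dialects also keep their default level.
    return READ_UNCOMMITTED_SQL.get(dialect)


print(isolation_statement("mssql", "task"))
print(isolation_statement("db2", "metadata"))
```

With SQLAlchemy, such a statement could be issued on every new DBAPI connection from a `"connect"` event listener. For mssql the same effect is also available via `create_engine(url, isolation_level="READ UNCOMMITTED")`, with a separate engine kept for metadata work.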

For DB2 we don't need Repeatable Read (RR), so we might be able to fix the deadlock problems in CI by switching to Read Stability (RS): https://www.ibm.com/docs/en/db2/11.1?topic=issues-isolation-levels. To resolve aliases, we read SYSIBM catalog tables. Concurrently copying cached tables, however, modifies those very SYSIBM tables. RR might be too strict to allow this concurrent operation, even if we never touch the same table from two threads.
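To make the RR-versus-RS trade-off concrete, the sketch below encodes which phenomena each DB2 isolation level prevents, following the table in the IBM documentation linked above, and picks the weakest level that still covers a given requirement. The function name `weakest_level` is hypothetical, and the assumption that alias resolution only needs protection against dirty and non-repeatable reads (tolerating phantom rows) is made for illustration, not a verified property of pipedag.

```python
from enum import Enum


class Phenomenon(Enum):
    LOST_UPDATE = "lost update"
    DIRTY_READ = "dirty read"
    NON_REPEATABLE_READ = "non-repeatable read"
    PHANTOM_READ = "phantom read"


# DB2 isolation levels ordered from least to most strict,
# each with the set of phenomena it prevents.
DB2_LEVELS = [
    ("UR", {Phenomenon.LOST_UPDATE}),
    ("CS", {Phenomenon.LOST_UPDATE, Phenomenon.DIRTY_READ}),
    ("RS", {Phenomenon.LOST_UPDATE, Phenomenon.DIRTY_READ,
            Phenomenon.NON_REPEATABLE_READ}),
    ("RR", set(Phenomenon)),  # prevents all four, including phantom reads
]


def weakest_level(must_prevent: set) -> str:
    """Return the least strict DB2 level that prevents every phenomenon given."""
    for name, prevented in DB2_LEVELS:
        if must_prevent <= prevented:
            return name
    raise ValueError(f"no level prevents {must_prevent!r}")


# Assumed requirement for alias resolution: stable reads, phantoms tolerated.
alias_needs = {Phenomenon.DIRTY_READ, Phenomenon.NON_REPEATABLE_READ}
print(weakest_level(alias_needs))  # RS
```

The practical difference is locking: under RR, DB2 locks every row a query scans, while RS locks only the rows that qualify. That is why RR can collide with concurrent catalog updates from unrelated tables. DB2 also accepts a per-statement isolation clause (for example appending WITH RS to a SELECT), so the catalog queries could be relaxed without changing the connection-wide level.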