A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, SQLAlchemy, ibis, and the like.
In pipedag, all tasks can communicate via the database. However, due to slow JDBC/ODBC drivers or communication overhead, this may be slow. Thus, two dataframe-based tasks can also communicate by handing over output dataframes directly as inputs to the following tasks. This allows the next task to start in parallel with writing the outputs to the database. The subsequent stage commit, however, should wait until all input frames have been persisted. For pandas 2.0 and polars, it may even be possible to hand over the backing Arrow data without any copy.
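The handover idea above can be sketched as follows. This is a minimal illustration using plain Python threads, not pipedag's actual API; the task and table names are hypothetical, and the database write is a stand-in for a slow JDBC/ODBC transfer:

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def write_to_database(df: pd.DataFrame, table: str) -> str:
    # stand-in for a slow database write, e.g. df.to_sql(table, engine)
    return f"persisted {len(df)} rows to {table}"


def task_a() -> pd.DataFrame:
    return pd.DataFrame({"x": [1, 2, 3]})


def task_b(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(y=df["x"] * 2)


with ThreadPoolExecutor() as pool:
    out_a = task_a()
    # start persisting the output in the background ...
    write_future = pool.submit(write_to_database, out_a, "stage.task_a")
    # ... while the next task runs immediately on the in-memory dataframe
    out_b = task_b(out_a)
    # the stage commit must block until all input frames have been persisted
    write_future.result()
```

The key property is that `task_b` never waits for the database round trip; only the commit barrier at the end does.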
Questions:
[ ] how can conflicts be avoided if two tasks read the same dataframe and may modify their input? (if a useful operating mode requires cooperation from the user, it should be disabled by default)
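The conflict in question can be demonstrated with plain pandas (no pipedag involved): when two downstream tasks receive the same dataframe object and one mutates it in place, the other silently sees the modified data. Handing each task its own copy avoids this at the cost of the zero-copy benefit:

```python
import pandas as pd

shared = pd.DataFrame({"x": [1, 2, 3]})


def task_mutating(df: pd.DataFrame) -> pd.DataFrame:
    df["x"] = df["x"] * 10  # in-place modification of the shared input
    return df


def task_reading(df: pd.DataFrame) -> int:
    return int(df["x"].sum())


task_mutating(shared)
# task_reading now sees 10, 20, 30 instead of the original 1, 2, 3
assert task_reading(shared) == 60

# safe default: give each consumer its own copy unless the user opts in
safe = pd.DataFrame({"x": [1, 2, 3]})
assert task_reading(safe.copy(deep=True)) == 6
```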
Features:
[ ] hand over dataframes and start the next task in parallel with the transfer to the database
[ ] ensure zero-copy handover of outputs to the next task's inputs for Arrow-backed pandas 2.0 / polars
[ ] make persistence of intermediate tables to the database optional
=> interacts with the idea of storing Apache Arrow dataframes in a shared-memory process that acts as a cache layer in front of the persistent database (and might take care of parallel write-back)