risingwavelabs / dbt-risingwave

Apache License 2.0

feat: add cancel method for streaming jobs #49

Closed MattiasMTS closed 1 month ago

MattiasMTS commented 1 month ago

Solves #47. Not sure if a unit test is needed -> let me know wdyt. Tested in our cluster.

MattiasMTS commented 1 month ago

Do you think we could get this in @chenzl25? 😋

chenzl25 commented 1 month ago

Could you let me know if you used background ddl in your case? IIUC, once the connection is canceled, the related ddl should be canceled automatically.

MattiasMTS commented 1 month ago

> Could you let me know if you used background ddl in your case? IIUC, once the connection is canceled, the related ddl should be canceled automatically.

Not sure tbh. Essentially, if you cancel the dbt job now, it says "abort", but if you look at SHOW PROCESSLIST the model is still building. To stop it from being stuck, you have to query SHOW PROCESSLIST yourself and then run KILL <pid> for the dbt cancel to actually take effect.

You can look into the dbt-postgres adapter on how they do it: https://github.com/dbt-labs/dbt-postgres/blob/cb4a95392fe3156bf7bdf8e8e71d4fa3fab07675/dbt/adapters/postgres/connections.py#L185-L204

dbt-risingwave inherits this, but it doesn't actually kill the streaming job / table being built. Perhaps this is due to how the pid is fetched via connection.handle.get_backend_pid and then killed via pg_terminate_backend.
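For reference, a minimal sketch of the pid-based flow described above, paraphrased rather than copied from dbt-postgres. `FakeHandle` and `run_sql` are hypothetical stand-ins for the psycopg2 connection handle and the adapter's query runner; only `get_backend_pid` and `pg_terminate_backend` are real names.

```python
class FakeHandle:
    """Hypothetical stand-in for a psycopg2 connection handle."""

    def __init__(self, pid):
        self._pid = pid

    def get_backend_pid(self):
        # psycopg2 connections expose this method; here it just returns a fixed pid.
        return self._pid


def cancel_by_backend_pid(handle, run_sql):
    """Ask the server to terminate the backend that serves `handle`.

    `run_sql` is an assumed callable that executes one SQL string.
    """
    pid = handle.get_backend_pid()
    sql = "select pg_terminate_backend({})".format(pid)
    run_sql(sql)
    return sql


# Usage with a recording runner instead of a real database.
executed = []
issued = cancel_by_backend_pid(FakeHandle(4242), executed.append)
```

This only terminates the backend tied to that one connection, which may be why the streaming job keeps building on the RisingWave side.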

Compare that with how you do it in RisingWave: https://docs.risingwave.com/docs/current/sql-show-processlist/#terminate-the-process. Noteworthy: you also have https://docs.risingwave.com/docs/current/sql-cancel-jobs/, but our team has always used the SHOW PROCESSLIST & KILL combination to terminate these jobs.
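A rough sketch of that manual SHOW PROCESSLIST & KILL workaround, not the code in this PR. It assumes a DB-API style cursor, that the process id is the first column and the running statement the last column of each row, and that the id is passed to KILL as a quoted string; please check these against the RisingWave docs linked above. `FakeCursor` and `kill_matching_processes` are made-up names for illustration.

```python
class FakeCursor:
    """Hypothetical DB-API style cursor that replays canned SHOW PROCESSLIST rows."""

    def __init__(self, rows):
        self._rows = rows
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)

    def fetchall(self):
        return list(self._rows)


def kill_matching_processes(cursor, needle):
    """Kill every listed process whose statement text contains `needle`.

    Assumes row[0] is the process id and row[-1] is the statement text.
    """
    cursor.execute("SHOW PROCESSLIST")
    rows = cursor.fetchall()
    killed = []
    for row in rows:
        process_id, info = row[0], row[-1]
        if info and needle in info:
            # Quoting the id is an assumption about the KILL syntax.
            cursor.execute("KILL '{}'".format(process_id))
            killed.append(process_id)
    return killed


# Usage with made-up rows; the ids and statements are illustrative only.
cursor = FakeCursor([
    ("1-1", "root", "127.0.0.1:1", "dev", "5s", "CREATE MATERIALIZED VIEW my_model AS SELECT 1"),
    ("2-2", "root", "127.0.0.1:2", "dev", "1ms", "SHOW PROCESSLIST"),
])
killed = kill_matching_processes(cursor, "my_model")
```

Matching on the statement text is just one way to pick the process; an adapter would more likely track the id of the session it opened.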