techascent / tmducken

tech.ml.dataset integration with duckdb
MIT License
57 stars 5 forks source link

bug close-db twice. #21

Open jin-park-dev opened 8 months ago

jin-park-dev commented 8 months ago

I discovered strange behaviour with 'close-db'. I'm using latest version '0.8.1-13' I'm on WSL2 and VSCode with Calva.

  1. After running (duckdb/close-db), I can still do things with conn. Seems unintuitive but maybe this is intended behaviour?
  2. I run (duckdb/close-db db) again, my nREPL disconnects.

I attached example code below with tutorial from readme and doing close-db twice.

;; Standard tutorial from README

(duckdb/initialize!)

(def stocks
  (-> (ds/->dataset "https://github.com/techascent/tech.ml.dataset/raw/master/test/data/stocks.csv" {:key-fn keyword})
      (vary-meta assoc :name :stocks)))

(def db (duckdb/open-db))
(def conn (duckdb/connect db))
(duckdb/create-table! conn stocks)
(duckdb/insert-dataset! conn stocks)

(ds/head (duckdb/sql->dataset conn "select * from stocks"))

;; buggy part
(duckdb/close-db db)
(ds/head (duckdb/sql->dataset conn "select * from stocks"))  ; after closing-db this still works?

(duckdb/close-db db) ; nREPL Connection was closed. Why my nREPL getting disconnected?

Many thanks for TMD <3

jin-park-dev commented 8 months ago

Small update. Above is on duckdb v0.8.1 with bashscript example (from readme) Tried on duckdb v0.9.2 and same nREPL disconnect.

cnuernber commented 8 months ago

The REPL is disconnected because the process crashes - double free in C is often a crash-level bug leading immediately to segfaults.

The connection still works because it has a shared pointer to the database - you need to close both the database and the connection to terminate the duckdb instance under the covers.

I am not certain this points to any real issue. What solutions do you expect from this?

jin-park-dev commented 8 months ago

Connection part makes sense thanks for explaining.

Running close-db twice and getting disconnect from REPL still seem like unexpected behaviour. If reason is C code crashing causing REPL to disconnect, it seems to me there can be some kind of check not reach that state.

If you think otherwise, feel free to close.

cnuernber commented 8 months ago

I think it isn't fixable at the lowest level as dt-ffi pointers are immutable - but I think perhaps the opendb and open-connection pathways could return java.lang.AutoCloseable objects that could themselves have mutable ptr fields and would - as is common throughout java - quietly ignore or perhaps just warn on double close.

Are you interested doing this work in a PR?

jin-park-dev commented 8 months ago

I'm currently away for next 2 days. When I'm back I'll have a look at it.

cnuernber commented 8 months ago

Sounds great - here are a quick time savers for when you get back. There is a protocol used to get the actual pointer from the object passed in - this allows a generic deftype to behave like a pointer and then you can overload various things such as adding closeable and potentially overriding toString.