Closed paurkedal closed 4 years ago
A (maybe unrelated) problem may be that notice_ml
is called from is_busy
, so I presume
diff --git a/src/postgresql_stubs.c b/src/postgresql_stubs.c
index 0a3fadf..fb95740 100644
--- a/src/postgresql_stubs.c
+++ b/src/postgresql_stubs.c
@@ -1104,7 +1104,7 @@ CAMLprim value PQgetResult_stub(value v_conn)
}
noalloc_conn_info_intnat(PQconsumeInput)
-noalloc_conn_info(PQisBusy, Val_bool)
+conn_info(PQisBusy, Val_bool)
noalloc_conn_info_intnat(PQflush)
noalloc_conn_info_intnat(PQsocket)
will be needed? This could be the case for some other functions, as well. This did not fix the current issue though.
Thanks for the bug report! Yes, it seems that the stub for PQisBusy
needed fixing. It's not just about registering the runtime value, but the call to PQisBusy also needs to happen outside of the runtime lock for correctness.
If you see other functions that could potentially trigger a call to notice_ml
(PQisBusy
documents that), then please submit this here or as a PR.
Could you please test the latest development branch? If this fix works for you, I'll make a new release.
That fixes the version above, but an async version of it now segfaults instead of deadlocking:
let statements = [
"CREATE TEMPORARY TABLE notice_processor_and_async_1 \
(id serial PRIMARY KEY, name text NOT NULL)";
"CREATE TEMPORARY TABLE notice_processor_and_async_2 \
(id integer PRIMARY KEY REFERENCES notice_processor_and_async_1)";
"DROP TABLE notice_processor_and_async_1 CASCADE";
]
let wait_for_result c =
c#consume_input;
while c#is_busy do
ignore (Unix.select [Obj.magic c#socket] [] [] (-.1.0));
c#consume_input
done
let fetch_result c = wait_for_result c; c#get_result
let fetch_single_result c =
match fetch_result c with
| None -> assert false
| Some r -> assert (fetch_result c = None); r
let () = Printexc.register_printer @@ function
| Postgresql.Error err -> Some (Postgresql.string_of_error err)
| _ -> None
let () =
let conn = new Postgresql.connection ~conninfo:Sys.argv.(1) () in
conn#set_notice_processor print_endline;
statements |> List.iter begin fun stmt ->
conn#send_query stmt;
let r = fetch_single_result conn in
assert (r#status = Postgresql.Command_ok)
end
I'll have a look though the stubs tomorrow to see if I find similar issues.
Oh, that was already the version I submitted. Of course, otherwise there would have been no need for PQisBusy
. It's getting a bit late here, will have a closer look tomorrow. Your fix was a good hint.
I had a look at the ways libpq may call the notice processor, and it seems it is quite pervasive, esp. due to pqInternalNotice
. But I'm wondering, what do you mean by the comment that "a runtime feature is needed to fully support this" in notice_ml
? Would that be a shortcut to deal with it, or would it be something needed even if we spot and release the runtime system wherever notice_ml
might be called?
If I remember correctly, there was and probably still is no feature that would allow me to detect whether the currently executing thread is already holding the runtime lock or to only acquire it if it doesn't. If that feature existed, only notice_ml
would need to deal with the runtime lock. But we'd still need to make sure that all functions that could potentially call notice_ml
at least protect needed values from being reclaimed.
Right, so we need to make sure to use CAMLparam1
etc. wherever the runtime might be released.
This will make a lot of the today trivial functions, more heavy weight, but the only options I can see would be to buffer the messages (up to a limit) and pop them off at strategic points, which has its own disadvantages.
CAMLparam1
is pretty cheap, but releasing and acquiring the runtime lock is very expensive. We should certainly not be doing that on each simple call.
Another solution would be to submit a PR to the OCaml team that would allow a thread to determine whether it is holding the runtime lock. This could probably be easily achieved by replacing the thread id counter in the runtime with the result of a call to pthread_self
and providing a function to access the id of the currently executing thread. Sadly, getting PRs into production would probably take a long time. We might as well wait for multicore support in OCaml.
I agree we shouldn't do excessive locking, and I would worry about overlooking a notice invocation or that a future libpq introduced new notices. Could we instead add something like
method set_notice_processing : [< `Stderr | `Quiet] -> unit
for now? Using a variant rather than a boolean allows more options if we should need before a on-demand locking is on the table, like copying messages to a ring-buffer.
At least this should suffice for paurkedal/ocaml-caqti#33 as the main motivation is to avoid messages on stderr. And I agree, while the notices and warnings from PostgreSQL are educational when using the command line, they seem less relevant once the SQL statements are embedded in software.
I'm the author of paurkedal/ocaml-caqti#33 and would like to confirm that for my use case, the proposed solution would be a huge step in the right direction.
I've merged #38 to address this issue and will make a new release now. Sorry for the delay, a global pandemic got in my way...
I found an issue when registering a notice processor when using asynchronous request which triggers a notice. The issue can be reproduced by the following code
The following backtrace shows that it gets stuck in
caml_leave_blocking_section
in thenotice_ml
wrapper trying to acquire acaml_master_lock
.I'm not sure it it's the async code or the notice processor mechanism which is the issue, but I noticed the comment in
notice_ml
: It mentions segfault as potential issue, but if the lock is held, I guess a deadlock could also be a plausible symptom? And could the later contribution of async binding have affected the assumptions?