Closed paurkedal closed 6 years ago
Hi
I’m leaving for a trip tomorrow, but hopefully I’ll be able to tackle this the following week.
On 8 Dec 2017, at 22:54, Petter Urkedal notifications@github.com wrote:
I made the following adjustment to run the non-blocking lwt test repeatedly:
diff --git a/examples/lwt/nonblocking_lwt_example.ml b/examples/lwt/nonblocking_lwt_example.ml
index 4602598..d718b96 100644
--- a/examples/lwt/nonblocking_lwt_example.ml
+++ b/examples/lwt/nonblocking_lwt_example.ml
@@ -96,8 +96,12 @@ let main () =
   Lwt_stream.iter_s print_row s >>= fun () ->
   M.Stmt.close stmt >>= or_die "stmt close" >>= fun () ->
   M.close mariadb >>= fun () ->
-  M.library_end ();
   Lwt.return_unit
 
+let rec repeat_main n =
+  if n = 0 then Lwt.return_unit else
+  main () >>= fun () -> repeat_main (n - 1)
+
 let () =
-  Lwt_main.run @@ main ()
+  Lwt_main.run @@ repeat_main 1000;
+  M.library_end ()
With a simplified OCAML_MARIADB_QUERY='SELECT ?', I get
Fatal error: exception Failure("connect: (1300) Invalid utf8mb4 character string: '0h\x88\x83bU'")
and similar for the default query. But this only happens after a certain amount of data has been returned, which I guess could correspond to a buffer size.
I am using Ubuntu 16.04, MariaDB 2.3.2, OCaml 4.05.0, and the master branch of ocaml-mariadb.
There may be another issue, but hopefully it is related to the above: I am trying to make test_parallel_lwt.ml from the Caqti testsuite work with MariaDB. This test invokes parallel connect, so I first tried to reproduce the issue with:
let rec repeat_main n =
  if n = 0 then Lwt.return_unit else
  main () <&> repeat_main (n - 1)
This modification gives me either a segmentation fault or an abort with the additional message
unknown: debugger aborting because missing DBUG_RETURN or DBUG_VOID_RETURN macro in function "vio_read"
and a core file which gives an uninformative backtrace, or sometimes a more informative one:
Error in `./_build/examples/lwt/nonblocking_lwt_example.native': corrupted double-linked list: 0x0000562b449db6b0 ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6(+0x7908b)[0x7fc35e00908b] /lib/x86_64-linux-gnu/libc.so.6(+0x810c3)[0x7fc35e0110c3] /lib/x86_64-linux-gnu/libc.so.6(+0x8462f)[0x7fc35e01462f] /lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0x27b)[0x7fc35e0177cb] /usr/lib/x86_64-linux-gnu/libmariadb.so.2(+0x307de)[0x7fc35ecba7de] /usr/lib/x86_64-linux-gnu/libmariadb.so.2(mysql_init+0xd7)[0x7fc35ecb3387] ./_build/examples/lwt/nonblocking_lwt_example.native(+0x163651)[0x562b3e211651] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xbb21d)[0x562b3e16921d] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xc322e)[0x562b3e17122e] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xc6af7)[0x562b3e174af7] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb316c)[0x562b3e16116c] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3785)[0x562b3e161785] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] 
./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] 
./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ./_build/examples/lwt/nonblocking_lwt_example.native(+0xb3775)[0x562b3e161775] ======= Memory map: ======== 562b3e0ae000-562b3e28d000 r-xp 00000000 fd:08 32244554 /home/urkedal/proj-ext/ocaml-mariadb/_build/examples/lwt/nonblocking_lwt_example.native 562b3e48d000-562b3e48e000 r--p 001df000 fd:08 32244554 /home/urkedal/proj-ext/ocaml-mariadb/_build/examples/lwt/nonblocking_lwt_example.native 562b3e48e000-562b3e537000 rw-p 001e0000 fd:08 32244554 /home/urkedal/proj-ext/ocaml-mariadb/_build/examples/lwt/nonblocking_lwt_example.native 562b3e537000-562b3e54f000 rw-p 00000000 00:00 0 562b402fe000-562b449e4000 rw-p 00000000 00:00 0 [heap] 7fc358000000-7fc358021000 rw-p 00000000 00:00 0 7fc358021000-7fc35c000000 ---p 00000000 00:00 0 7fc35cc88000-7fc35cc9e000 r-xp 00000000 fd:05 392973 /lib/x86_64-linux-gnu/libgcc_s.so.1 7fc35cc9e000-7fc35ce9d000 ---p 00016000 fd:05 392973 /lib/x86_64-linux-gnu/libgcc_s.so.1 7fc35ce9d000-7fc35ce9e000 r--p 00015000 fd:05 392973 
/lib/x86_64-linux-gnu/libgcc_s.so.1 7fc35ce9e000-7fc35ce9f000 rw-p 00016000 fd:05 392973 /lib/x86_64-linux-gnu/libgcc_s.so.1 7fc35ce9f000-7fc35ceaa000 r-xp 00000000 fd:05 396534 /lib/x86_64-linux-gnu/libnss_files-2.24.so 7fc35ceaa000-7fc35d0a9000 ---p 0000b000 fd:05 396534 /lib/x86_64-linux-gnu/libnss_files-2.24.so 7fc35d0a9000-7fc35d0aa000 r--p 0000a000 fd:05 396534 /lib/x86_64-linux-gnu/libnss_files-2.24.so 7fc35d0aa000-7fc35d0ab000 rw-p 0000b000 fd:05 396534 /lib/x86_64-linux-gnu/libnss_files-2.24.so 7fc35d0ab000-7fc35d0b1000 rw-p 00000000 00:00 0 7fc35d103000-7fc35d6c7000 rw-p 00000000 00:00 0 7fc35d6c7000-7fc35d8e0000 r-xp 00000000 fd:05 391463 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 7fc35d8e0000-7fc35dae0000 ---p 00219000 fd:05 391463 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 7fc35dae0000-7fc35dafc000 r--p 00219000 fd:05 391463 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 7fc35dafc000-7fc35db08000 rw-p 00235000 fd:05 391463 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 7fc35db08000-7fc35db0b000 rw-p 00000000 00:00 0 7fc35db0b000-7fc35db69000 r-xp 00000000 fd:05 396126 /lib/x86_64-linux-gnu/libssl.so.1.0.0 7fc35db69000-7fc35dd69000 ---p 0005e000 fd:05 396126 /lib/x86_64-linux-gnu/libssl.so.1.0.0 7fc35dd69000-7fc35dd6d000 r--p 0005e000 fd:05 396126 /lib/x86_64-linux-gnu/libssl.so.1.0.0 7fc35dd6d000-7fc35dd74000 rw-p 00062000 fd:05 396126 /lib/x86_64-linux-gnu/libssl.so.1.0.0 7fc35dd74000-7fc35dd8f000 r-xp 00000000 fd:05 391974 /lib/x86_64-linux-gnu/libz.so.1.2.11 7fc35dd8f000-7fc35df8e000 ---p 0001b000 fd:05 391974 /lib/x86_64-linux-gnu/libz.so.1.2.11 7fc35df8e000-7fc35df8f000 r--p 0001a000 fd:05 391974 /lib/x86_64-linux-gnu/libz.so.1.2.11 7fc35df8f000-7fc35df90000 rw-p 0001b000 fd:05 391974 /lib/x86_64-linux-gnu/libz.so.1.2.11 7fc35df90000-7fc35e14e000 r-xp 00000000 fd:05 396376 /lib/x86_64-linux-gnu/libc-2.24.so 7fc35e14e000-7fc35e34d000 ---p 001be000 fd:05 396376 /lib/x86_64-linux-gnu/libc-2.24.so 7fc35e34d000-7fc35e351000 r--p 001bd000 fd:05 396376 
/lib/x86_64-linux-gnu/libc-2.24.so 7fc35e351000-7fc35e353000 rw-p 001c1000 fd:05 396376 /lib/x86_64-linux-gnu/libc-2.24.so 7fc35e353000-7fc35e357000 rw-p 00000000 00:00 0 7fc35e357000-7fc35e35a000 r-xp 00000000 fd:05 396393 /lib/x86_64-linux-gnu/libdl-2.24.so 7fc35e35a000-7fc35e559000 ---p 00003000 fd:05 396393 /lib/x86_64-linux-gnu/libdl-2.24.so 7fc35e559000-7fc35e55a000 r--p 00002000 fd:05 396393 /lib/x86_64-linux-gnu/libdl-2.24.so 7fc35e55a000-7fc35e55b000 rw-p 00003000 fd:05 396393 /lib/x86_64-linux-gnu/libdl-2.24.so 7fc35e55b000-7fc35e663000 r-xp 00000000 fd:05 396411 /lib/x86_64-linux-gnu/libm-2.24.so 7fc35e663000-7fc35e862000 ---p 00108000 fd:05 396411 /lib/x86_64-linux-gnu/libm-2.24.so 7fc35e862000-7fc35e863000 r--p 00107000 fd:05 396411 /lib/x86_64-linux-gnu/libm-2.24.so 7fc35e863000-7fc35e864000 rw-p 00108000 fd:05 396411 /lib/x86_64-linux-gnu/libm-2.24.so 7fc35e864000-7fc35e86b000 r-xp 00000000 fd:05 131522 /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4 7fc35e86b000-7fc35ea6a000 ---p 00007000 fd:05 131522 /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4 7fc35ea6a000-7fc35ea6b000 r--p 00006000 fd:05 131522 /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4 7fc35ea6b000-7fc35ea6c000 rw-p 00007000 fd:05 131522 /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4 7fc35ea6c000-7fc35ea84000 r-xp 00000000 fd:05 396923 /lib/x86_64-linux-gnu/libpthread-2.24.so 7fc35ea84000-7fc35ec84000 ---p 00018000 fd:05 396923 /lib/x86_64-linux-gnu/libpthread-2.24.so 7fc35ec84000-7fc35ec85000 r--p 00018000 fd:05 396923 /lib/x86_64-linux-gnu/libpthread-2.24.so 7fc35ec85000-7fc35ec86000 rw-p 00019000 fd:05 396923 /lib/x86_64-linux-gnu/libpthread-2.24.so 7fc35ec86000-7fc35ec8a000 rw-p 00000000 00:00 0 7fc35ec8a000-7fc35ecdd000 r-xp 00000000 fd:05 141829 /usr/lib/x86_64-linux-gnu/libmariadb.so.2 7fc35ecdd000-7fc35eedd000 ---p 00053000 fd:05 141829 /usr/lib/x86_64-linux-gnu/libmariadb.so.2 7fc35eedd000-7fc35eee3000 r--p 00053000 fd:05 141829 /usr/lib/x86_64-linux-gnu/libmariadb.so.2 7fc35eee3000-7fc35eee5000 rw-p 
00059000 fd:05 141829 /usr/lib/x86_64-linux-gnu/libmariadb.so.2 7fc35eee5000-7fc35eeec000 rw-p 00000000 00:00 0 7fc35eeec000-7fc35ef12000 r-xp 00000000 fd:05 392200 /lib/x86_64-linux-gnu/ld-2.24.so 7fc35ef33000-7fc35f0bc000 rw-p 00000000 00:00 0 7fc35f0cc000-7fc35f111000 rw-p 00000000 00:00 0 7fc35f111000-7fc35f112000 r--p 00025000 fd:05 392200 /lib/x86_64-linux-gnu/ld-2.24.so 7fc35f112000-7fc35f113000 rw-p 00026000 fd:05 392200 /lib/x86_64-linux-gnu/ld-2.24.so 7fc35f113000-7fc35f114000 rw-p 00000000 00:00 0 7ffe94221000-7ffe94243000 rw-p 00000000 00:00 0 [stack] 7ffe9428d000-7ffe9428f000 r--p 00000000 00:00 0 [vvar] 7ffe9428f000-7ffe94291000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Thanks. (I still have a few other loose ends left in Caqti before a public release.)
Hi
Are you using libmariadb-dev (Connector/C) or libmariadbclient-dev?
I'm trying to reproduce, but in my case it's the user name getting garbled (i.e. Access denied for user '\224\184\232\001'@'localhost' or similar). At least the error is also on connect. Trying to dig deeper.
It's probably one of those errors where the OCaml string is GC'd, leaving the binding with garbage.
Can't reproduce with OCAMLRUNPARAM='s=500M,h=500M,v=0x1ff', so definitely a GC problem... still trying to track it down.
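The large s= (minor heap) and h= (initial major heap) settings make collections rare, which is presumably why the failure disappears. The opposite trick, forcing collections at the points where C code may still hold pointers, can make this kind of bug easier to hit. A minimal sketch using only the standard Gc module; with_gc_pressure is a hypothetical helper, not part of ocaml-mariadb:

```ocaml
(* Hypothetical helper: wrap a function so that every call is preceded by
   a full major collection and heap compaction. Wrapping the callback
   that runs between a *_start and a *_cont call makes it likely that any
   buffer no longer referenced from OCaml is reclaimed at that point. *)
let with_gc_pressure (f : 'a -> 'b) : 'a -> 'b =
 fun x ->
  Gc.compact ();
  f x
```

In a stress test this could wrap the wait function of the Wait module, at the cost of making each iteration much slower.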
I'm using libmariadb-dev-2.3.2. With the latest version of nonblocking_{lwt,async}_stress_test I get malloc(): memory corruption: 0x0000557cba987f70. I agree it looks like a GC issue. I don't have experience with ctypes, but I recall using caml_copy_string a lot for strings passed to C code. Since both tests use arrays, I think that might be a candidate too.
I boiled down nonblocking_stress_test.ml to the following, which still fails on my side:
open Printf

let env var def = try Sys.getenv var with Not_found -> def
let host = env "OCAML_MARIADB_HOST" "localhost"
let user = env "OCAML_MARIADB_USER" "root"
let pass = env "OCAML_MARIADB_PASS" ""
let db = env "OCAML_MARIADB_DB" "mysql"

module Make (W : Mariadb.Nonblocking.Wait) = struct
  module M = Mariadb.Nonblocking.Make (W)
  open W.IO

  let or_die where = function
    | Ok r -> return r
    | Error (i, e) -> eprintf "%s: (%d) %s\n%!" where i e; exit 2

  let connect () = M.connect ~host ~user ~pass ~db ()

  let rec repeat n f =
    if n = 0 then return () else f () >>= fun () -> repeat (n - 1) f

  let test () =
    connect () >>= or_die "connect" >>= fun dbh ->
    M.close dbh

  let main () = repeat 50000 test
end
In particular, the loop only includes connect and close, and the direct arguments to connect should be protected from GC.
Yeah, you can even replace M.close with W.IO.return () and the problem still happens (you may need to increase max_connections in MariaDB). So the problem must be in mysql_real_connect_{start,cont}.
I don't think the issue is with string handling, because the library copies OCaml strings into C buffers (char_ptr_buffer_of_string), but I also tried replacing ptr char arguments with string ones, in which case the ctypes library will do the copy. Also, looking at the libmariadb code, there's a strdup for the user parameter, so I don't think GC is affecting the string parameters themselves.
It could be some GC issue with the db handle itself, or maybe data somehow being overwritten, but I can't see how.
Really frustrating, no progress at all today.
I've had some success using the lower-level functions directly from the test code. It's really late now, but I hope to be able to come up with an ergonomic version tomorrow.
So I managed to run the simplified stress test with a patch that simply inlines the nonblocking function in nonblocking.ml. The core of the change is this:
+  let rec connect_cont mariadb status =
+    match B.mysql_real_connect_cont mariadb status with
+    | 0, _ -> return (Ok mariadb)
+    | s, _ -> W.wait mariadb s >>= fun s -> connect_cont mariadb s
+
+  let handle_connect mariadb = function
+    | 0, Some _ -> return (Ok mariadb)
+    | 0, None -> return (Error (Common.error mariadb))
+    | s, _ -> W.wait mariadb s >>= fun s -> connect_cont mariadb s
+
   let connect ?host ?user ?pass ?db ?(port=0) ?socket ?(flags=[]) () =
     match init () with
     | Some m ->
-        nonblocking m (connect m ?host ?user ?pass ?db ~port ?socket ~flags ())
+        let flags = Common.int_of_flags flags in
+        handle_connect m
+          (B.mysql_real_connect_start m host user pass db port socket flags)
Honestly, I don't understand why there's any behavior difference between the two versions. It could be that an underlying bug is now being hidden or something.
With this change, the Lwt stress test from your PR is now failing with prepare: (2006) MySQL server has gone away. It could be that similar fixes for other nonblocking functions will fix it. I'll try that now.
Actually, scratch that... the error is simply being ignored in connect_cont.
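To make the "error is being ignored" point concrete: in the patch, connect_cont returns Ok as soon as the status is 0, without looking at the second component of the result. A toy model of a polling loop that does check it is sketched below. Everything here is a stand-in: cont plays the role of a binding such as mysql_real_connect_cont, wait the role of W.wait, and the loop is synchronous rather than running in W.IO.

```ocaml
(* A step is (status, result): status 0 means the operation is done, and
   result is None when the underlying C call reported failure. *)
type 'h step = int * 'h option

let rec drive ~(wait : int -> int) ~(cont : int -> 'h step)
    ~(error : unit -> int * string) (step : 'h step) =
  match step with
  | 0, Some h -> Ok h              (* finished successfully *)
  | 0, None -> Error (error ())    (* finished, but the call failed *)
  | s, _ -> drive ~wait ~cont ~error (cont (wait s))
```

With this shape a failed connect surfaces as an Error at the connect step instead of showing up later as a confusing "server has gone away" on prepare.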
Decided to try the Ctypes list, as I haven't made any real progress today either.
Given the "MySQL server has gone away", it could be this is needed:
diff --git a/examples/nonblocking/nonblocking_stress_test.ml b/examples/nonblocking/nonblocking_stress_test.ml
index 1a4d59a..e6b3583 100644
--- a/examples/nonblocking/nonblocking_stress_test.ml
+++ b/examples/nonblocking/nonblocking_stress_test.ml
@@ -135,6 +135,11 @@ module Make (W : Mariadb.Nonblocking.Wait) = struct
         M.Stmt.reset stmt >>= or_die "reset" >|= fun () ->
         Hashtbl.replace stmt_cache param_types stmt
       end >>= fun () ->
+    Hashtbl.fold
+      (fun _ stmt prologue ->
+        prologue >>= fun () ->
+        M.Stmt.close stmt >>= or_die "close")
+      stmt_cache (return ()) >>= fun () ->
     M.close dbh
 
   let main () = repeat 500 test
Though it doesn't apply to the pure connect/close test.
Edit: And "gone away" was probably due to connect_cont having failed anyway, though I sent a PR to fix this, including in the blocking test.
Just to be sure, I tried to run a nonblocking re-connect root in C, at least no issue there.
With Jeremy Yallop's suggestion I managed to run the stress tests without errors.
It's the same fix as a previous bug that (I think) you reported, a GC issue, but in this case the fact that libmariadb does a strdup on the user parameter threw me off; I thought keeping references wouldn't really be necessary. I didn't check the other parameters though, so maybe the issue is not the user string specifically.
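For readers who haven't seen this class of bug, here is a toy model of the "keep references" idea. It assumes that a buffer handed to C is owned by an OCaml value and may be freed once that value becomes unreachable, so the handle has to hold on to it for as long as the C side might read it. Bytes.t stands in for a ctypes buffer, and the names handle, connect_start and connect_finished are made up for the example; this is not the actual ocaml-mariadb code.

```ocaml
type handle = {
  mutable keep_alive : Bytes.t list;  (* buffers the C side may still read *)
}

let create () = { keep_alive = [] }

(* Copy an OCaml string into a fresh NUL-terminated buffer. *)
let buffer_of_string s =
  let b = Bytes.make (String.length s + 1) '\000' in
  Bytes.blit_string s 0 b 0 (String.length s);
  b

(* Stand-in for a *_start call: the buffers are stored in the handle so
   they stay reachable until the operation completes, not just until
   this function returns. *)
let connect_start h ~host ~user =
  let bufs = List.map buffer_of_string [host; user] in
  h.keep_alive <- bufs @ h.keep_alive;
  bufs

(* Stand-in for the point where the last *_cont call has returned 0. *)
let connect_finished h = h.keep_alive <- []
```

The design choice is simply that the lifetime of the buffers is tied to the handle rather than to the stack frame of the start function.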
I added a join function to the Wait module locally to run the tests in parallel. I don't get a segfault with the current master branch, but a parameter count mismatch error. With some printfs added, I'm getting output like this:
>>> SELECT CAST(? AS datetime), CAST(? AS char), CAST(? AS char), CAST(? AS char), CAST(? AS char), CAST(? AS char), CAST(? AS char)
>>> SELECT CAST(? AS char), CAST(? AS char), CAST(? AS double), CAST(? AS char), CAST(? AS datetime), CAST(? AS char), CAST(? AS char), CAST(? AS integer), CAST(? AS integer), CAST(? AS char)
>>> SELECT CAST(? AS datetime)
>>> SELECT CAST(? AS char)
>>> SELECT CAST(? AS char)
Stmt.execute: (0) parameter count mismatch: 1 (expected 7)
So somehow mysql_stmt_param_count is returning the count of a different query. Looking into it.
I've just noticed that the stress test uses the same db handle in all repeat calls. When I changed the code to use <&>, I believe this breaks the assumptions for threaded MySQL clients, i.e. a lock would be needed to avoid simultaneous queries.
Once I moved the connect and close into the body of the function taken by repeat, the stress test worked, although probably it wasn't such a smart idea to spawn that many threads... my load average reached almost 600 and after a while the test died with a Unix.Unix_error(Unix.EINVAL, "select", ""), which I guess comes from the insides of Lwt, probably due to lack of resources.
I also noticed your use of Stmt.reset. I was reading the mysql_stmt_reset docs, which state that the purpose of this function is to be able to reuse the statement by passing it to mysql_stmt_prepare. This, however, is not possible in OCaml-MariaDB, because the prepare function always allocates a new statement:
let prepare mariadb query =
  match Common.stmt_init mariadb with
  | Some raw -> `Ok (prepare_start mariadb raw query, prepare_cont mariadb raw)
  | None -> `Error (Common.error mariadb)
I see two ways to fix this: either by allowing an optional stmt parameter on prepare, or by making the stmt_init function public and always requiring the stmt to be passed to prepare.
What do you think would be the best way for use in Caqti? Do you know how other database APIs handle this?
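To compare the two shapes side by side, here is a rough sketch with a toy stmt type. The record, the counter and prepare_with are invented for illustration; only the names prepare and stmt_init come from the discussion, and the real functions take a connection handle and return start/cont pairs.

```ocaml
type stmt = { id : int; mutable query : string option }

let next_id = ref 0

(* Stand-in for Common.stmt_init: allocates a fresh statement. *)
let stmt_init () =
  incr next_id;
  { id = !next_id; query = None }

(* Option 1: an optional ?stmt lets the caller hand back an existing
   statement; without it a new one is allocated, as happens today. *)
let prepare ?stmt query =
  let s = match stmt with Some s -> s | None -> stmt_init () in
  s.query <- Some query;
  s

(* Option 2: stmt_init is public and the caller always passes the stmt. *)
let prepare_with s query =
  s.query <- Some query;
  s
```

Option 1 keeps existing callers working unchanged, while option 2 makes the allocation explicit at every call site.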
Good job finding the issue! I had a look myself earlier this week, but I ended up learning more about ctypes than the problem itself.
I've just noticed that the stress test uses the same db handle in all repeat calls. When I changed the code to use <&>, I believe this breaks the assumptions for threaded MySQL clients, i.e. a lock would be needed to avoid simultaneous queries.
Yes, my idea was to parallelise only for the outer loop. Might be a good idea still to reduce it from 500 to something lower.
My understanding of the description of mysql_stmt_reset is that it clears any state that was attached by execution of the statement, so that the statement can be re-used with different parameters without calling prepare again. So, I think it's okay that we cannot pass an existing statement to prepare. What we cannot do with the current implementation, then, is to prepare an old statement with a new query, but I don't think that's useful in a language with automatic memory management, anyway.
The way Caqti uses reset is similar to the nonblocking example: It caches statements per handle and per query, so prepare will only be called the first time a given query string is used for a given connection. This is of course not so useful if the query string is generated, so there is a ~oneshot option to disable caching as well.
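A minimal sketch of that caching scheme, assuming a toy stmt type and a prepare that only counts its calls. The names conn, connect and prepared are hypothetical and this is not Caqti's implementation; the optional ~oneshot argument mirrors the option mentioned above.

```ocaml
type stmt = { query : string }

(* Counts how often a statement is actually prepared. *)
let prepare_calls = ref 0

let prepare query =
  incr prepare_calls;
  { query }

(* One cache per connection, keyed by query string. *)
type conn = { cache : (string, stmt) Hashtbl.t }

let connect () = { cache = Hashtbl.create 16 }

let prepared ?(oneshot = false) conn query =
  if oneshot then prepare query (* generated query: bypass the cache *)
  else
    match Hashtbl.find_opt conn.cache query with
    | Some stmt -> stmt (* real code would reset the statement before reuse *)
    | None ->
      let stmt = prepare query in
      Hashtbl.add conn.cache query stmt;
      stmt
```

Because the cache lives in the connection record, cached statements never cross connections, which matters for a library where a statement belongs to the handle it was prepared on.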
Ok, I think we're ready for another release then :)
Yes I think so. I tried using a --dev-repo pinned version against the Caqti test suite, and the memory-related failure is gone. It still fails, ~but I think that is because I'm using a temporary table in parallel from different connections, so probably connections don't have a private namespace for temporary tables in MariaDB~. I'll fix that later today.
Mmm, I just triggered *** Error in `./test_parallel_lwt.exe': corrupted double-linked list: 0x000055e5752c0490 *** again with mariadb pinned to the current master.
Is jbuilder runtest on the caqti repo enough to trigger it?
Yes, but you have to add tests/uris.conf, something like:
mariadb://<user>:<password>@localhost/<database>
The file is just a list of URIs, and a file with just sqlite3::memory: will be created if the file does not exist.
I was looking through binding_wrappers.ml, and I think the following functions need a similar treatment: mysql_real_query_start, mysql_set_character_set_start, mysql_select_db_start, mysql_change_user_start, mysql_stmt_prepare_start. The latter could explain the failure in test_parallel_lwt.ml.
It ran fine here:
bikereg alias tests/runtest
BIKE-0003 is owned by BIKE-0003.
BIKE-0042 is not registered.
Stolen: BIKE-0000 2017-12-20T19:57:42-00:00 Arthur Dent
BIKE-0004 2017-12-20T19:57:42-00:00 Marvin
ocamlopt tests/test_sql_lwt_v1.exe
test_param alias tests/runtest
test_parallel_lwt alias tests/runtest
test_pool_lwt alias tests/runtest
bikereg_v1 alias tests/runtest
BIKE-0003 is owned by Trillian.
BIKE-0042 is not registered.
Stolen: BIKE-0000 2017-12-20 19:57:42 Arthur Dent
BIKE-0001 2017-12-20 19:57:42 Ford Perfect
BIKE-0002 2017-12-20 19:57:42 Zaphod Beeblebrox
BIKE-0003 2017-12-20 19:57:42 Trillian
BIKE-0004 2017-12-20 19:57:42 Marvin
test_sql_lwt alias tests/runtest
test_sql_lwt_v1 alias tests/runtest
ocamlopt tests/test_sql_async.exe
test_sql_async alias tests/runtest
ocamlopt tests/test_sql_async_v1.exe
test_sql_async_v1 alias tests/runtest
test_parallel_lwt_v1 alias tests/runtest
However, you're probably right: The *_start functions taking strings must hold a reference to them so that the *_cont ones can work reliably.
I'm still getting an error. Did you add mariadb to uris.conf? Can you check with find . -name uris.conf that you only have one in the source and one copy in the build area? My build setup does not require it, so if the file name was misspelled it will just be created with sqlite3, and there is no hint in the output either.
$ find . -name uris.conf
./tests/uris.conf
./_build/default/tests/uris.conf
$ diff ./_build/default/tests/uris.conf ./tests/uris.conf
$
$ cat ./tests/uris.conf
mariadb://root:pass@127.0.0.1/caqti
Okay, thanks, I just wanted to be sure.
I've just pushed a commit with a similar fix for the prepare functions. Can you see if it fixes the error you're seeing?
I still get the same issue.
But I got a backtrace which is more useful than the above:
/lib/x86_64-linux-gnu/libc.so.6(+0x7908b)[0x7fab58ca408b]
/lib/x86_64-linux-gnu/libc.so.6(+0x810c3)[0x7fab58cac0c3]
/lib/x86_64-linux-gnu/libc.so.6(+0x8462f)[0x7fab58caf62f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0x27b)[0x7fab58cb27cb]
/usr/lib/x86_64-linux-gnu/libmariadb.so.2(+0x307de)[0x7fab56e677de]
/usr/lib/x86_64-linux-gnu/libmariadb.so.2(mysql_init+0xd7)[0x7fab56e60387]
/home/urkedal/.opam/4.06.0/lib/mariadb/mariadb.cmxs(mariadb_stub_3_mysql_init+0x11)[0x7fab570ce1e1]
/home/urkedal/.opam/4.06.0/lib/mariadb/mariadb.cmxs(camlFfi_generated__fun_3323+0x2d)[0x7fab570c9f6d]
/home/urkedal/.opam/4.06.0/lib/mariadb/mariadb.cmxs(camlNonblocking__init_2799+0x1e)[0x7fab570bac7e]
/home/urkedal/.opam/4.06.0/lib/mariadb/mariadb.cmxs(camlNonblocking__connect_inner_6503+0x37)[0x7fab570bdab7]
./../lib-driver/caqti_driver_mariadb.cmxs(camlCaqti_driver_mariadb__connect_prim_9155+0xdb)[0x7fab560d504b]
./test_parallel_lwt.exe(camlCaqti_connect__connect_2021+0x63)[0x55ebc076fab3]
./test_parallel_lwt.exe(camlCaqti_pool__acquire_1393+0x92)[0x55ebc076e142]
./test_parallel_lwt.exe(camlCaqti_pool__use_inner_1679+0x68)[0x55ebc076e598]
...
Interestingly, in mysql_init.
I downgraded to libmariadb 2.3.2 and also tried libmariadbclient, but I can't reproduce it. Are you on Ubuntu 16.04?
I tried it on a different computer. There I can't reproduce the memory corruption. And I'm seeing another issue with too many connections being opened, which must be in caqti, so I'll debug that first.
I have been testing on Ubuntu 17.04. The other one where I couldn't reproduce the main issue is 16.04. So, that may look similar to your case, since the tests sometimes succeed in which case jbuilder will cache the result. The failure on 16.04 is probably due to caqti, anyway.
I can try a 17.04 vm tomorrow and see if I can reproduce it.
I managed to get the error on 17.04:
test_parallel_lwt alias tests/runtest (got signal ABRT)
(cd _build/default/tests && ./test_parallel_lwt.exe)
unknown: debugger aborting because missing DBUG_RETURN or DBUG_VOID_RETURN macro in function "vio_read"
However, the error goes away if I upgrade libmariadb-dev from MariaDB's repository. The sources.list line is
deb http://ftp.osuosl.org/pub/mariadb/repo/10.2/ubuntu zesty main
I also don't get an error when installing the latest Connector/C manually (version 3.0.2).
Can you try to upgrade the connector?
Yes, upgrading to libmariadb3 solved the problem. So, presumably there was a bug in the C library.
Thanks for taking the time to test on 17.04. All my tests pass now after also fixing two issues in Caqti.
I’ve also noticed that on 17.04 libmariadbclient is not installed as libmysqlclient.so anymore... I’ll try to update the library detection to handle all three cases.
Version 1.0.0 was submitted to opam!