spk121 opened this issue 3 years ago
But replacing GLib's `run-in-thread` with Guile's `call-with-new-thread` seems okay so far. Maybe the fix is simply not to do that.
Having used Guile from GTK without Guile-GI/G-Golf before, I do faintly remember that you have to call `scm_init_guile` in that thread. In our case, we might want to do that when entering callbacks and closures. Would that solve the issue, or are things also buried more deeply in `gig_argument` etc.?
That's probably enough. It appears that the FFI call from `g_task_thread_pool_thread` goes to `callback_binding()` before trying to unpack any arguments, and calling `scm_init_guile` in `callback_binding()` fixes it.
That in turn reveals a further problem in this test. Creating a `<GTask>` doesn't require a source object, but the source object of a `<GTask>` ends up being passed to the `GTaskThreadFunc` callback, where the source object is not marked nullable. But that's a separate issue.
OK, more info.
OK, at some point I'd converted some of the type errors under `gig_argument_c_to_scm` into Guile errors (`scm_wrong_type` etc.).
After adding `scm_init_guile` to `callback_binding`, we're always in a Guile-managed thread. But if that thread was started by `g_task_run_in_thread`, it is not inside a top-level `scm_c_catch` or `scm_c_with_exception_handler`, as far as I can tell, so Guile errors are not caught and end the program.
One brute-force solution would be to enclose the guts of `callback_binding` in a continuation barrier.
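To make the barrier idea concrete without depending on libguile, here is a rough analogy in plain C, with `setjmp`/`longjmp` standing in for Scheme non-local exits. All names (`with_barrier`, `raise_error`, `ok_body`, `failing_body`) are hypothetical. The barrier lets a normal return pass through and turns a non-local exit into a NULL result, roughly as the Guile manual describes `scm_c_with_continuation_barrier`; with no barrier installed, `raise_error` aborts, mimicking an uncaught error ending the program.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: innermost active barrier in this (single-threaded) sketch. */
static jmp_buf *current_barrier = NULL;

/* Hypothetical stand-in for a Guile error: jump to the innermost barrier
   instead of unwinding through the foreign C frames above it. */
static void raise_error (void)
{
  if (current_barrier == NULL)
    abort ();                     /* uncaught: the program ends */
  longjmp (*current_barrier, 1);
}

/* Run BODY (DATA) behind a barrier.  A normal return passes through;
   a non-local exit is stopped here and reported as NULL. */
static void *with_barrier (void *(*body) (void *), void *data)
{
  jmp_buf env;
  jmp_buf *saved = current_barrier;
  void *result;

  current_barrier = &env;
  if (setjmp (env) == 0)
    result = body (data);         /* normal return */
  else
    result = NULL;                /* non-local exit stopped at the barrier */
  current_barrier = saved;
  return result;
}

/* Demo bodies: one returns normally, one "throws". */
static void *ok_body (void *data) { return data; }

static void *failing_body (void *data)
{
  raise_error ();
  return data;                    /* not reached */
}
```

A real version would need one barrier stack per thread; the point here is only that errors stop at the barrier rather than escaping into GLib's frames.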
Perhaps we should use `scm_with_guile` instead of `scm_init_guile`. Not only does it neatly clean up the Guile mode afterwards, it also wraps the function in `scm_with_continuation_barrier`. FWIW, my initial implementation of callbacks also had an exception handler that printed a stack trace.
I like that idea.
I think what I'd really want in `callback_binding` and `c_callback_binding` is to install a continuation barrier only if we know that one doesn't already exist.
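```c
if (scm_thread_is_known_to_guile ())
  /* Already in Guile mode: errors can propagate to handlers
     installed before the FFI call. */
  callback_binding_inner (&args);
else {
  /* Foreign thread: enter Guile mode behind a continuation barrier;
     a NULL result means the callback exited non-locally. */
  if (NULL == scm_with_guile (callback_binding_inner, &args))
    abort ();
}
```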
Guile-managed threads can throw and be caught by a handler installed before the FFI call, but threads that didn't originate in Guile will install a local continuation barrier and abort on error. At the moment, I don't know how to implement `scm_thread_is_known_to_guile` using the Guile API. If you have access to Guile internals, it is just:
```c
int
scm_thread_is_known_to_guile (void)
{
  if (SCM_I_CURRENT_THREAD)
    return 1;
  return 0;
}
```
Actually, you'd have to check `SCM_I_CURRENT_THREAD->guile_mode`.
Perhaps we could instead keep a set of the thread IDs that are currently in Guile mode?
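A minimal sketch of that registry, assuming we only need to know about threads that enter Guile mode through our own callback wrappers (e.g. just inside the function handed to `scm_with_guile`). The names `guile_threads_enter`, `guile_threads_leave` and `guile_threads_contains_self` are hypothetical, and the fixed capacity just keeps the sketch short.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical registry of threads currently in Guile mode via our hooks. */
#define MAX_GUILE_THREADS 64

static pthread_t guile_threads[MAX_GUILE_THREADS];
static size_t n_guile_threads = 0;
static pthread_mutex_t guile_threads_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record the calling thread; returns false if the registry is full.
   Nested entries add one record each, so nesting is supported. */
static bool guile_threads_enter (void)
{
  bool ok = false;
  pthread_mutex_lock (&guile_threads_lock);
  if (n_guile_threads < MAX_GUILE_THREADS)
    {
      guile_threads[n_guile_threads++] = pthread_self ();
      ok = true;
    }
  pthread_mutex_unlock (&guile_threads_lock);
  return ok;
}

/* Remove one record for the calling thread (swap with the last entry). */
static void guile_threads_leave (void)
{
  pthread_t self = pthread_self ();
  pthread_mutex_lock (&guile_threads_lock);
  for (size_t i = 0; i < n_guile_threads; i++)
    if (pthread_equal (guile_threads[i], self))
      {
        guile_threads[i] = guile_threads[--n_guile_threads];
        break;
      }
  pthread_mutex_unlock (&guile_threads_lock);
}

/* Hypothetical stand-in for scm_thread_is_known_to_guile. */
static bool guile_threads_contains_self (void)
{
  bool found = false;
  pthread_t self = pthread_self ();
  pthread_mutex_lock (&guile_threads_lock);
  for (size_t i = 0; i < n_guile_threads && !found; i++)
    found = pthread_equal (guile_threads[i], self);
  pthread_mutex_unlock (&guile_threads_lock);
  return found;
}
```

The caveat is that threads entering Guile mode by other routes (Scheme threads, other `scm_init_guile` callers) are invisible to it; a `_Thread_local` counter would avoid the lock but has the same blind spot.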
OK. I changed the logic to catch callback errors and then quit immediately if we're not in the main thread, or rethrow if we are in the main thread, which is a decent compromise. This saves me from trying to figure out whether a thread is Guile-ified and whether I need to rethrow a caught error.
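The decision itself fits in a few lines. This sketch assumes the main thread's ID is recorded once at startup; `callback_error_policy`, `record_main_thread`, the enum and the `policy_in_new_thread` demo helper are hypothetical names, not Guile-GI's code.

```c
#include <pthread.h>

/* Hypothetical: what to do with an error caught inside a callback. */
typedef enum
{
  CALLBACK_ERROR_RETHROW,  /* main thread: an outer Scheme handler may exist */
  CALLBACK_ERROR_QUIT      /* other thread: nothing above can catch it */
} callback_error_action;

static pthread_t main_thread;

/* Call once from main () before any callbacks can run. */
static void record_main_thread (void)
{
  main_thread = pthread_self ();
}

static callback_error_action callback_error_policy (void)
{
  return pthread_equal (pthread_self (), main_thread)
           ? CALLBACK_ERROR_RETHROW
           : CALLBACK_ERROR_QUIT;
}

/* Demo helper: evaluate the policy from a freshly spawned thread. */
static void *policy_in_new_thread (void *out)
{
  *(callback_error_action *) out = callback_error_policy ();
  return NULL;
}
```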
I don't particularly like the `exit` via `eval` there. Would there be a better way to at least print the stack trace first and then call `scm_exit` or `exit` ourselves as a procedure?
Also, I feel that the callback thread fluid should be `#t` once the callback has been spawned, so that inner callbacks can still throw errors up the stack, as it were; only the topmost one can't. Pardon me if that's already done and I'm just misreading the source code there.
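One way to picture that rule, for a callback running in a thread other than the main one, is a per-thread nesting depth in place of the fluid: only depth 1 has no handler above it. This is a sketch with hypothetical names (`callback_depth`, `run_callback`, `callback_must_catch`); a real version would also have to restore the depth on a non-local exit, for example through Guile's dynwind mechanism.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread nesting depth; plays the role of the fluid. */
static _Thread_local int callback_depth = 0;

/* Only the outermost callback must catch; nested ones may throw upwards,
   where the outermost callback's handler will see the error. */
static bool callback_must_catch (void)
{
  return callback_depth == 1;
}

typedef void (*callback_fn) (void *data);

/* Track the depth around each callback invocation. */
static void run_callback (callback_fn body, void *data)
{
  callback_depth++;
  body (data);
  callback_depth--;
}

/* Demo: an outer callback invoking an inner one, recording each decision. */
static bool decisions[2];

static void inner_cb (void *data)
{
  (void) data;
  decisions[1] = callback_must_catch ();
}

static void outer_cb (void *data)
{
  decisions[0] = callback_must_catch ();
  run_callback (inner_cb, data);
}
```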
For the latter point, that's a great idea, and I hadn't thought of that.
For the former, `scm_with_guile` should dump the stack trace for the callback, because it installs an exception handler. It won't print the main thread's stack trace, though. Is that what you're asking?
Also, there is `scm_primitive_exit`, but the clean `exit` is only available as a Scheme API.
Hmm, as long as the relevant part of the code is traced accurately, that's fine. IIUC, in the main thread the error is thrown upwards anyway, so intermediate traces don't need printing. I still feel we shouldn't have a raw `scm_eval` in there, though; perhaps look the procedure up with `module-ref` and call it?
Interestingly, while the current tree apparently checks out fine in both Azure and Travis, the `test/task.scm` test in d719e466b0b2e87f3bb03e20ab8a1877c8176a40 always segfaults on a couple of my boxes. As near as I can tell, the segfault is due to a foreign function being called after `atexit` has kicked off the freeing of the FFI-defined functions and callbacks in `gig_function.c` and `gig_callback.c`. So this is really more closely related to #60.
It runs fine in Travis/Azure because the test is skipped there. Perhaps you want `primitive-_exit` instead of a hard `exit`? The downside is that we'd have to unwind the stack on our own.
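The difference matters for the `atexit` crash above. Per the Guile manual, `primitive-exit` uses C `exit()`, which runs `atexit` handlers, while `primitive-_exit` uses `_exit()`, which skips both C-level and Scheme-level cleanup. This POSIX sketch checks that in a forked child; `cleanup_handler` is a hypothetical stand-in for the `atexit`-time freeing of FFI closures, and `atexit_handler_runs` is likewise a made-up helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe the child's atexit handler reports through. */
static int report_fd = -1;

/* Stand-in for atexit-time cleanup: just signal that it ran. */
static void cleanup_handler (void)
{
  char byte = 'x';
  ssize_t ignored = write (report_fd, &byte, 1);
  (void) ignored;
}

/* Fork a child that registers cleanup_handler, then terminates with exit()
   or _exit().  Returns 1 if the handler ran, 0 if not, -1 on error. */
static int atexit_handler_runs (bool use_underscore_exit)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;
  fflush (NULL);          /* so the child can't re-flush our buffers */
  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (pid == 0)
    {
      close (fds[0]);
      report_fd = fds[1];
      atexit (cleanup_handler);
      if (use_underscore_exit)
        _exit (0);        /* like primitive-_exit: no atexit handlers */
      exit (0);           /* like primitive-exit: runs atexit handlers */
    }
  close (fds[1]);
  char byte;
  ssize_t n = read (fds[0], &byte, 1);  /* 0 bytes means EOF: no report */
  close (fds[0]);
  waitpid (pid, NULL, 0);
  return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

So `primitive-_exit` would sidestep the freeing that races with the worker thread, at the cost the comment above mentions: nothing gets unwound or flushed for us.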
In writing a test case for the `<GTask>`, I've caused a `SIGSEGV` in `SCM_STACK_CHECK` when calling `class-slot-ref` on the `<GTask>`. The segfault occurs in code being run in the thread spawned by `g_run_in_thread`. Root cause TBD, but I expect that we need a call to `scm_init_guile()` in any GLib-managed thread. A reproducing case follows.