trealla-prolog / trealla

A compact, efficient Prolog interpreter written in plain-old C.
MIT License

pl_yield_at can cause some undesirable side effects #569

Closed: guregu closed this issue 2 months ago

guregu commented 2 months ago

This one is my fault (the yield API stuff I added in #136), but I was wondering if there's a good way to fix it.

Problem: if the yield from yield_at happens during a call to, say, write/1, the side effects will happen twice (i.e. two writes will take place). I was seeing some odd JSON corruption doubling like {"foo"::123} (double colon) or {"bar":99} for a value that should be 9, but it was super rare/sporadic. After much stress testing I narrowed it down to the yield stuff I added for trealla-js:

( ... snip ... )
[0:wasm:133667:f1909:fp1910:cp7:sp16710:hp763:tp3536] CALL write(stdout,:)
YIELDING!! <--- debug thingy I shoved in to check
[0:wasm:133668:f1909:fp1910:cp7:sp16710:hp763:tp3536] REDO write(stdout,:)
[0:wasm:133669:f1909:fp1910:cp7:sp16710:hp763:tp3536] EXIT write(stdout,:)

Is there a good place to move the yield_at check to so that it runs before the actual predicate call? Or maybe some way to inject a harmless choice point for it to continue execution from so it doesn't repeat itself?

infradig commented 2 months ago

A write should keep track of its progress before yielding and resume from there.

guregu commented 2 months ago

Right: currently yield/0 has a branch that checks whether it's retrying and, if so, skips yielding again, so that path is fine. The yield_at logic I added is missing that check. It naively hijacks the run loop and creates a choice point for whatever predicate happens to be the current one, even one with side effects like write/1. I need to figure out how to skip it on resume; maybe I can stash a flag somewhere.

guregu commented 2 months ago

Perhaps succeed_on_retry is what I'm looking for. I'll try some stuff

guregu commented 2 months ago

Wow, the solution in 44bab305ffdc2026e4757077764b071ca14b6697 is way simpler than what I was trying to do. Thanks, haha. I'll test it out!

guregu commented 2 months ago

Your changes definitely fix my particular issue: I ran a stress test for about an hour and it looks good. Thanks 😎 My only worry is: what if it yields on some other non-write predicate that has sensitive side effects, for example open/4 or close/1?

I tried adding a tweaked do_yield for the yield_at situation like so:

bool do_yield_then(query *q, bool status)
{
#ifdef __wasi__
    if (!q->is_task && !q->pl->is_query)
#else
    if (!q->is_task)
#endif
        return true;

    // Suspend the query and schedule an immediate resume.
    q->yield_at = 0;
    q->yielded = true;
    q->tmo_msecs = get_time_in_usec() / 1000;
    q->tmo_msecs += 1;
    check_heap_error(push_choice(q));
    choice *ch = GET_CURR_CHOICE();
    // Record the interrupted call's status so the retry replays it
    // instead of re-executing the call (and its side effects).
    if (status)
        ch->succeed_on_retry = true;
    else
        ch->fail_on_retry = true;
    return false;
}
diff --git a/src/query.c b/src/query.c
index 8b53ddca..87a81a3e 100644
--- a/src/query.c
+++ b/src/query.c
@@ -1646,7 +1646,7 @@ bool start(query *q)
                                        uint64_t now = get_time_in_usec() / 1000;

                                        if (now > q->yield_at)  {
-                                               do_yield(q, 0);
+                                               do_yield_then(q, status);
                                                break;
                                        }
                                }

And it seems like it does the trick... so far.

infradig commented 2 months ago

Does that mean the checks I added aren't needed anymore?

guregu commented 2 months ago

I think they aren't needed anymore, yeah. Thank you, though. I'm currently running a stress test on it and it's been good for a couple of hours. I can send a PR later with the changes, although AFAIK trealla-js is the only thing that cares about this API (the JS runtime is weird; trealla-go, for example, just runs the interpreter in its own coroutine and everything is easy).

guregu commented 2 months ago

It should be fine to remove them for now, of course. I'd like to send a PR sometime soonish that updates various wasm-related things, with the goal of not needing a fork anymore; it's approaching stability and I'm doing some heavy-duty battle testing at the moment.

infradig commented 2 months ago

Please do.

infradig commented 2 months ago

There's still a problem with partial async writes to sockets that has yet to be dealt with. Reads use buffered I/O and are fine, but writes do not.

That's a separate issue from this one, though.