Open michaelpj opened 4 weeks ago
Hmm, this is very weird. undefinedData, called at src/Effectful/Internal/Env.hs:413:26
is in deleteEffect
. So that means that you're trying to access an effect that went out of scope. However, there are additional version checks (in getLocation
) that should fail before.
Can you try with https://github.com/haskell-effectful/effectful/tree/hascallstack-runners? This should at least give you a location which deleted effect it tries to use.
Great, that's helpful. I'll try and do it but it might be a bit since it only happens in production (:see_no_evil: ).
In case it helps, here are a few possibly relevant facts:
So conceivably something is going wrong with cloning?
Anyway, I'll try and run with your branch and report back.
since it only happens in production (:see_no_evil: ).
Ugh, my condolences.
Ok, if this is a concurrent code and is hard to reproduce, that explains it. It looks like you have two threads using Env
that hasn't been cloned and thus uses the same underlying Storage
:confused:
import Control.Concurrent
import Control.Concurrent.Async
import Effectful
import Effectful.Reader.Static
import Effectful.Dispatch.Static.Unsafe
works :: IO ()
works = runEff $ do
x <- runReader () $ do
withEffToIO (ConcUnlift Ephemeral $ Limited 1) $ \unlift -> do
x <- liftIO . async $ do
threadDelay 1000000
unlift ask
threadDelay 500000
pure x
liftIO $ wait x
fails :: IO ()
fails = runEff $ do
x <- runReader () $ do
reallyUnsafeUnliftIO $ \unlift -> do
x <- liftIO . async $ do
threadDelay 1000000
unlift ask
threadDelay 500000
pure x
liftIO $ wait x
>>> works
>>> fails
*** Exception: version (2) /= storageVersion (0)
CallStack (from HasCallStack):
error, called at src/Effectful/Internal/Env.hs:360:5 in effectful-core-2.4.0.0-inplace:Effectful.Internal.Env
fails
tries to use the Reader
after it was deleted by the parent thread and it fails, but it still fails on the version check (because it's reset to 0 when an effect is deleted). What I think is happening in your case is there is a race between a thread accessing the effect and the one deleting it:
effectful
does version check, it succeeds.Storage
and it's no longer there since it was deleted by T2.Very unfortunate. It's likely you use a static effect that wraps an IO-based concurrent API and one of the wrapper functions has a bug, i.e. doesn't properly clone Env
when spawning a thread :thinking:
Ok, I merged #244 with a bunch of HasCallStack
constraint for better call stacks.
I also pushed https://github.com/haskell-effectful/effectful/tree/thread-sanitizer with a PoC of a primitive thread sanitizer.
master:
>>> fails
*** Exception: version (2) /= storageVersion (0)
CallStack (from HasCallStack):
error, called at src/Effectful/Internal/Env.hs:360:5 in effectful-core-2.4.0.0-inplace:Effectful.Internal.Env
getLocation, called at src/Effectful/Internal/Env.hs:309:14 in effectful-core-2.4.0.0-inplace:Effectful.Internal.Env
getEnv, called at src/Effectful/Internal/Monad.hs:600:35 in effectful-core-2.4.0.0-inplace:Effectful.Internal.Monad
getStaticRep, called at src/Effectful/Reader/Static.hs:53:15 in effectful-core-2.4.0.0-inplace:Effectful.Reader.Static
ask, called at tests/Main.hs:53:16 in effectful-2.4.0.0-inplace-test:Main
thread-sanitizer:
>>> fails
*** Exception: Storage used from multiple threads: owner (54) /= current (55)
CallStack (from HasCallStack):
error, called at src/Effectful/Internal/Env.hs:456:14 in effectful-core-2.4.0.0-inplace:Effectful.Internal.Env
readStorage, called at src/Effectful/Internal/Env.hs:356:28 in effectful-core-2.4.0.0-inplace:Effectful.Internal.Env
getLocation, called at src/Effectful/Internal/Env.hs:311:14 in effectful-core-2.4.0.0-inplace:Effectful.Internal.Env
getEnv, called at src/Effectful/Internal/Monad.hs:600:35 in effectful-core-2.4.0.0-inplace:Effectful.Internal.Monad
getStaticRep, called at src/Effectful/Reader/Static.hs:53:15 in effectful-core-2.4.0.0-inplace:Effectful.Reader.Static
ask, called at tests/Main.hs:53:16 in effectful-2.4.0.0-inplace-test:Main
Basically, whenever you access a fresh Storage
, it gets bound to the current thread, so if it ever gets accessed from a different thread, you'll immediately know. If my hypothesis from above post is correct, this should let you identify the problem quickly :thinking:
Hmmmmm. I don't think we're doing anything weird here. The two threads run together and the main thread waits for both, and I don't think anything changes the environment. But I think it must be something like that.
That thread sanitizer stuff looks amazing! I will try that out too!
The thread sanitizer did not trigger. Here's a stack trace:
undefined data
CallStack (from HasCallStack):
error, called at src/Effectful/Internal/Env.hs:433:17 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Env
undefinedData, called at src/Effectful/Internal/Env.hs:418:26 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Env
deleteEffect, called at src/Effectful/Internal/Env.hs:224:3 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Env
unconsEnv, called at src/Effectful/Internal/Monad.hs:526:5 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Monad
runHandler, called at src/Effectful/Dispatch/Dynamic.hs:465:20 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Dispatch.Dynamic
reinterpret, called at src/Effectful/NonDet.hs:78:21 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.NonDet
runNonDetRollback, called at src/Effectful/NonDet.hs:60:22 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.NonDet
runNonDet, called at <our code>
So possibly a bug or gotcha in runNonDetRollback
. I think I don't fundamentally need that, so I'm probably going to just try and get rid of it and see if that helps. But maybe somewhere to look? I think it's likely that this is happening when we hit an empty
and trigger a rollback.
The thread sanitizer did not trigger
Well, that's one good news at least.
So possibly a bug or gotcha in runNonDetRollback.
Well, it turns out that stack traces that come from undefinedData
are deceptive. Apparently, due to how HasCallStack
on values works, it points to the place in the code where undefinedData
was inserted into the Storage, not where it was accessed :confused: So looks like your code tries to access the NonDet
effect that is not there, sigh.
However, I previously misread the trace. The code doesn't try to access the effect data, it tries to access its relinker. So looks like this is triggered inside cloneEnv
somewhere that tries to relink a NonDet
effect that's not there anymore. I still don't understand how this can happen however :disappointed:
I'm going to merge #247 that will start producing expected call stacks when undefined relinker is called, hopefully this sheds more light on the situation.
Okay, I'll try running with https://github.com/haskell-effectful/effectful/pull/247 too.
New day, new stack trace!
undefined relinker
CallStack (from HasCallStack):
error, called at src/Effectful/Internal/Env.hs:459:56 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Env
relinker, called at src/Effectful/Internal/Env.hs:179:17 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Env
cloneEnv, called at src/Effectful/Internal/Unlift.hs:148:17 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Unlift
ephemeralConcUnlift, called at src/Effectful/Internal/Unlift.hs:122:3 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Unlift
concUnlift, called at src/Effectful/Internal/Monad.hs:242:39 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Monad
concUnliftIO, called at src/Effectful/Internal/Monad.hs:205:40 in effectful-core-2.4.0.0-4kVXPiFKBfI6XV8bIBZFIP:Effectful.Internal.Monad
withEffToIO, called at <our code>
So we call an unlifting function, but when we copy our environment, some of it is missing?
I am also somewhat baffled to say that switching OnEmptyRollback
to OnEmptyKeep
seems to have made the problem go away. But it might well just be lurking, and we just don't happen to hit it.
So we call an unlifting function, but when we copy our environment, some of it is missing?
Yeah, basically. cloneEnv
goes through all effects in Storage
and relinks them. Apparently in this case there is an effect there that was already deleted, but I don't understand at all how this could've happened since deleteEffect always deletes from the top and decreases the size of Storage.
It would be interesting if this indeed disappeared with switching to OnEmptyKeep
, because it might be some weird interaction caused by restoreEnv (thatOnEmptyRollback
uses) which overwrites the Storage, so in principle the size of the storage and its contents might diverge when this function is used, but of course it has a sanity check that relevant Storage sizes are the same -_- So I genuinely have no idea what's going on.
I fixed a bug in OnEmptyRollback
that wouldn't roll back when the whole action failed in 1da88766ea52d28dbd711155639711b1c979c1ec, but that's unrelated.
I don't think I can do much more without a reproducer and no access to your code :(
Now that you have a location of the NonDet
effect that went out of scope and the location of withEffToIO
that attempts to call its relinker, how do these locations relate to each other? Perhaps with this information you could try to write a reproducer?
FWIW, I also pushed d97c9dbd519b818acb1887fa937cbec77cfe36bc so that when an undefined relinker is called, it will show both call stacks of its creation and its call site (since both are important), thought we already got them.
Hmm okay. I will see if I can make a reproducer.
Another question is: are there any functions exposed from effectful
that don't do the right checks? We do have a bunch of effect definitions and it's possible that one of them does something unsafe.
Apart from stuff in Effectful.Dispatch.Static.Unsafe
, no.
Of course you can always just pass an Env to a different thread such that there is more than one thread operating on the same Env simultaneously, but then presumably you would've got errors from the thread-sanitizer branch.
Btw, did you stop getting this error with OnEmptyKeep
?
Btw, did you stop getting this error with OnEmptyKeep?
Yes I did. Although to be honest, that may just mean that I've peturbed the situation enough that we happen not to trip over the problem, not that it's necessarily related. I'm planning to roll back that change occasionally to keep debugging :sweat_smile:
We've been running with
master
for a bit in order to get the fix for https://github.com/haskell-effectful/effectful/issues/237. I think we may have hit another bug?I don't have a reproducer, but we're getting the following error:
From a very cursory look, I guess this could occur if some parts of the environment are not initialized properly, which could conceivably have been introduced with the recent fix PRs? I can try and debug more, but it's not easy to reproduce. Perhaps a good step would just to be to add some more
HasCallStack
s?