Closed michaelsbradleyjr closed 1 year ago
Oooh, this is a bug of the kind that's been with us for quite a while - I'm not sure we were ever able to track it down in a satisfactory way - do you have a stable repro?
do you have a stable repro?
Yes. I'll give the steps to repro, and if it seems too hacky then I'll try to make a branch where the repro is more self-contained, i.e. less need to do prep work.
Clone my fork of nim-dagger and switch to the localstore branch: https://github.com/michaelsbradleyjr/nim-dagger/tree/localstore.
Make or find a file of several MB or more that you'll upload to the experimental web server + task_runner. You'll also want a script like this:
upload.sh
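#!/usr/bin/env bash
# Split the input path into a base name and an extension.
filename="$(basename -- "$1")"
extension="${filename##*.}"
filename="${filename%.*}"
# Start 24 concurrent POST uploads of the same file, each under a distinct name.
for x in {01..24}; do
  curl -i -d "@$1" -H "Content-Type: application/json" -X POST \
    "localhost:30080/upload/$filename$x.$extension" &
done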
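For the test file mentioned in the step above, here is a minimal sketch of one way to create it. The helper name, the file name testfile.json, and the 7 MiB size are assumptions (the size loosely mirrors the ~7 MB uploads mentioned elsewhere in this issue). The random bytes are not valid JSON, so this assumes the server only stores the request body:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the given number of MiB of random bytes to a path.
make_test_file() {
  local path="$1" mib="${2:-7}"
  head -c "$((mib * 1048576))" /dev/urandom > "$path"
}

# Assumed name and size; adjust as needed.
make_test_file testfile.json 7
```

Note that curl's -d strips newlines and carriage returns when reading from a file, so the uploaded body can end up slightly smaller than the file; --data-binary sends it unchanged.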
In experiments/localstore.nim, enable L77 and disable L78; and disable L112.
Run mkdir -p experiments/files && nimble localstore.
After (4) has invoked the built experiments/localstore binary, in another terminal run upload.sh [your file].
Shortly after invoking upload.sh, in the terminal running nimble localstore you should see localstore crash, and the output will include:
Error: unhandled exception: index 4096 not in 0 .. 4095 [IndexDefect]
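To make the crash easier to spot, and to re-check after trying a candidate fix, the server output can be saved and searched for the defect. A rough sketch; the log file name localstore.log and the helper name are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the log contains an IndexDefect crash line.
has_index_defect() {
  grep -q 'unhandled exception: index .* \[IndexDefect\]' "$1"
}

# In the server terminal, capture the output while it runs (assumed log name):
#   nimble localstore 2>&1 | tee localstore.log
# Afterwards, from any terminal in the same directory:
if [ -f localstore.log ]; then
  if has_index_defect localstore.log; then
    echo "IndexDefect crash found"
  else
    echo "no IndexDefect in log"
  fi
fi
```

The pattern is deliberately loose about the index values, so it would also match a line reporting a different out-of-range index.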
@michaelsbradleyjr I think we finally found the root cause: https://github.com/status-im/nim-chronos/pull/267 - can you run your test again?
I'm experiencing runtime crashes from unhandled exceptions re: nim-chronos, with L1436 in transports/stream.nim reported as the culprit:
I've tried with Nim v1.2.14, v1.4.8, v1.6.0; same thing regardless of version.
When I (naively) change L1435 from:
to:
I no longer experience that runtime crash.
Context
I'm doing some early stage experiments combining use of task_runner impl_beta_2 branch's threadpool with chronos' http server and async pipes: https://github.com/michaelsbradleyjr/nim-dagger/blob/localstore/experiments/localstore.nim#L77-L78
Re: those two lines in particular, I only experience the runtime crash (i.e. without the change mentioned above) if L77 is enabled and L78 is disabled, i.e. only when the Future returned by writer.write is not discarded.
I can create separate issues re: the following, but this issue feels like a decent place to start the discussion/s...
(1) When composing async streams, is the way I did it in experiments/localstore.nim, via readMessage and a predicate, the way to go about it? Or is there another API for that purpose? I spent a while looking through the tests and implementations of nim-chronos, but I may have missed something or a lot of things.
(2) Is there a reason why ReadMessagePredicate of transports/stream.nim does not return a Future? If it did, then my L77-L78 linked above could be replaced with:
And the index error would get re-raised as a Defect per asyncSpawn.
(3) When the crash reported by this issue is "fixed", or ignored by discarding the Future returned by writer.write, I'm experiencing a similar looking error when I try to increase the number of concurrent large-ish uploads to the running server (~7 MB per POST request) from a couple of dozen to several dozen:
I've tried with Nim v1.2.14, v1.4.8, v1.6.0; same thing regardless of version. With v1.6.0, if I add --gc:orc I get the same crash but the stacktrace is missing; I only get the last line about index -1 ... and it's sometimes garbled with other characters.
This one surely deserves its own bug report, but the reason I mention it here is that it has a (maybe superficial) resemblance to the problem reported at the top of this issue, i.e. involves another loop processor and index error.
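For (3), a variant of upload.sh that takes the number of concurrent uploads as a parameter makes it easier to step from a couple of dozen to several dozen. This is only a sketch: the function names, the script name upload_many.sh, and the default of 24 are assumptions, while the URL layout is copied from upload.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one upload name per line, e.g. big01.json, big02.json, ...
upload_names() {
  local file="$1" count="$2" base ext i
  base="$(basename -- "$file")"
  ext="${base##*.}"
  base="${base%.*}"
  for ((i = 1; i <= count; i++)); do
    printf '%s%02d.%s\n' "$base" "$i" "$ext"
  done
}

# Hypothetical helper: POST the file once per name, all in the background.
upload_many() {
  local file="$1" count="${2:-24}" name
  while IFS= read -r name; do
    curl -s -o /dev/null -d "@$file" -H "Content-Type: application/json" \
      -X POST "localhost:30080/upload/$name" &
  done < <(upload_names "$file" "$count")
  wait  # block until every background upload has finished
}

# Only run when a file argument is given, e.g.: ./upload_many.sh bigfile.json 48
if [ "$#" -ge 1 ]; then
  upload_many "$@"
fi
```

Keeping the name generation separate from the curl loop means the list of target names can be inspected without a running server.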