Open jmgomez opened 2 months ago
When trying to reproduce I didn't get a stacktrace, only "out of memory".
hmm In what OS are you testing? Do you get that same issue in version-2.0 too?
There is `proc raiseOutOfMem() {.noinline.} =`, maybe you could add a stacktrace there.
> hmm In what OS are you testing? Do you get that same issue in version-2.0 too?
Windows; no; with refc/arc it gives the segfault exit code (as opposed to orc). The above stacktrace should only be reachable in refc from my understanding since `usrToCell` is not declared otherwise.
I'm out of my depth here, I thought maybe a type graph change or cycle was the cause, I also don't know where a deep copy could be happening
@ringabout @metagn Please bisect it.
> Windows; no; with refc/arc it gives the segfault exit code (as opposed to orc). The above stacktrace should only be reachable in refc from my understanding since usrToCell is not declared otherwise.
I got the stacktrace on macOS. Yes, `refc` is used.
Currently got as far as:
started with good 185e06c92362083c06c76f87e325889b1c9dc659, bad a93c5d79b95b569711478b81d2ae9d19675fc4d8 (1 commit good after disabling effect errors)
good 320311182 after disabling effect errors
good c23d6a3cb
good f12683873
Bisected to 05df263b84de9008266b3d53e2c28b009890ca61 (commit history was messed up the first time), @yglukhov any idea what the problem could be?
The type that `deepcopy` is iterating over during the segfault is `tyRef`. #23787 changes the creation of the environment ref type here. This might not have anything to do with it, but I tried to stress the added condition by mixing proc/iterator closures in the same owner context. I noticed this behavior difference:
```nim
proc foo() =
  let x = 123
  iterator bar2(): int {.closure.} =
    yield x
  proc bar() =
    let z = bar2
    for y in z(): # just doing bar2() gives param not in env: x
      echo y
  bar()

foo()
```
The output of this is now `1` when it used to be `0`. Both are invalid, it should be `123`. This is the only difference I've found that gives a runtime difference versus failing to compile. There is no problem when a closure iterator calls a closure proc, only when a closure proc calls a closure iterator.
Changing `foo` to be a closure iterator crashes the compiler with all permutations now, but is different before #23787. This used to give a codegen error, now segfaults when `getEnvParam` calls `getEnvTypeForOwner`:
```nim
iterator foo(): int {.closure.} =
  let x = 123
  iterator bar2(): int {.closure.} =
    yield x
  proc bar() =
    let z = bar2
    for y in z():
      echo y
  bar()
  yield x

for x in foo(): echo x
```
And this used to work and give the correct output but now also segfaults:
```nim
iterator foo(): int {.closure.} =
  let x = 123
  proc bar() =
    echo x
  iterator bar2(): int {.closure.} =
    bar()
    yield x
  for y in bar2():
    yield y

for x in foo(): echo x
```
I wouldn't even know where to start to try and minimize the segfault in nimlangserver, but the first closure proc/iterator mixing example seems realistic to encounter in async code. It was broken before too, but maybe the zeroing made it more manageable.
The issue is in the copy codegen, which violates memory safety. It was present before #23787, but was likely revealed because the memory layout changed. If I take Nim version 051a53627 (just before my changes) and compile the following
```nim
proc foo() =
  let x = 123
  iterator bar2(): int {.closure.} =
    yield x
  proc bar() =
    let z = bar2
    for y in z(): # just doing bar2() gives param not in env: x
      echo y
  bar()

foo()
```
with gcc, I'm getting a codegen error:
```
~/.cache/nim/test2_d/@mtest2.nim.c:330:70: error: passing argument 2 of ‘eqcopy___test50_u113’ from incompatible pointer type [-Wincompatible-pointer-types]
  330 | nimln_(79); eqcopy___test50_u113(&(*colonenv_).colonup_, colonenv_);
```
Take a careful look at this line:
```c
eqcopy___test50_u113(&(*colonenv_).colonup_, colonenv_);
```
It tries to copy `colonenv_` into a field of itself, which is obviously wrong.
Unfortunately I'm not quite familiar with this logic, can anyone provide any input?
It's not "obviously wrong", closure environments can be self-referencing which is why you need a cycle collector for them. This is even more true for the so called "up" pointers that are so complex that I keep forgetting how they work.
Not pretending to understand, but my thinking was that "ups" always point to the parent closure env and "ups" can never be cyclic. Sure, cycles can form, just not through "ups". Am I wrong here? Also the pointer type mismatch kinda hints that this particular line is fishy at least?
> Also the pointer type mismatch kinda hints that this particular line is fishy at least?
Sure, yes.
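To make the pointer type mismatch discussed above more concrete, here is a small hand-written model of the "ups point to the parent env" picture. The names `FooEnv`, `BarEnv`, `up` and `readThroughUp` are hypothetical and are not the compiler's actual generated types; this is only a sketch of the mental model from the comment, not a statement about how lambda lifting is implemented.

```nim
# Hypothetical, hand-written stand-ins for lifted closure environments.
type
  FooEnv = ref object   # would hold the locals captured from `foo`
    x: int
  BarEnv = ref object   # would hold the locals of the nested `bar`
    up: FooEnv          # "up" pointer to the enclosing owner's env

proc readThroughUp(): int =
  let fooEnv = FooEnv(x: 123)
  let barEnv = BarEnv(up: fooEnv)
  # Reading the captured variable goes through the up pointer.
  result = barEnv.up.x

echo readThroughUp()
# Under this model the two env types are distinct, so storing a BarEnv
# into its own `up` field (typed FooEnv) would be a type mismatch.
echo BarEnv is FooEnv
```

Under this model, `eqcopy(&env.up, env)` passes a value of the wrong type as the source, which would match the `-Wincompatible-pointer-types` diagnostic. Whether real environments can legitimately reference themselves through other fields is a separate question, as noted in the reply above.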
I've added a fix (#24316) for all the cases suggested by @metagn. @jmgomez Could you please verify? @metagn just to double check, were you able to reproduce the originally reported stack trace?
Langserver seems to have circumvented the original issue in the master branch. Unfortunately, for the commit https://github.com/nim-lang/langserver/commit/883935707f762ce135c27d40b4ef2baf9e85fd86 which I was testing on, #24316 does not fix it in my reproduction, the minimizations above were probably not correct.
Running `nimble --nim:... test` in langserver also reproduces the issue in `tsuggestapi`, VS Code isn't needed.
It crashes right before this call which was removed in this commit. The full log from langserver in VS Code (before restarts) is:
DBG Starting nimlangserver params="(clientProcessId: none(int), transport: some(stdio), port: 0)"
DBG Starting stdio server
DBG [Processsing Message] request="\"initialize\""
DBG Initialize received...
DBG Registering monitor for process pid=25960
DBG Initialize completed. Trying to start nimsuggest instances
DBG [Processsing Message] request="\"initialized\""
DBG Client initialized.
DBG Requesting configuration from the client
DBG [Processsing Message] request="\"textDocument/didOpen\""
DBG New document opened for URI: uri=file:///test.nim
DBG Auto-guessing project file for file="test.nim"
DBG getProjectFile project="test.nim" fileUri="test.nim"
DBG Document associated with the following projectFile uri=file:///test.nim projectFile="test.nim"
DBG Will create nimsuggest for this file uri=file:///test.nim
DBG ShowMessage message="Using Nim Compiler Version 2.2.0 [Windows: amd64] from nim-2.2.0/bin"
INF Starting nimsuggest root="test.nim" timeout=120000 path="nim-2.2.0/bin/nimsuggest" workingDir="..."
DBG Parsing nimsuggest capability capability=con
DBG Parsing nimsuggest capability capability=exceptionInlayHints
DBG Parsing nimsuggest capability capability=unknownFile
DBG Nimsuggest Capabilities capabilities="{con, exceptionInlayHints, unknownFile}"
out of memory
[Error - 3:52:39 PM] Server process exited with code 1.
[Info - 3:52:39 PM] Connection to server got closed. Server will restart.
true
It seems like neither thread created above it actually starts, and not creating the threads does not cause the issue. Just creating a thread that takes a `tuple[]` argument and does nothing also causes the crash.
Ok, finally managed to reproduce, thanks @metagn.
Now, TL;DR: this issue should be fixed not by reverting my PR, but in any other suitable way!))
Long description: There are a few problems I'm seeing here.
`await` after `createThread`. To put it bluntly, the threading API has been misused. Whether this API is good is a question for another issue.
The statements above can be easily backed by one of the following experiments:
- `{.liftLocals.}` pragma, which puts them into the async function env. (Note that this pragma should not be used in production, as it is not documented and will most likely be removed in the future, but it's perfect for the demo.)
- `sleep(100)` after all the `createThread`s. It gives the threads enough time to perform the aforementioned `deepCopy`.
Finally, I suggest closing this issue for the reason of misused threading API.
Sounds reasonable, in the worst case maybe a changelog entry for the new behavior and disclaimer.
Nim's threading api requires the thread object to exist while the thread is running.
It uses a potentially escaping address for this. If we don't change this it should at least still be documented in `createThread`, something like "the address of the thread object should be available throughout the thread's lifetime".
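A minimal sketch of the lifetime requirement described above, assuming Nim 2.x (where `std/typedthreads` is available). The names `worker` and `runAndJoin` are made up for illustration; this is not the langserver code, only an example of keeping the `Thread` object alive until the thread has finished.

```nim
import std/typedthreads  # Nim 2.x; on 1.x the same API lives in system with --threads:on

proc worker(res: ptr int) {.thread.} =
  # Runs on the new thread and writes its result through the pointer.
  res[] = 42

proc runAndJoin(): int =
  var thr: Thread[ptr int]   # createThread keeps using the address of `thr`
  var res = 0
  createThread(thr, worker, addr res)
  # `thr` and `res` must stay valid while the thread runs, so join
  # before this proc (and its stack frame) goes away.
  joinThread(thr)
  result = res

echo runAndJoin()
```

The point of the sketch is only the ordering: `joinThread` is called while `thr` is still in scope. If the frame holding `thr` were left before the thread finished starting up (for example by suspending at an `await` in an async proc, as discussed above), the thread would be working with a dangling address.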
We can track this in this issue too since this is the root problem and the iterator changes were only tangential, but closing it is fine too.
Description
Download and build the latest https://github.com/nim-lang/langserver
Remove the lock file and the requires from the `nimlangserver.nimble` so it uses the latest Nim to build.
Run `nimble build`
Open a Nim project with the VS Code extension (make sure the built `nimlangserver` is the one used)
It will crash with the following (you can see the output in the `Nim Language Server` tab, or `Nim Lsp` if you are running a newer version)
Works fine with `2.0.8` and with `2.0.9`
Nim Version