Open Zer0-Tolerance opened 3 weeks ago
No idea.
How huge is the JSON file?
Could you provide the output with --dont-catch?
File size is 300 MB
rak -i keyword --count-only --unique --matches /tmp/huge-file.json --dont-catch
MoarVM panic: Internal error: invalid thread ID -412615384 in GC work pass
Ok, looks like a memory corruption in MoarVM :-(. Multi-threading is hard!
Does it also crash with --degree?
OOC, which version of Rakudo are you using?
> Does it also crash with --degree?
Nope, it doesn't crash with --degree=1, and it crashes in 80% of the cases with --degree=2, so yes, very likely a multithreading issue.
> OOC, which version of Rakudo are you using?
Built on MoarVM version 2024.05, running on OSX 13.6.9 Arm.
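A crash rate like the "80% of the cases" above can be estimated by repeating the command and counting failed runs. The following is only a minimal sketch for a POSIX shell: `count_failures` is a hypothetical helper name (not part of rak), and it assumes that a normal rak run exits with status 0 while a crash (panic or SIGSEGV) yields a non-zero status.

```shell
# Hypothetical helper: run a command N times and print how many runs
# exited with a non-zero status (assumption: a crash is a non-zero exit).
count_failures() {
  runs=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Discard the command's output; only the exit status matters here.
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}

# Example invocations (not run here):
#   count_failures 10 rak -i keyword --count-only --unique --matches-only /tmp/huge-file.json --degree=2
#   count_failures 10 rak -i keyword --count-only --unique --matches-only /tmp/huge-file.json --degree=1
```

Comparing the two numbers gives a rough idea of whether the failure depends on the degree of parallelism.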
Hmmm, --matches appears to be a non-existent argument? Do you have a shortcut for it?
Also: you don't appear to be using any JSON-specific functionality. So I tried to do this on an 800MB text file I have for this purpose. It passes, but on MacOS with Apple Silicon.
Are you running by chance on Intel hardware? If so, could you try running it with MVM_EXPR_JIT_DISABLE=1?
If that makes a difference, then please upgrade to 2024.07. We identified issues in the expression JIT compiler: 1. it caused instability, and 2. overall, it slowed down execution on Intel hardware. So it has been disabled since the 2024.07 release.
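For a one-off test, the variable can be set for a single invocation only, so the rest of the shell session is unaffected. A minimal sketch, assuming a POSIX shell with `env` available; `without_expr_jit` is a hypothetical wrapper name used only for illustration.

```shell
# Hypothetical wrapper: run one command with MVM_EXPR_JIT_DISABLE=1 set
# in that command's environment only.
without_expr_jit() {
  env MVM_EXPR_JIT_DISABLE=1 "$@"
}

# Example invocation (not run here):
#   without_expr_jit rak -i keyword --count-only --unique --matches-only /tmp/huge-file.json
```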
My bad for the arg: I have a shortcut for --matches-only, and I'm running on Apple Silicon too.
Hmm.. then I would suggest you upgrade to 2024.07 anyway: there's been some other work on MoarVM as well, and maybe, just maybe it got fixed.
I will upgrade for sure, but indeed with a different JSON file of 200 MB it seems to work fine. With another file of just 56 MB it keeps crashing.
I'm attaching the small file I use to reproduce the bug, xab.gz, with: rak -i abc --count-only --unique --matches xab
I've managed to get an error that is a bit more useful:
Unhandled exception in code scheduled on thread 11
This exception is not resumable
in block at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/423D5F8D02036D03B5702942C8112298C42E64D2 (ParaSeq) line 796
in method run at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/423D5F8D02036D03B5702942C8112298C42E64D2 (ParaSeq) line 193
in block at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/423D5F8D02036D03B5702942C8112298C42E64D2 (ParaSeq) line 1636
in block at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/423D5F8D02036D03B5702942C8112298C42E64D2 (ParaSeq) line 789
fish: Job 1, 'rak -i abc -u -c -m /Users/user…' terminated by signal SIGSEGV (Address boundary error)
There must be a link with ParaSeq!
If you run this command several times, you will get a different number of matches:
rak -i abc --count-only --matches-only xab
1082 matches in 1 file
rak -i abc --count-only --matches-only xab
1084 matches in 1 file
rak -i abc --count-only --matches-only xab
1085 matches in 1 file
But if you add --degree=1, then you always get 1084. There is something wrong with multithreading.
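The run-to-run variation described above can be made visible by repeating the command and tallying the distinct outputs. This is only a sketch for a POSIX shell; `tally_outputs` is a hypothetical helper name, not something rak provides.

```shell
# Hypothetical helper: run a command N times and print each distinct
# output line together with the number of times it occurred.
tally_outputs() {
  runs=$1
  shift
  i=0
  while [ "$i" -lt "$runs" ]; do
    # stderr is dropped so that only the reported counts are tallied.
    "$@" 2>/dev/null
    i=$((i + 1))
  done | sort | uniq -c
}

# Example invocations (not run here):
#   tally_outputs 20 rak -i abc --count-only --matches-only xab
#   tally_outputs 20 rak -i abc --count-only --matches-only xab --degree=1
```

A deterministic command produces a single tally line; more than one line for the same input points at a race.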
> There is something wrong with multithreading
Interesting. The count-only feature uses the :mapper feature of the "rak" module, and that's supposed to be thread-safe...
> There must be a link with ParaSeq!
Indeed. But ParaSeq just uses Raku's features, albeit in a more optimal way than the standard .hyper functionality. So maybe it's tickling some race condition that otherwise doesn't get tickled.
> rak -i abc --count-only --matches-only xab
OOC, why the --matches-only? Especially if you have a fixed string?
> This exception is not resumable
Sadly this doesn't tell us what the exception was. This happens on .resume :-(
I've just uploaded a 0.3.11 release of App::Rak that shouldn't die on trying to resume anymore. Hopefully the next crash with this release will be able to tell us more!
> OOC, why the --matches-only? Especially if you have a fixed string?
This gives me all the unique matched patterns with their different cases, which is useful for me.
> I've just uploaded a 0.3.11 release of App::Rak that shouldn't die on trying to resume anymore.
Managed to get this:
1084 matches in 1 file
Caught 2 unique exceptions (out of 731) in hypered code:
--------------------------------------------------------------------------------
678x: No such method 'slip-all' for invocant of type 'NQPArray'
in any throw_or_die at /Users/user/.rakubrew/versions/moar-2024.07/bin/../share/perl6/lib/Perl6/Metamodel.moarvm line 1
in method run at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 201
in block at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 1636
in block at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 789
--------------------------------------------------------------------------------
53x: No such method 'slip-all' for invocant of type 'NQPArray'
in method run at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 201
in block at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 1636
in block at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 789
--------------------------------------------------------------------------------
Useful?
VERY useful. But first some sleep :-)
An update from my side: started working on golfing the issue, not a lot of luck so far.
OK, thanks. Let me know if you need anything else.
I've just uploaded 0.3.13 which should hopefully fix your query, albeit at the expense of being less asynchronous.
I don't fully understand yet how this happens, but this should be a lot stabler. Please let me know if this didn't fix it.
Also, today I uncovered an issue in Rakudo with the use of / foo /, aka running a regex on the topic in a multi-threaded manner. This could lead to strange dispatch errors. Submodules of App::Rak have been updated to make sure that that cannot happen anymore.
Good news: the latest version has fixed the issue on both of my test files, but it's quite slow. I'm using a lot of regexes in a multithreaded context, so I'm keen to know more about this issue. Will this be fixed in Raku?
Doubtful any time soon. It's not really a technical issue.
The workaround is to make sure there is an unshared lexical $/ within the lexical scope of the code being multithreaded.
Ideally we would like to have each scope have its own lexical $/, as that would fix the problem. However, that would break code such as:
if /foo/ { say $/ } # or $0 or $<bar>
which, apart from being a common pattern, is also cemented in roast.
Status update: I think I've been able to reproduce your original problem with a 10 line script. So there's progress :-)
Glad to hear that.
Just to be clear: in order to prevent regex multithreading issues, you just need to add my $/; to the code block being hyperized?
Yes. / foo / sets the nearest visible $/. If you put a my $/ inside the code block being multithreaded, you're safe. If you have the most recent App::Rak, then it will do that automatically for you in any code block.
Hi Liz,
Thanks for implementing the feature request! After doing some testing I've run into a bug when you do this:
rak keyword --count-only --unique --matches /tmp/huge-file.json
=> this crashes instantly
rak keyword --count-only --unique --matches /tmp/small-file.txt
=> this works
rak keyword --count-only /tmp/huge-file.json
=> this works
I'm using rak version
Any idea why it crashes with huge files only?