lizmat / App-Rak

21st century grep / find / ack / ag / rg on steroids
Artistic License 2.0

Crash when combining --unique --count-only --matches with large files #57

Open Zer0-Tolerance opened 3 weeks ago

Zer0-Tolerance commented 3 weeks ago

Hi Liz,

Thanks for implementing the feature request! After doing some testing I've run into a bug when you do this:

rak keyword --count-only --unique --matches /tmp/huge-file.json => this crashes instantly
rak keyword --count-only --unique --matches /tmp/small-file.txt => this works
rak keyword --count-only /tmp/huge-file.json => this works

I'm using rak version

rak --version 
rak - provided by App::Rak 0.3.8, running Raku 6.d with Rakudo 2024.05. 

Any idea why it crashes with huge files only ?

lizmat commented 3 weeks ago

No idea.

How huge is the JSON file?

Could you provide the output with --dont-catch ?

Zer0-Tolerance commented 3 weeks ago

File size is 300 MB

rak -i keyword --count-only --unique --matches /tmp/huge-file.json --dont-catch
MoarVM panic: Internal error: invalid thread ID -412615384 in GC work pass

lizmat commented 3 weeks ago

Ok, looks like a memory corruption in MoarVM :-(. Multi-threading is hard!

Does it also crash with --degree ?

OOC, which version of Rakudo are you using?

Zer0-Tolerance commented 3 weeks ago

Ok, looks like a memory corruption in MoarVM :-(. Multi-threading is hard!

Does it also crash with --degree ?

Nope, it doesn't crash with --degree=1, and it crashes in 80% of the cases with --degree=2, so yes, very likely a multithreading issue.

OOC, which version of Rakudo are you using?

Built on MoarVM version 2024.05, running on OSX 13.6.9 Arm.

lizmat commented 3 weeks ago

Hmmm --matches appears to be a non-existent argument? Do you have a shortcut for it?

Also: you don't appear to be using any JSON-specific functionality. So I tried to do this on an 800MB text file I have for this purpose. It passes, but that is on MacOS with Apple Silicon.

Are you running by chance on Intel hardware? If so, could you try running it with MVM_EXPR_JIT_DISABLE=1 ?

If that makes a difference, then please upgrade to 2024.07. We identified issues in the expression JIT compiler: 1. it caused instability, and 2. overall, it slowed down execution on Intel hardware. So it has been disabled since the 2024.07 release.

Zer0-Tolerance commented 3 weeks ago

Hmmm --matches appears to be a non-existent argument? Do you have a shortcut for it?

Also: you don't appear to be using any JSON-specific functionality. So I tried to do this on an 800MB text file I have for this purpose. It passes, but that is on MacOS with Apple Silicon.

Are you running by chance on Intel hardware? If so, could you try running it with MVM_EXPR_JIT_DISABLE=1 ?

If that makes a difference, then please upgrade to 2024.07. We identified issues in the expression JIT compiler: 1. it caused instability, and 2. overall, it slowed down execution on Intel hardware. So it has been disabled since the 2024.07 release.

My bad for the arg: I have a shortcut for --matches-only. And I'm running on Apple Silicon too.

lizmat commented 3 weeks ago

Hmm.. then I would suggest you upgrade to 2024.07 anyway: there's been some other work on MoarVM as well, and maybe, just maybe it got fixed.

Zer0-Tolerance commented 3 weeks ago

I will upgrade for sure, but with a different JSON file of 200 MB it seems to work fine. With another file of just 56 MB it keeps crashing.

Zer0-Tolerance commented 3 weeks ago

I'm attaching the small file I use to reproduce the bug (xab.gz), with:

rak -i abc --count-only --unique --matches xab

Zer0-Tolerance commented 3 weeks ago

I've managed to get an error that is a bit more useful:

Unhandled exception in code scheduled on thread 11
This exception is not resumable
  in block  at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/423D5F8D02036D03B5702942C8112298C42E64D2 (ParaSeq) line 796
  in method run at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/423D5F8D02036D03B5702942C8112298C42E64D2 (ParaSeq) line 193
  in block  at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/423D5F8D02036D03B5702942C8112298C42E64D2 (ParaSeq) line 1636
  in block  at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/423D5F8D02036D03B5702942C8112298C42E64D2 (ParaSeq) line 789

fish: Job 1, 'rak -i abc -u -c -m /Users/user…' terminated by signal SIGSEGV (Address boundary error)

There must be a link with paraseq !

Zer0-Tolerance commented 3 weeks ago

If you run this command several times, you will get a different number of matches each time:

rak -i abc --count-only --matches-only xab
1082 matches in 1 file
rak -i abc --count-only --matches-only xab
1084 matches in 1 file
rak -i abc --count-only --matches-only xab
1085 matches in 1 file

but if you add --degree=1 then you always get 1084. There is something wrong with multithreading

lizmat commented 3 weeks ago

There is something wrong with multithreading

Interesting. The count-only feature uses the :mapper feature of the "rak" module, and that's supposed to be thread-safe...

lizmat commented 3 weeks ago

There must be a link with paraseq !

Indeed. But ParaSeq just uses Raku's features, albeit in a more optimal way than the standard .hyper functionality. So maybe it's tickling some race condition that otherwise doesn't get tickled.

lizmat commented 3 weeks ago

rak -i abc --count-only --matches-only xab

OOC, why the --matches-only ? Especially if you have a fixed string?

lizmat commented 3 weeks ago

This exception is not resumable

Sadly this doesn't tell us what the exception was. This happens on .resume :-(

lizmat commented 3 weeks ago

I've just uploaded a 0.3.11 release of App::Rak that shouldn't die on trying to resume anymore. Hopefully the next crash with this release will be able to tell us more!

Zer0-Tolerance commented 3 weeks ago

rak -i abc --count-only --matches-only xab

OOC, why the --matches-only ? Especially if you have a fixed string?

This gives me all the unique matched patterns in their different cases, which is useful for me.

Zer0-Tolerance commented 3 weeks ago

I've just uploaded a 0.3.11 release of App::Rak that shouldn't die on trying to resume anymore. Hopefully the next crash with this release will be able to tell us more!

managed to get this:

1084 matches in 1 file
Caught 2 unique exceptions (out of 731) in hypered code:
--------------------------------------------------------------------------------
678x: No such method 'slip-all' for invocant of type 'NQPArray'
  in any throw_or_die at /Users/user/.rakubrew/versions/moar-2024.07/bin/../share/perl6/lib/Perl6/Metamodel.moarvm line 1
  in method run at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 201
  in block  at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 1636
  in block  at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 789
--------------------------------------------------------------------------------
53x: No such method 'slip-all' for invocant of type 'NQPArray'
  in method run at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 201
  in block  at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 1636
  in block  at /Users/user/.rakubrew/versions/moar-2024.07/share/perl6/site/sources/2996E2B3FC3B839A94F76A5A9D9F3C81A040C08C (ParaSeq) line 789
--------------------------------------------------------------------------------

useful ?

lizmat commented 3 weeks ago

VERY useful. But first some sleep :-)

lizmat commented 3 weeks ago

An update from my side: started working on golfing the issue, not a lot of luck so far.

Zer0-Tolerance commented 3 weeks ago

OK, thanks. Let me know if you need anything else.

lizmat commented 3 weeks ago

I've just uploaded 0.3.13 which should hopefully fix your query, albeit at the expense of being less asynchronous.

I don't fully understand yet how this happens, but this should be a lot stabler. Please let me know if this didn't fix it.

Also, today I uncovered an issue in Rakudo with the use of / foo /, aka running a regex on the topic in a multi-threaded manner. This could lead to strange dispatch errors. Submodules of App::Rak have been updated to make sure that that cannot happen anymore.

Zer0-Tolerance commented 3 weeks ago

The good news is that the latest version has fixed the issue on both of my test files, but it's quite slow. I'm using a lot of regexes in a multithreaded context, so I'm keen to know more about this issue. Will this be fixed in Raku?

lizmat commented 2 weeks ago

Doubtful any time soon. It's not really a technical issue.

The workaround is to make sure there is an unshared lexical $/ within lexical scope of the code being multithreaded.

Ideally we would like to have each scope have its own lexical $/, as that would fix the problem. However, that would break code such as:

if /foo/ { say $/ } # or $0 or $<bar>

which, apart from being a common pattern, is currently also cemented in roast.

lizmat commented 2 weeks ago

Status update: I think I've been able to reproduce your original problem with a 10 line script. So there's progress :-)

Zer0-Tolerance commented 2 weeks ago

Glad to hear that.

Zer0-Tolerance commented 2 weeks ago

Doubtful any time soon. It's not really a technical issue.

The workaround is to make sure there is an unshared lexical $/ within lexical scope of the code being multithreaded.

Ideally we would like to have each scope have its own lexical $/, as that would fix the problem. However, that would break code such as:

if /foo/ { say $/ } # or $0 or $<bar>

which, apart from being a common pattern, is currently also cemented in roast.

Just to be clear: in order to prevent regex multithreading issues, you just need to add my $/; to the code block being hyperized?

lizmat commented 2 weeks ago

Yes. / foo / sets the nearest visible $/. If you put a my $/ inside the code block being multithreaded, you're safe. If you have the most recent App::Rak, then it will do that automatically for you in any code block.
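
For readers who want to see the workaround in isolation, here is a minimal sketch. It assumes a plain .hyper pipeline rather than App::Rak's or ParaSeq's internals, and the sample data and the names @lines and @matches are made up for illustration:

```raku
# Hypothetical sample data, not taken from the attachment in this issue.
my @lines = 'abc one', 'no match here', 'two ABC', 'Abc three';

# Each invocation of the block gets its own $/, so worker threads
# do not all write to one shared outer match variable.
my @matches = @lines.hyper(batch => 1, degree => 2).map: -> $line {
    my $/;                               # unshared lexical match variable
    $line ~~ /:i abc / ?? ~$/ !! Empty   # keep the matched text, skip misses
};

# Unique matched strings, as with --unique --matches-only
say @matches.unique.sort;
```

Without the my $/ line, the regex would set the nearest visible $/ outside the block, which is the sharing that the workaround is meant to avoid.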