Open amesgen opened 1 year ago
Thanks for the report.
I have no intuition about what the issue is. Maybe dig into uses of Text.fromPtr
, Text.copy
, Text.dropWhile8
, Text.useAsPtr
, etc.
That's great you got use out of the library for Ormolu, at least for a little while! ❤️
https://github.com/haskell/text/blob/2.0.1/src/Data/Text/Array.hs#L75
Just a remark: It's funny that Data.Text.Array.new
takes a signed Int
for size, and calls it "overflow" when it's negative.
Can the bug be reproduced using valgrind
?
This happens both with hs-pcre2 2.1.1.1 and 2.2.1 (so text-1.2 and text-2.0), but the error message for the latter goes like this
The way to debug it is to build text
with developer
flag enabled. In order to force Cabal into doing so, pin a version of text
which mismatches a boot package for GHC in use.
Also pass --ghc-options='-fcheck-prim-bounds -fno-ignore-asserts'
.
Also you can fork text
locally and start adding HasCallStack
constraints to track the provenance of the error.
I'm curious if the error is still reproducible.
I had another look, and I can still reproduce this (as described in the issue description; but I think it can be minimized somewhat, at the very least getting rid of the huge ghc-lib-parser
dependency).
Building with profiling and +RTS -xc
for stack traces as well as enabling the developer
flag as suggested by @Bodigrim pointed to the following issue:
Stack trace:
*** Exception (reporting due to +RTS -xc): (THUNK_2_0), stack trace:
Data.Text.Array.CAF
--> evaluated by: Text.Regex.Pcre2.Internal.getErrorMessage,
called from Text.Regex.Pcre2.Internal.check,
called from Text.Regex.Pcre2.Internal.fix1,
called from Text.Regex.Pcre2.Internal.matcherWithEnv,
called from Text.Regex.Pcre2.Internal.pureUserMatcher,
called from Text.Regex.Pcre2.TH.memoMatcher,
called from Text.Regex.Pcre2.Internal.mapMS,
called from Text.Regex.Pcre2.Internal._gcaptures,
called from Text.Regex.Pcre2.TH._capturesTH,
called from Text.Regex.Pcre2.Internal.toAlternativeOf,
called from Text.Regex.Pcre2.TH.capturesTH,
called from Main.extractFixitiesFromFile,
called from Main.extractHoogleInfo,
called from Main.main
With a few more manually inserted putStrLn
s, it turns out that this is the offending line:
https://github.com/sjshuck/hs-pcre2/blob/2d01498056569905abc643fdffa697efbbd6c08c/src/hs/Text/Regex/Pcre2/Internal.hs#L1257
Here, cus
is -29
, which corresponds to PCRE2_ERROR_BADDATA
, which is returned by pcre2_get_error_message
here https://github.com/sjshuck/hs-pcre2/blob/2d01498056569905abc643fdffa697efbbd6c08c/src/hs/Text/Regex/Pcre2/Internal.hs#L1256
because of this:
If errorcode does not contain a recognized error code number, the negative value
PCRE2_ERROR_BADDATA
is returned.
This in turn is because in matcherWithEnv
here
https://github.com/sjshuck/hs-pcre2/blob/2d01498056569905abc643fdffa697efbbd6c08c/src/hs/Text/Regex/Pcre2/Internal.hs#L719
we check whether result
is positive, and throw an exception (which prints the error message) otherwise. Now the question is why result
is non-positive. In fact, it is equal to 0
, as a result of a call to pcre2_match
:
https://github.com/sjshuck/hs-pcre2/blob/2d01498056569905abc643fdffa697efbbd6c08c/src/hs/Text/Regex/Pcre2/Internal.hs#L705-L713
From the docs (emphasis mine):
The return from
pcre2_match()
is one more than the highest numbered capturing pair that has been set (for example, 1 if there are no captures), zero if the vector of offsets is too small, or a negative error code for no match and other errors.
This is where I am stopping for now, as I am not at all familiar with the PCRE2 C API.
The pattern in question is
(?m)^\\s*?(?<fixDir>infix[rl]?)\\s+?(?<fixPrec>[0-9])\\s+?(?<infixOpNames>(?:[^,\\s]+\\s*?,\\s*?)*?[^,\\s]+)\\s*$
in case that is useful; I have tried to produce a minimal example of matching this pattern against the subject text, but I couldn't reproduce the error there.
BTW:
In order to force Cabal into doing so, pin a version of
text
which mismatches a boot package for GHC in use.
A less hacky way to accomplish this is to use the following in your cabal.project
:
constraints: text source
Could it be a pcre2 bug then? Looking at https://www.pcre.org/current/doc/html/pcre2demo.html, it seems we are doing everything by the book: allocate offset vector with pcre2_match_data_create_from_pattern
, then pass it to pcre2_match
.
I updated the PCRE2 submodule from 10.42 to 10.44 just now. Wonder if that makes a difference. EDIT Not with running benchmarks, it doesn't. Still failing.
Hmm, when I build with
constraints: text source
package text
flags: +developer
and hs-pcre2 2.1.1 from Hackage, I get
extract-hackage-info: Data.Text.Array.new: size overflow
CallStack (from HasCallStack):
error, called at src/Data/Text/Array.hs:74:19 in text-2.0.2-f5aebd26e19e5a1c2907d840502e1f413aa9a5057431546d415964b124f1ffd2:Data.Text.Array
If I instead build from recent master
commits (tried 2d01498056569905abc643fdffa697efbbd6c08c and 8ac889a7e0e26564f663f1edd2bffce9bbcb6f65, so with PCRE2 10.44 and 10.42), I instead get
malloc(): unsorted double linked list corrupted
fish: Job 1, 'cabal run extract-hackage-info …' terminated by signal SIGABRT (Abort)
(seems similar to https://github.com/sjshuck/hs-pcre2/issues/35#issuecomment-1666963101) or sometimes also
malloc(): mismatching next->prev_size (unsorted)
Maybe the investigation I posted above is a red herring, and there is some nasty undefined behavior going on in pcre2 :thinking:
https://stackoverflow.com/a/28762523 suggests that both malloc()
failures mean memory corruption.
Sorry for this report being very non-minimally reproducible, please don't feel obligated to act on this in any way.
We are using hs-pcre2 to extract some information from the Hackage Hoogle database (500MB, 70k files): relevant function, and see this script for how to execute the app (this is Nix-based and hence fully reproducible, but it should be rather easily adaptable to not rely on Nix).
At some point (it still worked in May), the app started to crash after processing a few thousand files:
This happens both with hs-pcre2 2.1.1.1 and 2.2.1 (so text-1.2 and text-2.0), but the error message for the latter goes like this:
I replaced hs-pcre2 by megaparsec in https://github.com/tweag/ormolu/pull/958, which makes the error go away, so we are not blocked on this and I didn't investigate further what exactly is going on with hs-pcre2. But maybe this is still interesting to you or is ringing some bell.