nicm / fdm

fdm source code
269 stars 50 forks source link

Crash in imap_commit() #122

Closed Freaky closed 1 year ago

Freaky commented 1 year ago

Backtrace from fdm 2.2:

[New LWP 1099199]
Core was generated by `fdm: child: eda'.
Program terminated with signal SIGBUS, Bus error.
Object-specific hardware error.
#0  0x000000000022e387 in imap_commit (a=<optimized out>, m=0x800e4e000) at imap-common.c:275
warning: Source file is more recent than executable.
275                     ARRAY_ADD(&data->kept, aux->uid);
(gdb) bt
#0  0x000000000022e387 in imap_commit (a=<optimized out>, m=0x800e4e000) at imap-common.c:275
#1  0x000000000021fc12 in fetch_dequeue (a=a@entry=0x800e971e0, mctx=mctx@entry=0x800f4cf80 at child-fetch.c:725
#2  0x000000000021fb41 in fetch_match (a=0x28, a@entry=0x800e971e0, msg=<optimized out>, msg@entry=0x0,
msgbuf=msgbuf@entry=0x7fffffffcde8) at child-fetch.c:191
#3  0x000000000021f414 in fetch_account (a=a@entry=0x800e971e0, pio=pio@entry=0x800e2b220, nflags=0,
tim=tim@entry=1674158697.540386) at child-fetch.c:386
#4  0x000000000021f074 in child_fetch (child=<optimized out>, pio=0x800e2b220) at child-fetch.c:114
#5  0x0000000000220b75 in child_start (children=children@entry=0x7fffffffe208, uid=0, gid=0,
start=0x21efd0 <child_fetch>, msg=<optimized out>, data=<optimized out>, parent=0x0) at child.c:133
#6  0x000000000022834f in main (argc=<optimized out>, argv=<optimized out>) at fdm.c:773

Just built a copy with -O0 to try to maximise debug info and it helpfully just fetched everything and stopped me being able to reproduce this, but I still have the core if you have suggestions about how else to poke it.

nicm commented 1 year ago

Can you do p data->kept and p aux-uid?

Freaky commented 1 year ago
(gdb) p data->kept
$1 = {list = 0x0, num = 0, space = 0}
(gdb) p aux->uid
Cannot access memory at address 0x6f63736943232020

That's an interesting pointer -- it corresponds to the string #Cisco.

Freaky commented 1 year ago

Doesn't appear to be from any email I've received, but it does appear five times in /etc/services.

nicm commented 1 year ago

Can you show me the corresponding -vvvv output?

Freaky commented 1 year ago

Unfortunately not. I wish I'd done that before I'd blown away my ability to reproduce it.

I can at least identify the message it was processing (indeed, had successfully delivered 175 times) when it crashed: a FreeBSD commit message. Looks unremarkable and bouncing it back to myself doesn't cause any issue.

nicm commented 1 year ago

OK well if you can't reproduce now there is probably not a lot more we can do.

Freaky commented 1 year ago
.rw-r--r-- 43k freaky freaky 24 Jan 22:14 fdm-crash.stderr
.rw-r--r-- 74k freaky freaky 24 Jan 22:14 fdm-crash.stdout
.rw------- 14M freaky freaky 24 Jan 22:14 fdm.core

Identical pointer on aux.

stderr: https://gist.github.com/Freaky/4351d4996b80c993fcda4cbc313eca38
stdout: https://gist.github.com/Freaky/627338733e665ad721694b9bc0127045

I scrubbed my password, don't think there's much else sensitive in there.

Freaky commented 1 year ago

Rubbing valgrind on it finds this shortly before the crash:

eda: trying (match) message 86                                                                                                                                                                                       eda: matching from 0 to 6379 (size=7657, body=6379)
==15056== Invalid write of size 4
==15056==    at 0x23895F: re_block (pcre.c:101)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
:                                                                                                                                                                                                                    eda: trying (match) message 86                                                                                                                                                                                       eda: matching from 0 to 6379 (size=7657, body=6379)
eda: tried regexp "^X-Cron-Env: ." in headers, result now 0
...
eda: trying (match) message 86
eda: matching from 0 to 6379 (size=7657, body=6379)
==15056== Invalid write of size 4
==15056==    at 0x23895F: re_block (pcre.c:101)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==  Address 0x671a420 is 0 bytes after a block of size 672 alloc'd
==15056==    at 0x48500D5: calloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==15056==    by 0x23D25A: xcalloc (xmalloc.c:93)
==15056==    by 0x21F5BD: fetch_account (child-fetch.c:437)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==
==15056== Invalid write of size 8
==15056==    at 0x23896A: re_block (pcre.c:102)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==  Address 0x671a428 is 8 bytes after a block of size 672 alloc'd
==15056==    at 0x48500D5: calloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==15056==    by 0x23D25A: xcalloc (xmalloc.c:93)
==15056==    by 0x21F5BD: fetch_account (child-fetch.c:437)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==
==15056== Invalid write of size 8
==15056==    at 0x238972: re_block (pcre.c:103)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==  Address 0x671a430 is 16 bytes after a block of size 672 alloc'd
==15056==    at 0x48500D5: calloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==15056==    by 0x23D25A: xcalloc (xmalloc.c:93)
==15056==    by 0x21F5BD: fetch_account (child-fetch.c:437)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==
eda: tried regexp "^Mailing-list: list ([a-z0-9_-]+)@googlegroups" in headers, result now 0

The pid matches:

==15056== Invalid read of size 4
==15056==    at 0x22E28C: imap_commit (imap-common.c:273)
==15056==    by 0x21FC11: fetch_dequeue (child-fetch.c:725)
==15056==    by 0x21FB40: fetch_match (child-fetch.c:191)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==  Address 0x1f2 is not stack'd, malloc'd or (recently) free'd
==15056==
==15056==
==15056== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==15056==  Access not within mapped region at address 0x1F2

Earlier on the same pattern triggers invalid reads:

eda: trying (match) message 61
eda: matching from 0 to 6384 (size=8128, body=6384)
==15056== Invalid read of size 8
==15056==    at 0x238953: re_block (pcre.c:99)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==  Address 0x5a2a938 is 8 bytes after a block of size 112 alloc'd
==15056==    at 0x484CBC4: malloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==15056==    by 0x4BF1DB3: ??? (in /usr/local/lib/libpcre2-8.so.0.11.2)
==15056==    by 0x4C5BBB0: pcre2_match_data_create_from_pattern_8 (in /usr/local/lib/libpcre2-8.so.0.11.2)
==15056==    by 0x2388D5: re_block (pcre.c:87)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==
==15056== Invalid read of size 8
==15056==    at 0x238959: re_block (pcre.c:99)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==  Address 0x5a2a930 is 0 bytes after a block of size 112 alloc'd
==15056==    at 0x484CBC4: malloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==15056==    by 0x4BF1DB3: ??? (in /usr/local/lib/libpcre2-8.so.0.11.2)
==15056==    by 0x4C5BBB0: pcre2_match_data_create_from_pattern_8 (in /usr/local/lib/libpcre2-8.so.0.11.2)
==15056==    by 0x2388D5: re_block (pcre.c:87)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==
==15056== Invalid read of size 8
==15056==    at 0x238966: re_block (pcre.c:102)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==  Address 0x5a2a930 is 0 bytes after a block of size 112 alloc'd
==15056==    at 0x484CBC4: malloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==15056==    by 0x4BF1DB3: ??? (in /usr/local/lib/libpcre2-8.so.0.11.2)
==15056==    by 0x4C5BBB0: pcre2_match_data_create_from_pattern_8 (in /usr/local/lib/libpcre2-8.so.0.11.2)
==15056==    by 0x2388D5: re_block (pcre.c:87)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==
==15056== Invalid read of size 8
==15056==    at 0x23896E: re_block (pcre.c:103)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==  Address 0x5a2a938 is 8 bytes after a block of size 112 alloc'd
==15056==    at 0x484CBC4: malloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==15056==    by 0x4BF1DB3: ??? (in /usr/local/lib/libpcre2-8.so.0.11.2)
==15056==    by 0x4C5BBB0: pcre2_match_data_create_from_pattern_8 (in /usr/local/lib/libpcre2-8.so.0.11.2)
==15056==    by 0x2388D5: re_block (pcre.c:87)
==15056==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==15056==    by 0x232C85: mail_match (mail-state.c:149)
==15056==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==15056==    by 0x21F413: fetch_account (child-fetch.c:386)
==15056==    by 0x21F073: child_fetch (child-fetch.c:114)
==15056==    by 0x220B74: child_start (child.c:133)
==15056==    by 0x22834E: main (fdm.c:773)
==15056==
eda: tried regexp "^X-Cron-Env: ." in headers, result now 0
nicm commented 1 year ago

Does this fix it?


--- a/pcre.c
+++ b/pcre.c
@@ -93,6 +93,8 @@ re_block(struct re *re, const void *buf, size_t len, struct rmlist *rml,
        }

        if (rml != NULL) {
+               if (res > NPMATCH)
+                       res = NPMATCH;
                ovector = pcre2_get_ovector_pointer(pmd);
                for (i = 0; i < res; i++) {
                        j = i * 2;
Freaky commented 1 year ago

Seems to have fixed the crash. I'll keep an eye on it.

I did note this, though it also appears in the unpatched run:

eda: matching from 0 to 3385 (size=3548, body=3385)
==3299== Conditional jump or move depends on uninitialised value(s)
==3299==    at 0x23896D: re_block (pcre.c:101)
==3299==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==3299==    by 0x232C85: mail_match (mail-state.c:149)
==3299==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==3299==    by 0x21F413: fetch_account (child-fetch.c:386)
==3299==    by 0x21F073: child_fetch (child-fetch.c:114)
==3299==    by 0x220B74: child_start (child.c:133)
==3299==    by 0x22834E: main (fdm.c:773)
==3299==
eda: tried regexp "^X-Cron-Env: ." in headers, result now 0

Line 101 on the patched file is this conditional from inside the loop over res:

if (ovector[j + 1] <= ovector[j])
Freaky commented 1 year ago

Same crash dereferencing the same pointer (remarkably consistent), and similar valgrind errors on a different regexp:

eda: matching from 0 to 2286 (size=77546, body=2286)
==12129== Invalid write of size 4
==12129==    at 0x23896F: re_block (pcre.c:103)
==12129==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==12129==    by 0x232C85: mail_match (mail-state.c:149)
==12129==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==12129==    by 0x21F413: fetch_account (child-fetch.c:386)
==12129==    by 0x21F073: child_fetch (child-fetch.c:114)
==12129==    by 0x220B74: child_start (child.c:133)
==12129==    by 0x22834E: main (fdm.c:773)
==12129==  Address 0x6084580 is 0 bytes after a block of size 672 alloc'd
==12129==    at 0x48500D5: calloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==12129==    by 0x23D26A: xcalloc (xmalloc.c:93)
==12129==    by 0x21F5BD: fetch_account (child-fetch.c:437)
==12129==    by 0x21F073: child_fetch (child-fetch.c:114)
==12129==    by 0x220B74: child_start (child.c:133)
==12129==    by 0x22834E: main (fdm.c:773)
==12129==
==12129== Invalid write of size 8
==12129==    at 0x23897A: re_block (pcre.c:104)
==12129==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==12129==    by 0x232C85: mail_match (mail-state.c:149)
==12129==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==12129==    by 0x21F413: fetch_account (child-fetch.c:386)
==12129==    by 0x21F073: child_fetch (child-fetch.c:114)
==12129==    by 0x220B74: child_start (child.c:133)
==12129==    by 0x22834E: main (fdm.c:773)
==12129==  Address 0x6084588 is 8 bytes after a block of size 672 alloc'd
==12129==    at 0x48500D5: calloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==12129==    by 0x23D26A: xcalloc (xmalloc.c:93)
==12129==    by 0x21F5BD: fetch_account (child-fetch.c:437)
==12129==    by 0x21F073: child_fetch (child-fetch.c:114)
==12129==    by 0x220B74: child_start (child.c:133)
==12129==    by 0x22834E: main (fdm.c:773)
==12129==
==12129== Invalid write of size 8
==12129==    at 0x238982: re_block (pcre.c:105)
==12129==    by 0x235E9A: match_regexp_match (match-regexp.c:62)
==12129==    by 0x232C85: mail_match (mail-state.c:149)
==12129==    by 0x21FAD0: fetch_match (child-fetch.c:177)
==12129==    by 0x21F413: fetch_account (child-fetch.c:386)
==12129==    by 0x21F073: child_fetch (child-fetch.c:114)
==12129==    by 0x220B74: child_start (child.c:133)
==12129==    by 0x22834E: main (fdm.c:773)
==12129==  Address 0x6084590 is 16 bytes after a block of size 672 alloc'd
==12129==    at 0x48500D5: calloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==12129==    by 0x23D26A: xcalloc (xmalloc.c:93)
==12129==    by 0x21F5BD: fetch_account (child-fetch.c:437)
==12129==    by 0x21F073: child_fetch (child-fetch.c:114)
==12129==    by 0x220B74: child_start (child.c:133)
==12129==    by 0x22834E: main (fdm.c:773)
==12129==
eda: tried regexp "^List-Id:.*(css-d|org-[a-z]+|notmuch|whatwg|www-(?:html|style)|ruby-talk)" in headers, result now 0
nicm commented 1 year ago

Are you sure you have the fix applied for that test?

Freaky commented 1 year ago

I had the same thought and double-checked. I added the patch to my port tree, verified it applied correctly, and force-rebuilt and reinstalled the package. I made sure there's no other fdm in my PATH.

Valgrind's reported line numbers also match the patched file:

103                         rml->list[i].valid = 1;
104                         rml->list[i].so = ovector[j];
105                         rml->list[i].eo = ovector[j + 1];

The loop conditional has me a little suspicious:

 99                 for (i = 0; i < res; i++) {

And indeed it triggers a warning when compiled with debug mode:

pcre.c:99:17: warning: comparison of integers of different signs: 'u_int' (aka 'unsigned int') and 'int' [-Wsign-compare]
                for (i = 0; i < res; i++) {

Doesn't C implicitly convert the signed side to unsigned in this case? And -1 is explicitly allowed in the prior conditional, which will cast to MAX_INT:

 89         if (res < 0 && res != PCRE2_ERROR_NOMATCH) {
Freaky commented 1 year ago

I'm a bit confused by the code flow here - why is PCRE2_ERROR_NOMATCH special-cased to run the res loop and skip pcre2_match_data_free? Shouldn't it free and return 0 instead of falling through?

nicm commented 1 year ago

res cannot be < 0 because if would have returned on line 92.

I expect we should not run the loop if res is PCRE2_ERROR_NOMATCH. What is PCRE2_ERROR_NOMATCH defined to by PCRE?

nicm commented 1 year ago

OK PCRE2_ERROR_NOMATCH is -1. I would probably just check for that explicitly after the res < 0 check and ditch the ugly return !=.

Freaky commented 1 year ago

The res = 0 case also looks like an error condition - the list of offsets was too small, which pcre2_match_data_create_from_pattern should avoid?

Also there's a missing null pointer check on that.

Freaky commented 1 year ago

How about something like:

--- pcre.c.orig 2023-01-09 11:20:17 UTC
+++ pcre.c
@@ -66,7 +66,7 @@ int
 re_block(struct re *re, const void *buf, size_t len, struct rmlist *rml,
     char **cause)
 {
-   int          res;
+   int          res, ret;
    pcre2_match_data    *pmd;
    PCRE2_SIZE      *ovector;
    u_int            i, j;
@@ -85,27 +85,36 @@ re_block(struct re *re, const void *buf, size_t len, s
    }

    pmd = pcre2_match_data_create_from_pattern(re->pcre2, NULL);
+   if (pmd == NULL)
+       fatalx("pcre2_match_data_create_from_pattern failed");
+
    res = pcre2_match(re->pcre2, buf, len, 0, 0, pmd, NULL);
-   if (res < 0 && res != PCRE2_ERROR_NOMATCH) {
-       xasprintf(cause, "%s: regexec failed", re->str);
-       pcre2_match_data_free(pmd);
-       return (-1);
-   }

-   if (rml != NULL) {
-       ovector = pcre2_get_ovector_pointer(pmd);
-       for (i = 0; i < res; i++) {
-           j = i * 2;
-           if (ovector[j + 1] <= ovector[j])
-               break;
-           rml->list[i].valid = 1;
-           rml->list[i].so = ovector[j];
-           rml->list[i].eo = ovector[j + 1];
+   if (res > 0) {
+       if (res > NPMATCH)
+           res = NPMATCH;
+       if (rml != NULL) {
+           ovector = pcre2_get_ovector_pointer(pmd);
+           for (i = 0; i < res; i++) {
+               j = i * 2;
+               if (ovector[j + 1] <= ovector[j])
+                   break;
+               rml->list[i].valid = 1;
+               rml->list[i].so = ovector[j];
+               rml->list[i].eo = ovector[j + 1];
+           }
+           rml->valid = 1;
        }
-       rml->valid = 1;
+       ret = 1;
+   } else if (res == PCRE2_ERROR_NOMATCH) {
+       ret = 0;
+   } else {
+       xasprintf(cause, "%s: regexec failed", re->str);
+       ret = -1;
    }

-   return (res != PCRE2_ERROR_NOMATCH);
+   pcre2_match_data_free(pmd);
+   return (ret);
 }

 void

I remain baffled at if (ovector[j + 1] <= ovector[j]). Stop if the next offset is smaller than the current one? Surely the loop conditional is enough - or would be if it was defended against the -1 case.

nicm commented 1 year ago

This looks ok although I'd prefer to just return early rather than adding ret, although there isn't much in it.

I'll be back home on Wed and I will look at this again and apply it then (assuming it fixes your problem).

On Sat, 28 Jan 2023, 13:35 Thomas Hurst, @.***> wrote:

How about something like:

--- pcre.c.orig 2023-01-09 11:20:17 UTC +++ pcre.c @@ -66,7 +66,7 @@ int re_block(struct re re, const void buf, size_t len, struct rmlist *rml, char **cause) {

  • int res;

  • int res, ret; pcre2_match_data pmd; PCRE2_SIZE ovector; u_int i, j; @@ -85,27 +85,36 @@ re_block(struct re re, const void buf, size_t len, s }

    pmd = pcre2_match_data_create_from_pattern(re->pcre2, NULL);

  • if (pmd == NULL)

  • fatalx("pcre2_match_data_create_from_pattern failed");

  • res = pcre2_match(re->pcre2, buf, len, 0, 0, pmd, NULL);

  • if (res < 0 && res != PCRE2_ERROR_NOMATCH) {

  • xasprintf(cause, "%s: regexec failed", re->str);

  • pcre2_match_data_free(pmd);

  • return (-1);

  • }

  • if (rml != NULL) {

  • ovector = pcre2_get_ovector_pointer(pmd);

  • for (i = 0; i < res; i++) {

  • j = i * 2;

  • if (ovector[j + 1] <= ovector[j])

  • break;

  • rml->list[i].valid = 1;

  • rml->list[i].so = ovector[j];

  • rml->list[i].eo = ovector[j + 1];

  • if (res > 0) {

  • if (res > NPMATCH)

  • res = NPMATCH;

  • if (rml != NULL) {

  • ovector = pcre2_get_ovector_pointer(pmd);

  • for (i = 0; i < res; i++) {

  • j = i * 2;

  • if (ovector[j + 1] <= ovector[j])

  • break;

  • rml->list[i].valid = 1;

  • rml->list[i].so = ovector[j];

  • rml->list[i].eo = ovector[j + 1];

  • }

  • rml->valid = 1; }

  • rml->valid = 1;

  • ret = 1;

  • } else if (res == PCRE2_ERROR_NOMATCH) {

  • ret = 0;

  • } else {

  • xasprintf(cause, "%s: regexec failed", re->str);

  • ret = -1; }

  • return (res != PCRE2_ERROR_NOMATCH);

  • pcre2_match_data_free(pmd);

  • return (ret); }

    void

I remain baffled at if (ovector[j + 1] <= ovector[j]). Stop if the next offset is smaller than the current one? Surely the loop conditional is enough - or would be if it was defended against the -1 case.

— Reply to this email directly, view it on GitHub https://github.com/nicm/fdm/issues/122#issuecomment-1407400541, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACKI42WZN6B5GUEVJUKTCLWUUOCNANCNFSM6AAAAAAUAYZ4CQ . You are receiving this because you commented.Message ID: @.***>

Freaky commented 1 year ago

I went with ret over early return to avoid having multiple calls to pcre2_match_data_free in different branches. Maybe that's better than having both res and ret in the same scope.

One other thing I noticed from the pcre2_match man page: The length and startoffset values are code units, not characters. So len should be a count of UTF-8 code points and not the current size in bytes? Or does that depend on how pcre2 was compiled?

At least that isn't a memory safety issue if you undercount it...

What. No, a byte size would be >= the number of codepoints in a multibyte string.

PCRE2 considers a "code unit" to be one of 8, 16, or 32 bytes, and doesn't conflate that with code points.

Freaky commented 1 year ago

Patched version fetched my backlog without any of the aforementioned Valgrind warnings and without crashing.

Lots of warnings about leaking 64 bytes, but previous entries like ==12129== definitely lost: 103,764 bytes in 870 blocks are no longer evident.

Freaky commented 1 year ago

Regarding if (ovector[j + 1] <= ovector[j]), I came across this in pcre2demo which does help explain things a bit:

/* Since release 10.38 PCRE2 has locked out the use of \K in lookaround
assertions. However, there is an option to re-enable the old behaviour. If that
is set, it is possible to run patterns such as /(?=.\K)/ that use \K in an
assertion to set the start of a match later than its end. In this demonstration
program, we show how to detect this case, but it shouldn't arise because the
option is never set. */

if (ovector[0] > ovector[1])

If there's an invariant that list[i].so < list[i].eo it does actually need defending from that - though perhaps it would be better to do so by swapping the values?

I note that because the condition is <= it will also break the loop early if one of the groups matched the empty string - that seems like a bug to me.

nicm commented 1 year ago

Please give me your change as a .diff file because it is impossible to copy from GH issues. You will need to name it .txt before it will let you attach it.

Freaky commented 1 year ago

I was going to make a PR, but here you go: patch-pcre.c.txt

nicm commented 1 year ago

I have applied this with trivial style changes. I also changed to < to permit so == eo. Please let me know if you still have problems. Thanks!

Freaky commented 1 year ago

Thank you :)

I've deployed the updated patch and will let you know if anything else crops up. I don't expect it to - I've ran the prior patch for several days without issue.

nicm commented 1 year ago

Great, I will close this but you can still comment if needed.