opencog / link-grammar

The CMU Link Grammar natural language parser
GNU Lesser General Public License v2.1

multi-dict and multi-thread tests crash with a segmentation fault on macOS when built with pcre2 #1514

Open ryandesign opened 4 months ago

ryandesign commented 4 months ago

With the latest code in master, make check fails on macOS if link-grammar is built with pcre2 support. The multi-dict and multi-thread tests crash with a segmentation fault, which doesn't happen if I disable pcre2 with the --with-regexlib=c configure argument. From the crash log, multi-dict crashed here:

Thread 6 Crashed:
0   libpcre2-8.0.dylib                     0x1029e08a7 match + 49451
1   libpcre2-8.0.dylib                     0x1029d4324 pcre2_match_8 + 4536
2   liblink-grammar.5.dylib                0x102886fbf reg_match + 39 (regex-morph.c:239) [inlined]
3   liblink-grammar.5.dylib                0x102886fbf match_regex + 207 (regex-morph.c:428)
4   liblink-grammar.5.dylib                0x1028b4b06 regex_guess + 12 (tokenize.c:377) [inlined]
5   liblink-grammar.5.dylib                0x1028b4b06 separate_word + 886 (tokenize.c:2681)
6   liblink-grammar.5.dylib                0x1028b4489 separate_sentence + 1257 (tokenize.c:3090)
7   liblink-grammar.5.dylib                0x1028ae13a sentence_split + 74 (sentence.c:93)
8   multi-dict                             0x1026fd5ba parse_one_sent(char const*) + 31 (multi-dict.cc:40) [inlined]
9   multi-dict                             0x1026fd5ba parse_sents(int, int) + 122 (multi-dict.cc:82)
10  multi-dict                             0x1026fd7e0 decltype(static_cast<void (*>(fp)(static_cast<int>(fp0), static_cast<int>(fp0))) std::__1::__invoke<void (*)(int, int), int, int>(void (*&&)(int, int), int&&, int&&) + 4 (type_traits:3918) [inlined]
11  multi-dict                             0x1026fd7e0 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int, int), int, int, 2ul, 3ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int, int), int, int>&, std::__1::__tuple_indices<2ul, 3ul>) + 4 (thread:287) [inlined]
12  multi-dict                             0x1026fd7e0 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int, int), int, int> >(void*) + 48 (thread:298)
13  libsystem_pthread.dylib             0x7ff800d7b4e1 _pthread_start + 125
14  libsystem_pthread.dylib             0x7ff800d76f6b thread_start + 15

while multi-thread crashed here:

Thread 3 Crashed:
0   libpcre2-8.0.dylib                     0x10da218e6 match + 362
1   libpcre2-8.0.dylib                     0x10da21324 pcre2_match_8 + 4536
2   liblink-grammar.5.dylib                0x10d8d3fbf reg_match + 39 (regex-morph.c:239) [inlined]
3   liblink-grammar.5.dylib                0x10d8d3fbf match_regex + 207 (regex-morph.c:428)
4   liblink-grammar.5.dylib                0x10d901b06 regex_guess + 12 (tokenize.c:377) [inlined]
5   liblink-grammar.5.dylib                0x10d901b06 separate_word + 886 (tokenize.c:2681)
6   liblink-grammar.5.dylib                0x10d901489 separate_sentence + 1257 (tokenize.c:3090)
7   liblink-grammar.5.dylib                0x10d8fb13a sentence_split + 74 (sentence.c:93)
8   multi-thread                           0x10d74a233 parse_one_sent(Dictionary_s*, Parse_Options_s*, char const*) + 51 (multi-thread.cc:34)
9   multi-thread                           0x10d74a042 parse_sents(Dictionary_s*, Parse_Options_s*, int, int) + 1378 (multi-thread.cc:125)
10  multi-thread                           0x10d74a445 decltype(static_cast<void (*>(fp)(static_cast<Dictionary_s*>(fp0), static_cast<Parse_Options_s*>(fp0), static_cast<int>(fp0), static_cast<int>(fp0))) std::__1::__invoke<void (*)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*, Parse_Options_s*, int, int>(void (*&&)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*&&, Parse_Options_s*&&, int&&, int&&) + 3 (type_traits:3918) [inlined]
11  multi-thread                           0x10d74a445 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*, Parse_Options_s*, int, int, 2ul, 3ul, 4ul, 5ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*, Parse_Options_s*, int, int>&, std::__1::__tuple_indices<2ul, 3ul, 4ul, 5ul>) + 17 (thread:287) [inlined]
12  multi-thread                           0x10d74a445 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*, Parse_Options_s*, int, int> >(void*) + 53 (thread:298)
13  libsystem_pthread.dylib             0x7ff800d7b4e1 _pthread_start + 125
14  libsystem_pthread.dylib             0x7ff800d76f6b thread_start + 15

@ampli said in https://github.com/opencog/link-grammar/pull/1505#issuecomment-2073703909 that this is because:

The current regex-morph.c PCRE2 code doesn't support using multi-threading without threads.h.

Possible solutions:

  • Document that, maybe add a warning in configure, and modify the multi-threading tests to print a warning and exit.
  • Modify the regex-morph.c PCRE2 code to use C++ threads, and modify configure.ac and link-grammar/Makefile.am accordingly.
  • Modify the regex-morph.c PCRE2 code to use Pthreads.
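
To make the Pthreads option concrete, here is a minimal sketch of per-thread storage with cleanup at thread exit, built only on pthread_once, pthread_key_create, pthread_getspecific and pthread_setspecific. It assumes that what regex-morph.c needs from threads.h is a per-thread scratch object that is freed when the thread ends; that is an assumption, not something confirmed in this thread. The names scratch_t, get_scratch and run_workers are hypothetical, and the sketch makes no PCRE2 calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-thread regex scratch buffer. */
typedef struct { int uses; } scratch_t;

static pthread_key_t scratch_key;
static pthread_once_t scratch_once = PTHREAD_ONCE_INIT;

static pthread_mutex_t freed_lock = PTHREAD_MUTEX_INITIALIZER;
static int freed_count = 0;   /* demo only: counts destructor calls */

/* Destructor: runs at thread exit with that thread's non-NULL value. */
static void scratch_free(void *p)
{
    free(p);
    pthread_mutex_lock(&freed_lock);
    freed_count++;
    pthread_mutex_unlock(&freed_lock);
}

/* Creates the key; pthread_once guarantees this runs exactly once. */
static void scratch_key_init(void)
{
    int rc = pthread_key_create(&scratch_key, scratch_free);
    assert(rc == 0);
    (void)rc;
}

/* Returns the calling thread's buffer, allocating it on first use. */
static scratch_t *get_scratch(void)
{
    pthread_once(&scratch_once, scratch_key_init);
    scratch_t *s = pthread_getspecific(scratch_key);
    if (s == NULL) {
        s = calloc(1, sizeof(*s));
        assert(s != NULL);
        pthread_setspecific(scratch_key, s);
    }
    return s;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        get_scratch()->uses++;   /* no lock needed: the buffer is per-thread */
    return NULL;
}

/* Runs nthreads workers and returns how many buffers were freed. */
int run_workers(int nthreads)
{
    pthread_t tids[16];
    assert(nthreads > 0 && nthreads <= 16);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return freed_count;
}
```

Each thread allocates its buffer once, reuses it on every later call, and the key's destructor frees it when the thread exits, so calling run_workers(4) should report four freed buffers.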

My brief searching suggests that C11 threads (threads.h) are not well supported and pthreads is suggested as the recommended alternative. pthreads are already used elsewhere in the code:

https://github.com/opencog/link-grammar/blob/69c026f6eca72f4a276de69929be6f5a43059ad2/link-grammar/tokenize/spellcheck-hun.c#L22

Maybe using a single threading library for the entire code base would be a good idea. I can't help with that, however, as I haven't written any multithreaded code before.

ampli commented 4 months ago

The problem with PCRE2 in compilers that don't support C11 threads happens only with multi-threaded programs that use the LG library in different threads simultaneously. I think most users never do that, and since PCRE2 is faster and better than the alternative libraries, it should remain the default.

If desired, configure can warn that multithreading library use is not supported with PCRE2 on such systems.

linas commented 4 months ago

Hi @ryandesign -- if you will allow me, I'd like to provide some deep history and general, opinionated twitter-thread commentary ...

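#include <pthread.h>
#include <stdbool.h>

static bool did_we_run_yet = false;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void func_to_call_once(void) {
    pthread_mutex_lock(&lock);
    if (did_we_run_yet) {
        /* Release the lock before the early return. */
        pthread_mutex_unlock(&lock);
        return;
    }
    did_we_run_yet = true;
    pthread_mutex_unlock(&lock);
    /* rest of subr: reached by only one caller, outside the lock */
}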

If you want to learn multi-threaded programming, the above is a good starter project.
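
For comparison, POSIX already provides this run-once pattern as pthread_once, which also makes later callers wait until the initializer has finished. The sketch below is illustrative only: init_once, ensure_initialized and run_callers are hypothetical names, and nothing here is taken from the link-grammar sources.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_CALLERS 16

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int init_count = 0;   /* demo only: how many times init_once ran */

/* The work that must happen exactly once. */
static void init_once(void)
{
    init_count++;
}

/* Safe to call from any thread, any number of times. */
void ensure_initialized(void)
{
    pthread_once(&once_control, init_once);
}

static void *caller(void *arg)
{
    (void)arg;
    ensure_initialized();
    return NULL;
}

/* Starts nthreads racing callers and returns how often init_once ran. */
int run_callers(int nthreads)
{
    pthread_t tids[MAX_CALLERS];
    assert(nthreads > 0 && nthreads <= MAX_CALLERS);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, caller, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return init_count;
}
```

However many threads race into ensure_initialized, init_once runs a single time, so run_callers(8) should return 1.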