rrthomas / enchant

enchant spellchecking library
http://rrthomas.github.io/enchant/
GNU Lesser General Public License v2.1
347 stars 60 forks source link

brew-installed on mac intermittently segfaults #391

Closed agzam closed 3 months ago

agzam commented 3 months ago

enchant that continues to work delightfully on my Linux machine, recently was a source of some frustration on a Mac.

see the details in this r/emacs post

I'm not sure what exactly causes that, but it seems there's a difference between brew-installed (with default options) and manually-built from the tarball, even though the versions are the same. I checked the formula and couldn't spot anything wrong.

rrthomas commented 3 months ago

Thanks for the report, sorry to hear about this frustrating problem.

The most likely candidate for segfaults, given the symptoms, is the AppleSpell backend. Unfortunately I can't test this much, as I don't have macOS. I will of course be delighted to receive improved tests, which I can run in CI, to help me avoid introducing bugs, fixes for bugs I may have introduced, and paid commissions to improve macOS support (or indeed Windows, where we currently don't support the native back-end). The obvious alternative, in the absence of help and the continuation of bug reports, would be to give up on AppleSpell support and simply remove it.

The stuff about finding dictionaries is annoying. I have attempted to improve both functionality and documentation in this area in recent releases. This is a partly a fundamental weakness of Enchant's design: as it delegates to other spelling providers, some of the details of how they work leaks out. Partly, the problem until recently has been that Enchant was insufficiently uniform in its treatment of back-ends. This I have recently improved considerably, and documented the remaining differences.

The problems with enchant.ordering are almost always a red herring: on inspection, it is almost never the problem. But it does have to be checked.

I can't help directly with packaging, that's simply beyond the scope of Enchant per se. I need bug reports about building Enchant from source. Again, I sympathise with users trying to work out whether a problem is ultimately caused by the packaging or by upstream, and I know that different packaging systems put different amounts of effort in and have different levels of resources for fixing problems with packaging.

Jinx seems to have brought the largest number of new users to Enchant for many years, which is great. However, it has also resulted in a lot of confusion and frustration, largely because it exposes users more directly to Enchant. (Most other uses of Enchant are insulated behind an app or framework that have their own configuration.) Some of this I have addressed by improved documentation and better debugging facilities for users; but in other areas (bug fixes, functionality improvement, and the rewrite in Vala) there is unavoidable churn and new bugs, which will of course mean that new problems emerge; and with Enchant they are often frustratingly similar to old problems.

Also recently I have introduced one piece of major new functionality, composite dictionaries.

The good news as of now is that I don't plan further major changes to the library any time soon (the currently open issues that I'm most likely to work on are for the command-line commands). So stability and documentation should improve in the short term, as long as I get actionable bug reports.

rrthomas commented 3 months ago

To progress this particular issue, I'm afraid I'll need something I can work with without needing to use brew or macOS. I'm happy to leave it for other users, but "a particular downstream packaging of Enchant crashes in some configurations on a proprietary OS" is not something I'm prepared to spend unpaid time on.

agzam commented 3 months ago

I heartily appreciate your lengthy, detailed explanations, and I apologize for making you spend time on this. Of course, it's never my desire to waste your time, especially as you explained the issue is with AppleSpell. Unfortunately, I myself have a very shallow understanding of how Enchant works; just like I said, never before did it give me any trouble on my Linux machine; otherwise, I would love to try to be more helpful here.

I solved the problem for myself by compiling it from the tarball; it now works as delightfully as always. I just thought it would be a good idea to raise awareness so others wouldn't have to waste their time plucking their eyebrows like I had to do.

Thank you, Ruben, for such an amazing craft. I'm very happy that I was able to isolate the problem and fix it, and to a certain degree, I'm glad this has happened. I never before thought about how important this piece of software is for my comfortable well-being.

If there's nothing else to do here, please feel free to close this. Thanks!

rrthomas commented 3 months ago

Thanks, @agzam, I'm happy to leave this issue open for now in case anyone wants to work on the brew packaging and perhaps isolate the underlying problem here, whether it's in the packaged build or in the upstream code.

jenstroeger commented 3 months ago

@rrthomas I just linked an issue which may be related, but I’ve not investigated more thoroughly.

rrthomas commented 3 months ago

Thanks, @jenstroeger, I think this is looking more and more like a bug in the AppleSpell provider, which I am minded to deprecate and ask for help with.

jenstroeger commented 3 months ago

@rrthomas regarding your comment https://github.com/pyenchant/pyenchant/issues/321#issuecomment-2272986598

I don't use macOS; hence, I could really do with a backtrace of the segfault, or, failing that, evidence that this crash only happens when AppleSpell is configured.

Happy to help with whatever you need, just let me know! I’ve got the native enchant lib installed:

> port contents enchant2
Port enchant2 contains:
  /opt/local/bin/enchant-2
  /opt/local/bin/enchant-lsmod-2
  /opt/local/include/enchant-2/enchant++.h
  /opt/local/include/enchant-2/enchant.h
  /opt/local/lib/enchant-2/enchant_applespell.a
  /opt/local/lib/enchant-2/enchant_applespell.so
  /opt/local/lib/enchant-2/enchant_hunspell.a
  /opt/local/lib/enchant-2/enchant_hunspell.so
  /opt/local/lib/libenchant-2.2.dylib
  /opt/local/lib/libenchant-2.a
  /opt/local/lib/libenchant-2.dylib
  /opt/local/lib/pkgconfig/enchant-2.pc
  /opt/local/share/doc/enchant/enchant-2.html
  /opt/local/share/doc/enchant/enchant-lsmod-2.html
  /opt/local/share/doc/enchant/enchant.html
  /opt/local/share/enchant-2-2/AppleSpell.config
  /opt/local/share/enchant-2-2/enchant.ordering
  /opt/local/share/man/man1/enchant-2.1.gz
  /opt/local/share/man/man1/enchant-lsmod-2.1.gz
  /opt/local/share/man/man5/enchant.5.gz

I renamed the enchant_applespell.a|so files so they couldn’t be loaded:

>>> import enchant
>>> b = enchant.Broker()
>>> b.set_ordering("*", "nuspell,hunspell,myspell,aspell")
>>> b.describe()
[<Enchant: Hunspell Provider>]

and then the problem in issue https://github.com/pyenchant/pyenchant/issues/321 didn’t occur:

>>> b.dict_exists('ar')
False
>>> b.request_dict('ar')  # I can repeat this any number of times without segfault now.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/.venv/lib/python3.10/site-packages/enchant/__init__.py", line 272, in request_dict
    return Dict(tag, self)
  File "/path/to/.venv/lib/python3.10/site-packages/enchant/__init__.py", line 542, in __init__
    super().__init__()
  File "/path/to/.venv/lib/python3.10/site-packages/enchant/__init__.py", line 144, in __init__
    self._init_this()
  File "/path/to/.venv/lib/python3.10/site-packages/enchant/__init__.py", line 549, in _init_this
    this = self._broker._request_dict_data(self.tag)
  File "/path/to/.venv/lib/python3.10/site-packages/enchant/__init__.py", line 287, in _request_dict_data
    self._raise_error(e_str % (tag,), DictNotFoundError)
  File "/path/to/.venv/lib/python3.10/site-packages/enchant/__init__.py", line 233, in _raise_error
    raise eclass(default)
enchant.errors.DictNotFoundError: Dictionary for language 'ar' could not be found
Please check https://pyenchant.github.io/pyenchant/ for details

Let me know what you’d like me to run to capture a segfault stack trace!

rrthomas commented 3 months ago

@jenstroger, many thanks, that's already pretty strong evidence. I don't know how to get backtraces from segfaults on macOS; on Linux I'd use gdb on a core dump. I just tried searching "segfault backtrace macos" and found https://stackoverflow.com/questions/8370468/debugging-segmentation-faults-on-a-mac but if this is more trouble than you're willing to go to, then don't worry! I think I'll go ahead with deprecating the AppleSpell backend pending finding a maintainer for it (or a sponsor for me!), in any case.

rrthomas commented 3 months ago

@jenstroeger On second thoughts, I had a quick look at the AppleSpell back-end, and I think I failed to update it properly for the recent rework of the internal provider API.

I have, I believe, now fixed it. If you can possibly test it from git, I would be most grateful. Otherwise, I will make a release and see what happens when it makes it into brew. If I haven't improved things, it will be time to deprecate it and await help!

jenstroeger commented 3 months ago

I don't know how to get backtraces from segfaults on macOS; on Linux I'd use gdb on a core dump. I just tried searching "segfault backtrace macos" and found https://stackoverflow.com/questions/8370468/debugging-segmentation-faults-on-a-mac […]

I’m happy to run through this, I spent years noodling on that level of code 🤓

If you can possibly test it from git, I would be most grateful.

Sure, any particular branch you want me to pull from?

rrthomas commented 3 months ago

@jenstroger, many thanks! Please use main.

jenstroeger commented 3 months ago

@jenstroger, many thanks! Please use main.

@rrthomas I don’t see a main branch for this repo?

rrthomas commented 3 months ago

@jenstroeger Apologies, I meant master. EDIT: and specifically rrthomas/master: https://github.com/rrthomas/enchant/

jenstroeger commented 3 months ago

Ah thank you @rrthomas, now I see commit https://github.com/rrthomas/enchant/commit/06c9166ea442cac07507736933a479ca22e18447 🤓 FWIW, we’re not quite there yet and the Python test script still crashes.

However, and for future reference, here are the steps I took to obtain the stack trace on Mac:

And here’s the stack trace you asked for:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000060
Exception Codes:       0x0000000000000001, 0x0000000000000060

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [42633]

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libenchant-2.2.dylib                   0x1106a4afb enchant_dict_finalize + 18
1   libenchant-2.2.dylib                   0x1106a4837 enchant_dict_unref + 21
2   libglib-2.0.0.dylib                    0x110b10d6d g_hash_table_remove_internal + 266
3   libenchant-2.2.dylib                   0x1106a2f90 enchant_broker_free_dict + 28
4   enchant_applespell.so                  0x110710243 appleSpell_provider_request_dict(_EnchantProvider*, char const*) + 339
5   libenchant-2.2.dylib                   0x1106a2862 _enchant_broker_request_dict + 78
6   libenchant-2.2.dylib                   0x1106a2668 enchant_broker_request_dict_with_pwl + 247
7   libffi.8.dylib                         0x10fc53dc2 ffi_call_unix64 + 82
8   libffi.8.dylib                         0x10fc5368a ffi_call_int + 739
9   libffi.8.dylib                         0x10fc53383 ffi_call + 231
10  _ctypes.cpython-310-darwin.so          0x10fc7117e _ctypes_callproc + 731
11  _ctypes.cpython-310-darwin.so          0x10fc6c48f PyCFuncPtr_call + 246
12  Python                                 0x10ffbe779 _PyObject_MakeTpCall + 129
13  Python                                 0x11008e18e call_function + 295
14  Python                                 0x1100890a6 _PyEval_EvalFrameDefault + 28507
15  Python                                 0x1100811b5 _PyEval_Vector + 365
16  Python                                 0x11008e117 call_function + 176
17  Python                                 0x110087d3f _PyEval_EvalFrameDefault + 23540
18  Python                                 0x1100811b5 _PyEval_Vector + 365
19  Python                                 0x11008e117 call_function + 176
20  Python                                 0x110087d3f _PyEval_EvalFrameDefault + 23540
21  Python                                 0x1100811b5 _PyEval_Vector + 365
22  Python                                 0x10ffc1871 method_vectorcall + 156
23  Python                                 0x11008e117 call_function + 176
24  Python                                 0x1100890a6 _PyEval_EvalFrameDefault + 28507
25  Python                                 0x1100811b5 _PyEval_Vector + 365
26  Python                                 0x10ffbeaaa _PyObject_FastCallDictTstate + 87
27  Python                                 0x11001a976 slot_tp_init + 208
28  Python                                 0x110012e16 type_call + 267
29  Python                                 0x10ffbe779 _PyObject_MakeTpCall + 129
30  Python                                 0x11008e18e call_function + 295
31  Python                                 0x1100835cc _PyEval_EvalFrameDefault + 5249
32  Python                                 0x1100811b5 _PyEval_Vector + 365
33  Python                                 0x11008e117 call_function + 176
34  Python                                 0x110087d3f _PyEval_EvalFrameDefault + 23540
35  Python                                 0x1100811b5 _PyEval_Vector + 365
36  Python                                 0x110081032 PyEval_EvalCode + 114
37  Python                                 0x1100d0a6e run_eval_code_obj + 72
38  Python                                 0x1100d09fe run_mod + 96
39  Python                                 0x1100d0874 pyrun_file + 133
40  Python                                 0x1100d0381 _PyRun_SimpleFileObject + 275
41  Python                                 0x1100cfd3a _PyRun_AnyFileObject + 143
42  Python                                 0x1100e9b13 pymain_run_file_obj + 226
43  Python                                 0x1100e94d7 pymain_run_file + 85
44  Python                                 0x1100e8f54 Py_RunMain + 984
45  Python                                 0x1100e9e94 Py_BytesMain + 42
46  dyld                                0x7ff80d092418 start + 1896

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000600001d584e0  rcx: 0x0000600002c511a0  rdx: 0x0000000000000000
  rdi: 0x0000000000000000  rsi: 0x0000600001d584e0  rbp: 0x00007ff7b036d030  rsp: 0x00007ff7b036d020
   r8: 0x00000000006f33ae   r9: 0xffffffff00000000  r10: 0x0000000000000000  r11: 0x0000600003b597ec
  r12: 0x0000000000000005  r13: 0x0000000000000005  r14: 0x0000000000000001  r15: 0x0000000001d584e0
  rip: 0x00000001106a4afb  rfl: 0x0000000000010246  cr2: 0x0000000000000060

Logical CPU:     6
Error Code:      0x00000004 (no mapping for user data read)
Trap Number:     14

Thread 0 instruction stream:
  04 46 00 00 48 89 c6 31-c0 5d e9 99 3b 00 00 48  .F..H..1.]..;..H
  c7 00 00 00 00 00 31 c0-5d c3 55 48 89 e5 53 50  ......1.].UH..SP
  48 89 fb e8 ce 3b 00 00-48 89 05 5e 95 00 00 48  H....;..H..^...H
  8d 05 18 00 00 00 48 89-43 08 48 83 c4 08 5b 5d  ......H.C.H...[]
  c3 55 48 89 e5 c7 47 08-01 00 00 00 5d c3 55 48  .UH...G.....].UH
  89 e5 53 50 48 89 fb e8-f2 3a 00 00 48 8b 7b 20  ..SPH....:..H.{ 
 [48]8b 47 60 48 85 c0 74-12 48 89 c7 48 89 de ff  H.G`H..t.H..H... <==
  50 40 48 8b 7b 20 48 85-ff 74 0d e8 5a 29 00 00  P@H.{ H..t..Z)..
  48 c7 43 20 00 00 00 00-48 83 c4 08 5b 5d c3 55  H.C ....H...[].U
  48 89 e5 41 57 41 56 53-50 48 c7 45 e0 00 00 00  H..AWAVSPH.E....
  00 48 8d 3d 4f 48 00 00-e8 f3 39 00 00 48 85 c0  .H.=OH....9..H..
  74 3a 4c 8d 75 e0 48 89-c7 48 c7 c6 ff ff ff ff  t:L.u.H..H......

I suppose if we’d recompile with debug symbols then it would make reading the stack trace a little easier. Meanwhile, the offending function seems to be this one (using otool -tvV /opt/local/lib/libenchant-2.2.dylib):

_enchant_dict_finalize:
0000000000006ae9        pushq   %rbp
0000000000006aea        movq    %rsp, %rbp
0000000000006aed        pushq   %rbx
0000000000006aee        pushq   %rax
0000000000006aef        movq    %rdi, %rbx
0000000000006af2        callq   0xa5e9                ## symbol stub for: _g_signal_handlers_destroy
0000000000006af7        movq    0x20(%rbx), %rdi
0000000000006afb        movq    0x60(%rdi), %rax      <== %rdi is 0x0!
0000000000006aff        testq   %rax, %rax
0000000000006b02        je      0x6b16
0000000000006b04        movq    %rax, %rdi
0000000000006b07        movq    %rbx, %rsi
0000000000006b0a        callq   *0x40(%rax)
0000000000006b0d        movq    0x20(%rbx), %rdi
0000000000006b11        testq   %rdi, %rdi
0000000000006b14        je      0x6b23
0000000000006b16        callq   _enchant_session_unref
0000000000006b1b        movq    $0x0, 0x20(%rbx)
0000000000006b23        addq    $0x8, %rsp
0000000000006b27        popq    %rbx
0000000000006b28        popq    %rbp
0000000000006b29        retq

Does this help somewhat?

[^1]: I use MacPorts and Brew users may need to copy those files elsewhere.

rrthomas commented 3 months ago

Thanks very much for this. It would be great if you could repeat with debug symbols to give source line numbers and parameter values. The stack trace you give is not a big surprise, but I can't see what's going wrong exactly.

jenstroeger commented 3 months ago

I didn’t see it documented and

> CFLAGS='-g -O0' make

doesn’t seem to work — currently this still compiles with -g -O2. Should I edit the Makefile (unusual) or how do I switch compiler flags in this case?

rrthomas commented 3 months ago

There shouldn't be any need to change CFLAGS; you just need -g, you don't need to compile without optimization; indeed, that can hide bugs. So the default flags should be fine.

jenstroeger commented 3 months ago

That makes sense and -g is one of the compiler options when I build the library. Looks like the symbols weren’t used well, though?

Having said that, the above steps only replace the AppleSpell lib and still run the original MacPorts libenchant-2.2.dylib. I basically just dropped the newly compiled file into the existing, previously compiled installation. I should probably build a develop Portfile and build & install the entire enchant port for testing 🤔

rrthomas commented 3 months ago

I'm sorry, I had another look and I can't spot anything obviously wrong with the way you're trying to get debug info. On Linux, building with -g includes debug info in shared libs, and unless you later run strip on them (during make install for example), then that info should be preserved. I thought macOS worked the same way, but I am not up to date with recent changes there.

You might do better to install from source (in a location of your choice) and use that, as you say. It will be less likely to give confusing results.

Thanks for your continued efforts!

jenstroeger commented 3 months ago

@rrthomas I’ve not forgotten, currently traveling. I hope to pick this up this coming weekend.

Also, shall we continue this discussion over on issue https://github.com/AbiWord/enchant/issues/392 which seems more related to the investigation than this one?

rrthomas commented 3 months ago

Thanks, @jenstroeger, closing this issue in favour of #392.