Open rrthomas opened 3 months ago
Since #391 was closed in favor of this, I just thought I'd add that I've managed to capture the same crash with debug symbols by using the reproduction steps in that issue. (I found that issue because I also have Emacs crashes via the Jinx package.)
I myself use Homebrew and I'm able to reproduce with the library built via Homebrew (using brew install --build-from-source --debug-symbols enchant to get debug symbols). I've also built enchant from Git and I get the same crash there.
I only have access to an x86 Mac at work (at home I run an M1 Mac and have no crash issues). The following debugger information is using Python 3.13 with debug symbols built by pyenv, enchant from Git (the master branch, compiled with --enable-relocatable --disable-dependency-tracking because that's what Homebrew uses), and the latest published version of pyenchant (3.3.2).
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x60)
* frame #0: 0x000000010063ca02 libenchant-2.2.dylib`enchant_dict_finalize(obj=0x00006000016c9b60) at dict.vala:57:19 [opt]
frame #1: 0x000000010063c6a5 libenchant-2.2.dylib`enchant_dict_unref(instance=0x00006000016c9b60) at dict.vala:44:3 [opt]
frame #2: 0x0000000100639b9b libenchant-2.2.dylib`enchant_broker_new_dict(self=<unavailable>) at broker.vala:422:2 [opt]
frame #3: 0x000000010060135a enchant_applespell.so`appleSpell_provider_request_dict(me=0x00006000019c8700, tag="zh") at applespell_checker.mm:304:24 [opt]
frame #4: 0x000000010063a03d libenchant-2.2.dylib`_enchant_broker_request_dict(self=0x00006000032cf780, tag="zh", pwl=0x0000000000000000) at broker.vala:274:14 [opt]
frame #5: 0x0000000100639d5e libenchant-2.2.dylib`enchant_broker_request_dict_with_pwl(self=0x00006000032cf780, composite_tag=<unavailable>, pwl=0x0000000000000000) at broker.vala:302:16 [opt]
frame #6: 0x00007ff81c541882 libffi.dylib`ffi_call_unix64 + 82
...
It looks like this.session is null in the EnchantDict at the time of destruction, which means this.session.owner will explode. But I may be misinterpreting things, since I have no experience at all with Vala and very limited experience with C (limited enough that I don't understand how the generated C files and their typedefs actually work). LLDB at least says so:
(lldb) frame variable
(EnchantDict *) obj = 0x00006000016c9b60
(EnchantDict *) self = 0x00006000016c9b60
(EnchantProvider *) owner = NULL
(EnchantSession *) _tmp0_ = NULL
(EnchantProvider *) _tmp2_ = <variable not available>
(EnchantProvider *) _tmp1_ = <variable not available>
(lldb) p *self
(EnchantDict) {
parent_instance = {
g_class = 0x00006000030cc600
}
ref_count = 0
priv = NULL
user_data = 0x0000000000000000
session = NULL
check_method = 0x0000000000000000
suggest_method = 0x0000000000000000
add_to_session_method = 0x0000000000000000
remove_from_session_method = 0x0000000000000000
get_extra_word_characters_method = 0x0000000000000000
is_word_character_method = 0x0000000000000000
}
Thanks for this, I'll have a look when I get a moment.
I've managed to produce the crash at home as well, and have been poking at this more in LLDB. It's pretty clear that the crash happens because this.session is null. The address 0x60 in EXC_BAD_ACCESS arises because the provider field sits at offset 0x60 bytes in the session class struct (on the C level): first this.session is loaded into a register, then the code attempts to load the value at offset 0x60 from that pointer, and we end up at 0x0 + 0x60 = 0x60.
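The address arithmetic can be sketched with Python's ctypes. This is only an illustration: FakeSession is a made-up struct, and the twelve pointer-sized fields before provider are an assumption chosen to reproduce the 0x60 offset on a 64-bit build. It is not the real EnchantSession layout.

```python
import ctypes


class FakeSession(ctypes.Structure):
    # Hypothetical layout (NOT the real EnchantSession): twelve
    # pointer-sized fields come before `provider`.
    _fields_ = [(f"field_{i}", ctypes.c_void_p) for i in range(12)] + [
        ("provider", ctypes.c_void_p),
    ]


# Byte offset of `provider` inside the struct (12 * 8 = 0x60 with 8-byte pointers).
provider_offset = FakeSession.provider.offset

# Reading session->provider through a null pointer touches base + offset.
null_session = 0x0
faulting_address = null_session + provider_offset

print(hex(provider_offset), hex(faulting_address))
```

With a null base pointer the faulting address equals the field offset, which is why a small non-zero address such as 0x60 in EXC_BAD_ACCESS usually points to a field access through a null struct pointer.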
Fundamentally, the AppleSpell provider is different from e.g. Hunspell or Aspell in that it does no checking by itself whether the requested language exists or not during the dictionary request phase (in appleSpell_provider_request_dict). It always calls enchant_broker_new_dict. The others first check whether the requested language even exists, and if not, they return null and never call enchant_broker_new_dict, which means there is no partially initialized data to finalize and the error does not occur.
I must admit I don't quite understand how this code path is triggered via Jinx in Emacs. All the above is based on the reproduction steps in #391, which use pyenchant. On my x86 Mac, a single call to request_dict in the Broker class in pyenchant is enough to trigger the crash, but on my M1 Mac I have to run it twice, ignoring the first result (which is an exception complaining that the dictionary doesn't exist). I don't understand why this differs and haven't yet tried to figure out why (maybe it has to do with when the Python interpreter disposes of the pyenchant data and the finalizer is triggered).
Anyhow, I figured I'd add some more context. Ultimately I don't know whether it's actually helpful or not. I will try to see if I can monkey around with the provider and add an existence check to appleSpell_provider_request_dict similar to the one used in the other providers.
Thanks very much for your efforts; the information above does look useful if you don't manage to get any further yourself.
I'm now confident I understand why this is happening. In appleSpell_provider_request_dict an EnchantDict is always created first (and thereby added to the broker's set of dictionaries, as it's created via EnchantBroker.new_dict).
But if the actual dictionary request then fails, the EnchantDict is freed immediately (despite the broker still having a reference to it!) and the function returns null. The null return value causes EnchantBroker._request_dict to not set dict.session, and once the EnchantDict destructor runs it will explode, since this.session is null.
If the EnchantDict destructor did not run almost immediately, there would be a chance of junk data being written to the freed memory, I suppose. (In my testing, if I simply remove the call to g_free, the crash occurs once the broker is destructed, with the same cause: this.session is still null in all the EnchantDicts that are created despite no dictionary being available for the given code. So all it does is delay when the dict's destructor runs, at least in my testing scenario.)
The solution is to move the creation of the EnchantDict to occur only after all its prerequisites are guaranteed to exist in appleSpell_provider_request_dict (i.e. if allocating the AppleSpellDictionary and finding a dictionary are both successful). That will also remove the calls to g_free (dict) which happen right now in an attempt to clean up if one of these prerequisites fails to materialize.
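The ordering change can be modelled in a few lines of Python. This is a toy simulation, not enchant code: Broker, Dict, AVAILABLE, request_dict_buggy, request_dict_fixed and orphans are all hypothetical names standing in for the broker, EnchantDict, the system's language list and the provider function.

```python
class Dict:
    """Stand-in for EnchantDict; the broker sets `session` only on success."""

    def __init__(self):
        self.session = None


class Broker:
    """Stand-in for the broker: it tracks every dict created via new_dict()."""

    def __init__(self):
        self.dicts = []

    def new_dict(self):
        d = Dict()
        self.dicts.append(d)  # the broker now holds a reference
        return d

    def request_dict(self, provider_request, tag):
        d = provider_request(self, tag)
        if d is not None:
            d.session = object()  # session is assigned only on success
        return d


AVAILABLE = {"en", "sv"}  # hypothetical set of languages the backend offers


def request_dict_buggy(broker, tag):
    d = broker.new_dict()  # created before any prerequisite is checked
    if tag not in AVAILABLE:
        return None  # the broker still tracks d, and d.session stays None
    return d


def request_dict_fixed(broker, tag):
    if tag not in AVAILABLE:
        return None  # nothing was created, so there is nothing to clean up
    return broker.new_dict()


def orphans(broker):
    """Dicts the broker tracks whose session was never set."""
    return [d for d in broker.dicts if d.session is None]
```

In this model a failed request through request_dict_buggy leaves one tracked dict with session set to None, which corresponds to the object whose destructor later dereferences the null session. With request_dict_fixed the broker's set stays empty on failure, so no cleanup call is needed.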
I will send a PR with this fix if this sounds reasonable to you. My question is whether you want the PR to state only that it fixes #391. This issue (meaning #392) is broader than just this one bug, and fixing the bug doesn't necessarily answer the questions posed here.
The AppleSpell provider seems to be buggy these days (see e.g. #391), and I don't use macOS. I'd be happy to look into this on a paid consultancy basis; otherwise, I'd be delighted to have help from a macOS-using developer.
In the meantime, I am minded to deprecate the provider, and put it in the same category as the Zemberek back-end: not built by default, you get a warning if you enable it, and if it crashes that's a shame, but not my problem.