rrthomas / enchant

enchant spellchecking library
http://rrthomas.github.io/enchant/
GNU Lesser General Public License v2.1

Enchant 3: fully GObject-based with GIR support #374

Open rrthomas opened 7 months ago

rrthomas commented 7 months ago

Break backwards compatibility, and depend on GLib for the external API. The advantage of this is that any language with GIR bindings will be able to use Enchant directly.

Existing bindings such as those for C++ and Python (as well as C) will then be redundant, and can be auto-generated, as can the Vala binding.

Since the new API will introduce no new functionality, see if it's possible to implement the existing C API on top of the new one in a backwards-compatible way. (That would in turn enable the existing C++ API to continue to work.)
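To make the layering idea concrete, here is a minimal sketch in plain C (no GLib) of an old-style entry point implemented on top of a new-style object API. Everything named `demo_*` or `Demo*` is hypothetical and stands in for whatever Enchant 3 might expose. The only things borrowed from Enchant 2 are the conventions of its `enchant_dict_check()`: a length of -1 means "use strlen(word)", and the result is 0 if the word is correctly spelled, positive if not, and negative on error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical "new" API: a dictionary object with a boolean check method.
   In a real Enchant 3 this would be a GObject; a plain struct keeps the
   sketch self-contained. */
typedef struct {
    const char *const *words; /* known-good words */
    size_t n_words;
} DemoDict;

/* Hypothetical new-style call: true if `word` (exactly `len` bytes) is known. */
static bool demo_dict_check_word(const DemoDict *dict, const char *word, size_t len)
{
    for (size_t i = 0; i < dict->n_words; i++) {
        if (strlen(dict->words[i]) == len && memcmp(dict->words[i], word, len) == 0)
            return true;
    }
    return false;
}

/* Old-style wrapper following the Enchant 2 conventions: len == -1 means
   "NUL-terminated"; returns 0 if correct, 1 if misspelled, -1 on bad input. */
static int demo_compat_dict_check(const DemoDict *dict, const char *word, ssize_t len)
{
    if (dict == NULL || word == NULL || len == 0 || len < -1)
        return -1;
    size_t n = (len == -1) ? strlen(word) : (size_t)len;
    return demo_dict_check_word(dict, word, n) ? 0 : 1;
}
```

The wrapper holds no state of its own; it only translates argument and return conventions, which is why layering the old C API over a new one looks plausible when the new API adds no functionality.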

minad commented 6 months ago

I would greatly appreciate it if the Enchant 2 C API could be kept as is, so that existing software can seamlessly upgrade to Enchant 3. It would probably also help package distributors and thereby increase adoption. The question is how to proceed if you introduce new functionality in the new GObject API. Maybe in that case new functions could be added to the current C API? Alternatively, the current C API could be frozen and phased out?

rrthomas commented 6 months ago

I doubt it will be possible to seamlessly upgrade, although I will do my best! More likely, I will continue to support Enchant 2 for some time, and Enchant's build system already builds versioned libraries, executables etc., so it's easy to install different major versions in parallel.

minad commented 6 months ago

Great, thanks! Maybe if the APIs were mostly compatible and if I could detect the version with some ENCHANT_VERSION macro, I could support both versions 2 and 3 at the same time in my Jinx spell checker for Emacs? Could you add such a version macro to enchant.h? I am likely stuck with supporting Enchant 2 for the next five years, given that Enchant 2 is widely distributed in LTS Linux distribution releases. For my personal use case, I wouldn't mind upgrading to Enchant 3 as soon as possible (even if the API is different), but this is really about supporting as many setups as possible.

rrthomas commented 6 months ago

> Great, thanks! Maybe if the APIs were mostly compatible and if I could detect the version with some ENCHANT_VERSION macro, I could support both versions 2 and 3 at the same time in my Jinx spell checker for Emacs?

I think the current support for parallel installation is sufficient: you can detect Enchant 1 & 2 and choose which to use; the same will automatically be the case for further major versions. In particular, the header is installed in a versioned directory from version 2 onwards, the shared library is versioned and the pkg-config files are versioned.

minad commented 6 months ago

> I think the current support for parallel installation is sufficient

For my purpose it would be a little better if I had the ability to detect the version directly via a macro. Then I could use #ifdefs to handle different versions. Right now I have to implement my own detection (via pkgconf before compilation) and then pass a -D... option to the C compiler.
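On the C side, the workaround described here might look roughly like the following sketch. The macro name `JINX_ENCHANT_MAJOR` is hypothetical: the actual flag name is elided above, and this is not taken from jinx-mod.c.

```c
/* Hypothetical: JINX_ENCHANT_MAJOR is supplied by the build script, e.g.
     cc -DJINX_ENCHANT_MAJOR=2 -c jinx-mod.c
   after it has asked pkg-config which Enchant module is installed.
   The fallback below exists only so this sketch compiles on its own;
   a real module would rather stop with #error when the flag is missing. */
#ifndef JINX_ENCHANT_MAJOR
#define JINX_ENCHANT_MAJOR 2
#endif

/* Reject values the module has no code path for. Nothing here checks that
   the -D value matches the enchant.h actually being included; that is the
   gap the requested header macro is meant to close. */
#if JINX_ENCHANT_MAJOR != 2 && JINX_ENCHANT_MAJOR != 3
#error "JINX_ENCHANT_MAJOR must be 2 or 3"
#endif

/* Report the major version this translation unit was built for. */
static int jinx_enchant_major(void)
{
    return JINX_ENCHANT_MAJOR;
}
```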

rrthomas commented 6 months ago

> For my purpose it would be a little better if I had the ability to detect the version directly via a macro. Then I could use #ifdefs to handle different versions. Right now I have to implement my own detection (via pkgconf before compilation) and then pass a -D... option to the C compiler.

With Enchant's parallel installs, you're not detecting a version, you're requesting a version. If you ask pkg-config for enchant you get v1; if you ask for enchant-2, you get v2; if you ask for enchant-3 you'll get v3. Probably you'll want to ask for one by preference (e.g. v2 in the early days of v3, then v3 once you're happy with the implementation using it), falling back to the other if your first choice is not available. At this point you know which version you're using, and have no need to detect it again.
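Requesting a preferred version with a fallback can be done from any build script; to keep to one language, here is the same logic sketched as a small C build helper. It relies only on `pkg-config --exists NAME`, which exits with status 0 when the module is installed. The function name is hypothetical, and `enchant-3` is the module name expected for v3 as described above.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Return the first module in `preferred` that pkg-config reports as
   installed, or NULL if none is. A missing pkg-config also makes the
   shell return a non-zero status, so it simply counts as "not found". */
static const char *pick_enchant_module(const char *const *preferred, size_t n)
{
    char cmd[128];
    for (size_t i = 0; i < n; i++) {
        int len = snprintf(cmd, sizeof cmd, "pkg-config --exists '%s' 2>/dev/null",
                           preferred[i]);
        if (len < 0 || (size_t)len >= sizeof cmd)
            continue; /* name too long for the buffer; skip it */
        if (system(cmd) == 0)
            return preferred[i];
    }
    return NULL;
}
```

For example, passing `{"enchant-2", "enchant-3"}` prefers v2 (as in the early days of v3) and falls back to v3; reversing the array flips the preference. The chosen name is then what you pass to `pkg-config --cflags --libs`, and from that point on the version is known.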

I feel like maybe I just said the same thing as you but you think there's a problem? I'm baffled.

minad commented 6 months ago

I plan to use a different approach. I will not request a version, I will take whatever I get (2 or 3) to support as many installs as possible. Then it would be slightly easier if I could detect the version directly in the C file via #ifdefs. It would also be more robust if Enchant provided a version macro, since that way I could rule out other configuration errors that may occur. I see your point that, if things are done properly, I can detect the version myself and proceed from there once I know it. But I also don't see a problem with defining an ENCHANT_VERSION macro directly in enchant.h, and this is what I'd like to ask for.

rrthomas commented 6 months ago

> I plan to use a different approach. I will not request a version, I will take whatever I get (2 or 3)

This is the bit I don't understand. You can't take whatever you get, as the two versions are installed with different names. You have to ask for one of those names. You could combine the flags supplied by pkg-config for both versions 2 & 3 and then see which enchant.h gets included when you #include <enchant.h>, but that seems obtuse!

minad commented 6 months ago

The problem is this - compilation does not always go through pkgconf/pkg-config. Sometimes people compile manually and then things get more complicated.

Right now I also check the Enchant version dynamically in my Emacs module, see https://github.com/minad/jinx/blob/3c36f1eb31713869ffbdbf55971671efa4f01966/jinx-mod.c#L183-L188. This check has to be kept anyway, however I would also want to check the version statically at compile time.
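One way to pair a dynamic check with a static one is to compare the major version fixed at compile time against the version string reported at run time. This sketch assumes the runtime string looks like "2.6.7". In a real module it would come from the library (Enchant 2 exports `enchant_get_version()`, though that is worth confirming against the installed header); here it is passed in as a parameter so the sketch stands alone. `COMPILED_ENCHANT_MAJOR` is a hypothetical stand-in for the static version macro discussed in this thread.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical: in the scenario discussed here this would come from enchant.h. */
#ifndef COMPILED_ENCHANT_MAJOR
#define COMPILED_ENCHANT_MAJOR 2
#endif

/* Parse the leading major number of a version string such as "2.6.7".
   Returns -1 if the string does not start with a digit. */
static long version_major(const char *version)
{
    if (version == NULL || *version < '0' || *version > '9')
        return -1;
    return strtol(version, NULL, 10);
}

/* True if the library loaded at run time has the same major version
   as the header the module was compiled against. */
static bool enchant_major_matches(const char *runtime_version)
{
    return version_major(runtime_version) == COMPILED_ENCHANT_MAJOR;
}
```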

rrthomas commented 6 months ago

> The problem is this - compilation does not always go through pkgconf/pkg-config. Sometimes people compile manually and then things get more complicated.

This is something I regard as Not My Problem. I provide a build system (which uses pkg-config), users can either use it and complain when it breaks, or however they like and keep all the pieces when it breaks. I have gone to great lengths to make a build system that is portable and covers just about any conceivable way one might want to build.

> Right now I also check the Enchant version dynamically in my Emacs module, see https://github.com/minad/jinx/blob/3c36f1eb31713869ffbdbf55971671efa4f01966/jinx-mod.c#L183-L188. This check has to be kept anyway, however I would also want to check the version statically at compile time.

Interesting!

minad commented 6 months ago

> The problem is this - compilation does not always go through pkgconf/pkg-config. Sometimes people compile manually and then things get more complicated.

> This is something I regard as Not My Problem. I provide a build system (which uses pkg-config), users can either use it and complain when it breaks, or however they like and keep all the pieces when it breaks. I have gone to great lengths to make a build system that is portable and covers just about any conceivable way one might want to build.

Actually, I also don't consider it my problem if people do that, since my build code uses pkg-config. However, it becomes my problem as soon as people deviate from that approach and complain that something doesn't work properly. And in this case any kind of additional checking, like a static version check, helps detect problems early on.

The problem is also that Emacs doesn't offer a canonical mechanism to build dynamic/native Emacs modules. I use the following code (which uses pkg-config):

https://github.com/minad/jinx/blob/3c36f1eb31713869ffbdbf55971671efa4f01966/jinx.el#L575-L614

Now there are many problematic scenarios:

rrthomas commented 6 months ago

I sympathise. I have used tree-sitter-mode, and it's amazing to me that it manages seamlessly to build shared libraries from Rust and install them, but of course that's a much more integrated build environment. Doing anything with C (until we get a widely-adopted equivalent of cargo/npm/etc.) is a dog's breakfast. I still think I've done my best: the build system will complain if pkg-config is not installed. Indeed, any GNU build system should be possible to use as a black box, from which you either get success or an error code, and maybe a comprehensible error message.

minad commented 6 months ago

Yes, I don't doubt that you did your best. But why not simply add a #define ENCHANT_VERSION macro? This is a pattern found in many C libraries.
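GLib is one example of that pattern: its headers define `GLIB_MAJOR_VERSION`, `GLIB_MINOR_VERSION`, `GLIB_MICRO_VERSION` and a `GLIB_CHECK_VERSION(major, minor, micro)` macro. A hypothetical Enchant equivalent, and the kind of compile-time check it would enable, might look like this sketch. None of the `ENCHANT_*` names below exist in Enchant 2's enchant.h, and the version numbers are placeholders.

```c
/* Hypothetical: these would be generated into enchant.h by the build system.
   The values are placeholders for this sketch. */
#ifndef ENCHANT_MAJOR_VERSION
#define ENCHANT_MAJOR_VERSION 2
#define ENCHANT_MINOR_VERSION 6
#define ENCHANT_MICRO_VERSION 0
#endif

/* True if the headers are at least version major.minor.micro
   (same shape as GLib's GLIB_CHECK_VERSION). */
#define ENCHANT_CHECK_VERSION(major, minor, micro)                            \
    (ENCHANT_MAJOR_VERSION > (major) ||                                       \
     (ENCHANT_MAJOR_VERSION == (major) && ENCHANT_MINOR_VERSION > (minor)) || \
     (ENCHANT_MAJOR_VERSION == (major) && ENCHANT_MINOR_VERSION == (minor) && \
      ENCHANT_MICRO_VERSION >= (micro)))

/* A consumer such as an Emacs module could then reject unsupported headers
   with a clear message instead of an obscure compile error later on. */
#if !ENCHANT_CHECK_VERSION(2, 0, 0) || ENCHANT_CHECK_VERSION(4, 0, 0)
#error "This module supports Enchant 2 and 3 only"
#endif
```

The same macro would also serve as the `#if` switch for conditionally compiling against v2 or v3, if their APIs turn out to be close enough.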

rrthomas commented 6 months ago

Because, as I said, you already know what version of Enchant you are #including when you #include it. It's the same as the reason you don't have a macro to tell you whether you're #including libfoo or libbar: you already know!

minad commented 6 months ago

No, this is not the case as I told you. I don't necessarily control the build process. But anyway, you don't want to, so be it.

rrthomas commented 6 months ago

Sorry, I must be being very dense here. I don't see what it has to do with controlling the build process. When you #include <libfoo.h> you are getting libfoo. When you #include <libbar.h> you are getting libbar. When you #include <enchant-2/enchant.h> you are getting Enchant v2; when you #include <enchant-3/enchant.h> you are getting Enchant v3. If you like, imagine I have put the version in the name of the include file (rather than its containing directory); maybe I'll do just that for v3!
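Taken literally, the versioned directory also lets one source file pick a version without relying on an -I flag. A sketch, assuming a compiler that supports `__has_include` (standard in C23, available as an extension in GCC 5+ and Clang) and headers installed under a default include root such as /usr/include. The `enchant-3` path follows the naming described above and will only exist once v3 is released; `DEMO_ENCHANT_MAJOR` is a hypothetical name.

```c
/* Prefer Enchant 3 if its header is installed, else fall back to Enchant 2.
   DEMO_ENCHANT_MAJOR records which one was picked (0 = neither found). */
#if defined(__has_include)
#  if __has_include(<enchant-3/enchant.h>)
#    include <enchant-3/enchant.h>
#    define DEMO_ENCHANT_MAJOR 3
#  elif __has_include(<enchant-2/enchant.h>)
#    include <enchant-2/enchant.h>
#    define DEMO_ENCHANT_MAJOR 2
#  else
#    define DEMO_ENCHANT_MAJOR 0
#  endif
#else
#  define DEMO_ENCHANT_MAJOR 0 /* no __has_include: rely on the build system */
#endif
```

This only settles the include side: the module still has to link against the matching library (-lenchant-2 for v2, or whatever v3 installs).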

minad commented 6 months ago

No. I understood your point, perfectly in fact. But you are not seeing what I am trying to tell you.

You are looking at it from the "sane perspective" of a distribution maintainer, someone who fully controls the build process. In my case, the situation is more unfortunate, since users try to build the Emacs module in various ways if my automated pkg-config-based method fails.

They will run commands like this, if for example pkg-config is lacking on their installation or if they build Enchant themselves and try to link to it afterwards.

gcc -I/usr/include/enchant-2 -c jinx-mod.c ...
gcc -I/usr/include/enchant-3 -c jinx-mod.c ...

Now I could of course enforce a version of Enchant by writing #include <enchant-2/enchant.h> as you suggest. But then this misses the other goal that I had in mind: supporting both Enchant 2 and Enchant 3 in the same module file, assuming that the APIs are sufficiently close that I could work around differences with #if.

Also note that pkg-config --cflags on Debian generates this - it does not include the enchant-2 path.

$ pkg-config --cflags enchant-2
-I/usr/include/enchant-2 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/sysprof-6 -pthread

But as I am understanding you, your point seems to be that you deliberately want to introduce a hard break in the API, like renaming enchant.h to enchant3.h.

rrthomas commented 6 months ago

> They will run commands like this, if for example pkg-config is lacking on their installation or if they build Enchant themselves and try to link to it afterwards.

> gcc -I/usr/include/enchant-2 -c jinx-mod.c ...
> gcc -I/usr/include/enchant-3 -c jinx-mod.c ...

This is not something it's reasonable for users to expect support for. If a user doesn't have pkg-config, or some other build dependency installed, they can simply install it. Realistically, there are two cases here: either an Emacs user doesn't know how to build software from source, in which case they're not going to be issuing any commands like this; or, they do know how to build software from source, in which case they can install the build dependencies, and that is the simplest way to install the software. Trying to hack around and build stuff as above is simply not a good idea, it's creating problems for oneself, expecting work from you, the jinx maintainer, and it's harder than doing it the right way!

> Now I could of course enforce a version of Enchant by writing #include <enchant-2/enchant.h> as you suggest.

I don't think this is necessary. I'm only saying that that is, "morally", as mathematicians say, what one is doing.

> Also note that pkg-config --cflags on Debian generates this - it does not include the enchant-2 path.

> $ pkg-config --cflags enchant-2
> -I/usr/include/enchant-2 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/sysprof-6 -pthread

The path is right there in the -I flag, but I see your point: it's the same as an -I flag that gives an installation location, and at the point where you #include <enchant.h> it doesn't look like a different include file.

Oh well.

I'll see how different Enchant v3 is, and then think again. I deliberately don't want to constrain it to be backwards-compatible, because I want it to be a clean, simple GObject-based API. But if it does end up sufficiently similar, then indeed it will be tempting to make it possible to write code that conditionally compiles for both v2 & v3.

minad commented 6 months ago

> This is not something it's reasonable for users to expect support for.

Of course not. But I have seen all these things, and it is understandable: my Emacs module is used by people who may not be experienced C developers - it is just an M-x package-install away. If the documented automated installation method fails, people will try various things. The real problem is that C libraries are not meant to be installed like this - other native Emacs modules have similar build problems across all the heterogeneous target platforms (you've surely seen or used the vterm Emacs package, for example). More modern programming languages and build systems, e.g. Rust, as you've mentioned, have solved this in a better way.

Also note that I don't want to support such problematic use cases. But it would be nice to detect them and give clear error messages during compilation - something which I could do if there were an ENCHANT_VERSION macro.

> I'll see how different Enchant v3 is, and then think again. I deliberately don't want to constrain it to be backwards-compatible, because I want it to be a clean, simple GObject-based API.

Makes sense. I agree that you should not constrain yourself. It makes sense to create such breaking points, if there is a clear benefit.

> But if it does end up sufficiently similar, then indeed it will be tempting to make it possible to write code that conditionally compiles for both v2 & v3.

Yes, this is what I am hoping for. But our discussion is a bit hypothetical right now. We'll see.

rrthomas commented 6 months ago

> Of course not. But I have seen all these things, and it is understandable: my Emacs module is used by people who may not be experienced C developers - it is just an M-x package-install away. If the documented automated installation method fails, people will try various things.

I understand that people may try various things, though do non-developers really try running GCC like that? I just find it baffling—in my experience, technically inexperienced people avoid the command line, and simply give up when something doesn't work automatically. This is why I feel no compunction about offering no support for these experiments: my assumption is that the users who do such things know enough to know better.

> Also note that I don't want to support such problematic use cases. But it would be nice to detect them and give clear error messages during compilation - something which I could do if there were an ENCHANT_VERSION macro.

Makes sense, it's an easy way to help.

minad commented 6 months ago

> I understand that people may try various things, though do non-developers really try running GCC like that? I just find it baffling; in my experience, technically inexperienced people avoid the command line, and simply give up when something doesn't work automatically.

Well, you are surely correct about non-technical people, who will soon give up, but there is a huge spectrum between beginner devs and kernel hackers. Various things are tried based on half-knowledge. Also, Emacs users are usually not people who give up right away; that's at least my impression, given the learning curve.

> This is why I feel no compunction about offering no support for these experiments: my assumption is that the users who do such things know enough to know better.

Sure. I still think that there is not really a good argument against an ENCHANT_VERSION macro. Perhaps it is not elegant, but is that reason enough? Does it make distribution harder? Does it make maintenance harder? I like static checking and additional safeguards, and I believe such a check could help me. Furthermore, it would allow conditional compilation if Enchant 2 and Enchant 3 were sufficiently close, which may or may not be the case.