olivierkes / manuskript

An open-source tool for writers
http://www.theologeek.ch/manuskript
GNU General Public License v3.0

Gtk Port: Spell Check/ Grammar Check #1229

Open TheShadowOfHassen opened 1 year ago

TheShadowOfHassen commented 1 year ago

I want to work on spelling / grammar checking next. Clearly, we'd want it for both Gtk.TextView and Gtk.Entry. We'd also want to include checkers such as Hunspell, Enchant and LanguageTool, and maybe others.

Firstly, we want to make it extensible so many kinds of languages can be used. We also want a global dictionary for spell checking.

Also, I think we'd want some kind of rule editor for LanguageTool (because, at least in English, some of the LanguageTool rules don't apply to fiction), so maybe profiles as well.

Is there anything else I'm missing?

TheJackiMonster commented 1 year ago

We want to implement it as modular as before. That's important because different spell checkers will require different amounts of resources to run properly. So ideally people with lower specs can run a more lightweight checker against a simple dictionary, while people with higher specs can use the spell and grammar checking from LanguageTool and others (which provide tips to fix issues as well).
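A minimal sketch of what such a modular setup could look like, assuming a shared interface that every backend implements. The names `Checker`, `Issue` and `WordListChecker` are hypothetical and not taken from the Manuskript code base. A heavier backend such as LanguageTool would implement the same `check` method and report `"grammar"` issues too, which also gives the UI a `kind` field to base color coding on.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import re


@dataclass
class Issue:
    start: int      # offset of the first flagged character
    end: int        # offset one past the last flagged character
    kind: str       # "spelling" or "grammar", usable for color coding
    message: str


class Checker(ABC):
    """Hypothetical common interface that every backend would implement."""

    @abstractmethod
    def check(self, text: str) -> list[Issue]:
        ...


class WordListChecker(Checker):
    """Lightweight backend: flags words that are missing from a plain word set."""

    def __init__(self, words):
        self.words = {word.lower() for word in words}

    def check(self, text):
        return [
            Issue(match.start(), match.end(), "spelling",
                  f"Unknown word: {match.group()}")
            for match in re.finditer(r"[^\W\d_]+", text)  # runs of letters
            if match.group().lower() not in self.words
        ]


checker = WordListChecker(["the", "cat", "sat"])
issues = checker.check("The cat szat")
```

Because the application only talks to `Checker`, swapping the lightweight backend for a more expensive one would not touch the widget code.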

I would say color coding for grammar and spell checking is important too. We probably want to provide settings to adjust colors finer than "everything misspelled is that one color".

As for the rules, I definitely like the idea of mixing languages when using dictionaries.

TheShadowOfHassen commented 1 year ago

Thanks, @TheJackiMonster. One other question: instead of manually adding spellchecking to every single line, is there a way for the program to take the currently active text widget and just apply it to that? Or a way to bulk-add extra things to every TextView and Entry (like an override in GTK that would use a custom version)?

obw commented 1 year ago

@TheShadowOfHassen the problem is that the different spell checkers work differently. Tools like LanguageTool need the whole text at once, because they also do a grammar check and a style check (word doubling, warnings about vulgar language (too funny sometimes, when I write characters who have a very special attitude!) and so on!)

Aspell or Hunspell simply try to find misspelled words using word lists, not with a set of rules plus word lists like LanguageTool!

Also, there is the topic of a thesaurus, which we should implement sometime in the future, and which is also a candidate for this interface!

Something I want to announce, in this context:

I'm also working on something, not in Python. I don't know why this language and I are at war, but every time I try Python I get frustrated, because it just doesn't work the way I want! What I'm working on addresses a problem I sometimes have: I use certain words and phrases as if they were in fashion.

We have the frequency analysis to find something like this, but my target here is something more interactive to solve this writing problem. First find such duplicates and overused words, then present them in a simple GUI, most likely terminal-based:

  1. First, find all mannerisms and overused words and word groups!
  2. Words and word groups will be shown in order of the number of occurrences found.
    • Ignore the finding this time
    • Ignore always
    • Show all places where the current item is found and use a thesaurus to replace it where possible, or rewrite the sentence or paragraph!
  3. Some things that are important for this tool:
    • I can always stop mid-work (everything must be saved after each step is done!)
    • Everything should be reversible, as long as the user doesn't manipulate the text with another editor / tool.
    • Works with all kinds of text-based files (Markdown, XML, LaTeX, and so on)
    • LaTeX
      • goes through the complete project
      • parses includes, and can also mark files that should not be analyzed
      • starts with the main project
    • LyX
      • see LaTeX
    • Markdown
      • Manuskript: iterate over the outline directory
      • Use the label marker to filter which files should be analyzed
    • Wiki systems
      • DokuWiki RPC access
    • XML DocBook
    • Works fast even for big projects (multithreaded)
    • More formats will be easy to implement, because each is a plugin
    • More access methods are also possible per plugin! (MediaWiki, Git, SVN are all possible access methods)
    • An API, to use the tool like LanguageTool. Later, after I have l
  4. Will be implemented in D. As soon as I have the API ready and running, I will present it here, if there is interest in it.

I have no name for this project yet. In my head I call it "Sprachqualität Check", in English "language quality check". It is something which could help users of Manuskript; I am implementing it this way because I will use it for all my writing projects!

I describe it here as another possibility to help the users of Manuskript write better text. At the same time, while you rebuild Manuskript for GTK, it could be made easier to integrate something like this into the software!

Regards

TheJackiMonster commented 1 year ago

@obw Maybe we can write an interface for Python once it's ready. I haven't used D yet but we could at least run a binary and pipe io in between to use it. I have nothing against mixing programming languages in the parts of Manuskript where we address problems in a modular way anyway.
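A rough sketch of the "run a binary and pipe IO" idea, using Python's standard `subprocess` module. The D tool does not exist yet, so its command line and output format are unknown; the helper name `run_external_checker` is hypothetical, and a small Python child process stands in for the binary so the example runs anywhere.

```python
import subprocess
import sys


def run_external_checker(command, text, timeout=10.0):
    """Send text to an external program via stdin and return its stdout."""
    result = subprocess.run(
        command,
        input=text,            # written to the child's stdin
        capture_output=True,   # collect stdout and stderr
        text=True,             # work with str instead of bytes
        timeout=timeout,       # do not hang the editor on a stuck tool
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Stand-in for the not-yet-existing D binary: a tiny child process that
# echoes its input in upper case.
stand_in = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
output = run_external_checker(stand_in, "some manuscript text")
```

A plugin written in Python could wrap a call like this, so the application itself never needs to know which language the tool is written in.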

TheJackiMonster commented 1 year ago

@TheShadowOfHassen I assume we need to write a custom interface for all GTK text widgets we want to handle which can track the actual text edited to only check such portions (including some unchanged context around for grammar tools). That would make everything vastly more efficient since spell & grammar check is currently quite a bottleneck in terms of performance.
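To illustrate the idea of checking only the edited portion plus some surrounding context, here is a small sketch that compares the old and new text and returns the range worth re-checking. The function name `changed_region` and the `context` parameter are assumptions for illustration. In a real GTK widget the text buffer's own change signals would likely supply these offsets directly, without comparing strings.

```python
def changed_region(old, new, context=40):
    """Return (start, end) offsets in `new` covering the edit plus context."""
    if old is None:
        # Nothing to compare against, so the whole text needs checking.
        return 0, len(new)

    limit = min(len(old), len(new))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Length of the common suffix, not overlapping the prefix.
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    # Widen the edited span by `context` characters on both sides.
    start = max(0, prefix - context)
    end = min(len(new), len(new) - suffix + context)
    return start, end


start, end = changed_region("The quick brown fox", "The quick brwn fox", context=2)
snippet = "The quick brwn fox"[start:end]
```

A grammar tool would probably want the context rounded out to whole sentences rather than a fixed number of characters.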

TheShadowOfHassen commented 1 year ago

I already have on-demand checking for Gtk.TextView. I don't think it's perfect, though.

TheShadowOfHassen commented 11 months ago

So honestly, I think the first step would be implementing the beginning of a plugin structure, and providing spell check solely through installed plugins. (Default languages would be pre-installed plugins.)

For simplicity, is it alright if plugins are just a folder with the data, an __init__.py to load the plugin from, and a .json file? For info, this doesn't have to be set in stone, but I want to be able to set up the plugin system from the beginning.

Note: This is getting close to NaNoWriMo, and while I'll try to work on this, I'm sure people will understand if I try to write something this November.

I'll post images of a plugin mock-up once I make it.

TheJackiMonster commented 11 months ago

I'd recommend using one of these options: https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/

Because if we use a standard way documented by Python itself, we can just point to that instead of writing our own documentation for a custom plugin system.

obw commented 11 months ago

@TheJackiMonster: I think using existing standards would be a good thing. It makes it easier for programmers to adopt the way a plugin is implemented!

The other thing is the API for the plugins!

For something like the spell check, we need an event-based solution. I think at first we need event triggers for:

Later I can think of a lot more useful events. Also, this way only the open texts are checked, which gives us some performance back. If I remember correctly, one of the performance problems of Manuskript is that we check the spelling of all texts when a project is opened!

Later, when the event interface is in place, we could / should implement more events. Something like:

The other thing is, we could define events which run asynchronously, like onUpdateStats. This is also something for the future, but we should think about it!

TheJackiMonster commented 11 months ago

I would argue that onTextOpen and onTextChanged are the same event though. The only difference is that onTextOpen might not provide old text data, but in that case we could just pass None. So it should just be onTextChanged.

I would also opt for an event called onFormat instead of onSave. Because saving might be done via revisions, timers or manually depending on your settings. Formatting could be enforced differently by settings. So it's not smart to couple them in a plugin system.

Also something for events like onProjectOpen: There should also be an onProjectClosed. In case there's any blocking, allocation or locking of resources involved something like this is necessary. I would add events like onInitialize and onDispose which would be called when software starts and ends or the particular plugin gets enabled or disabled.

We should also make sure that any error, exception or similar issue caused by a plugin gets caught by the application and disables the plugin immediately, with an additional message dialog about the cause so users can report it. It should be very clear from the beginning that we won't maintain third-party plugins, and stability should not get worse by trying out plugins.
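One way to sketch this isolation: the application never calls plugin methods directly but goes through a small host object that catches anything a plugin raises, disables that plugin and keeps the cause for a later message dialog. The class `PluginHost`, the logger name and the two demo plugins are hypothetical; only the event name `onTextChanged` comes from the discussion above.

```python
import logging

logger = logging.getLogger("manuskript.plugins")  # logger name is an assumption


class PluginHost:
    """Hypothetical wrapper that shields the application from plugin errors."""

    def __init__(self, plugins):
        self.enabled = dict(plugins)  # name -> plugin object
        self.disabled = {}            # name -> reason to show to the user

    def call(self, method, *args):
        # Iterate over a copy, because plugins may be removed while looping.
        for name, plugin in list(self.enabled.items()):
            handler = getattr(plugin, method, None)
            if handler is None:
                continue  # this plugin does not handle the event
            try:
                handler(*args)
            except Exception as error:
                # Disable immediately and remember the cause for a dialog.
                del self.enabled[name]
                self.disabled[name] = f"{type(error).__name__}: {error}"
                logger.exception("Plugin %r was disabled", name)


class BrokenPlugin:
    def onTextChanged(self, old_text, new_text):
        raise ValueError("cannot handle this text")


class CountingPlugin:
    def __init__(self):
        self.calls = 0

    def onTextChanged(self, old_text, new_text):
        self.calls += 1


counter = CountingPlugin()
host = PluginHost({"broken": BrokenPlugin(), "counter": counter})
host.call("onTextChanged", None, "first")
host.call("onTextChanged", "first", "second")
```

After the first call the broken plugin is gone from `host.enabled`, while the well-behaved plugin keeps receiving events.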

Potentially we could even benchmark event calls from plugins to check whether they impact performance to warn the user about it. But I'm not sure whether we want to have the overhead for something like this.
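If the overhead turns out to be acceptable, the benchmarking could be as simple as timing each handler call with `time.perf_counter` and comparing the average against a budget. This is only a sketch; `EventTimer` and the 50 ms default budget are made-up values, not anything decided in this thread.

```python
import time
from collections import defaultdict


class EventTimer:
    """Hypothetical helper that records how long plugin handlers take."""

    def __init__(self, budget_seconds=0.05):
        self.budget = budget_seconds
        self.total = defaultdict(float)  # plugin name -> accumulated seconds
        self.calls = defaultdict(int)    # plugin name -> number of calls

    def timed_call(self, plugin_name, handler, *args):
        start = time.perf_counter()
        try:
            return handler(*args)
        finally:
            # Record the time even if the handler raised.
            self.total[plugin_name] += time.perf_counter() - start
            self.calls[plugin_name] += 1

    def slow_plugins(self):
        """Names of plugins whose average call time exceeds the budget."""
        return [
            name for name in self.total
            if self.total[name] / self.calls[name] > self.budget
        ]


timer = EventTimer(budget_seconds=0.01)
timer.timed_call("slow", time.sleep, 0.05)   # stand-in for a heavy handler
length = timer.timed_call("fast", len, "abc")
```

The cost per call is two clock reads and two dictionary updates, which may be cheap enough to leave enabled by default.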

obw commented 11 months ago

@TheJackiMonster:

I would argue that onTextOpen and onTextChanged is the same event though.

They are not the same events in the eyes of a plugin developer. onTextOpen is an event which is called only seldom, whereas onTextChanged is called after every edit by the user. I think both should have the same payload, but not every plugin would need to register for both events. This granularity will help to improve performance!

The four events which I have proposed are the ones that are needed, from my point of view, for the plugins / tools we have talked about here!

Also something for events like onProjectOpen: There should also be an onProjectClosed. In case there's any blocking, allocation or locking of resources involved something like this is necessary. I would add events like onInitialize and onDispose which would be called when software starts and ends or the particular plugin gets enabled or disabled.

The first step is the implementation of an event handler with the possibility to add observers. The next step is to make parts of Manuskript, like the spell check, use this new possibility. After this works correctly and fast, we can think about how to use it to extend Manuskript. Also, we should think about something like a plugin repository, and with it a way to make sure that no plugins are harmful to the user!

But all of this, is something for later, after we have established the needed Interfaces and first implementations!

What we also need are defined API parts to add popup menu entries, menu entries, and additional views or tools like the Cheat Sheet. This is something we should implement later, but we should think ahead about it!

First steps, first! Define and write an event handler, then define and implement an observer Interface!
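A minimal sketch of such an event handler with observers, assuming plain callables as observers and keyword arguments as the payload. The class name `EventHandler` and its methods are hypothetical; the event names and the idea of passing `None` as the old text on open are taken from the discussion above.

```python
from collections import defaultdict


class EventHandler:
    """Hypothetical event handler: observers subscribe per event name."""

    def __init__(self):
        self._observers = defaultdict(list)

    def subscribe(self, event, observer):
        self._observers[event].append(observer)

    def unsubscribe(self, event, observer):
        self._observers[event].remove(observer)

    def emit(self, event, **payload):
        # Only observers registered for this event are called, so a plugin
        # pays nothing for events it did not ask for.
        for observer in list(self._observers.get(event, ())):
            observer(**payload)


events = EventHandler()
checked = []


def spell_check(old, new):
    checked.append((old, new))


events.subscribe("onTextChanged", spell_check)
events.emit("onTextOpen", old=None, new="Chapter one")            # nobody listens
events.emit("onTextChanged", old="Chapter one", new="Chapter 1")
```

Whether onTextOpen and onTextChanged stay separate or are merged then only changes which names plugins subscribe to, not the mechanism.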

We should think about the observer / listener arguments: which data should be provided, and in which way. The problem is always security and performance. The open-source tools with good event support that I know of are:

I can help with the planning, but not with the implementing, of all of this!

TheShadowOfHassen commented 11 months ago

I'll use the way Python recommends. I'm going to start plugin work with something strange but also easy: a plugin that adds another page to the Manuskript tab setup. After that it's on to spell check.

[Screenshot from 2023-10-11 09-08-20]

This is what Rhythmbox's plugin field looks like. It's a standard for Gtk apps, so I think I'll just do something similar. Just, instead of being in a tab of settings, it'll be in its own window.

TheJackiMonster commented 11 months ago

It's a standard for Gtk apps, so I think I'll just do something similar. Just, instead of being in a tab of settings, it'll be in its own window.

Why would you first refer to the standard way of Gtk apps and then decide to do it differently? Wouldn't it make more sense to use the standard way, making it easier for users to find?

TheShadowOfHassen commented 11 months ago

It's a standard for Gtk apps, so I think I'll just do something similar. Just, instead of being in a tab of settings, it'll be in its own window.

Some GTK apps have the plugin list in a separate window. What's standard is the look of the layout.

TheShadowOfHassen commented 11 months ago

I've been looking into the methods described in the link you gave me, @TheJackiMonster (https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins), and it looks like most of them are used for system- or PyPI-level plugins, which I'm not convinced we should use. It could be that I'm just confused, however, so some explanation would be useful.

TheShadowOfHassen commented 9 months ago

@TheJackiMonster, the method you suggested looks like it's more for system- / PyPI-level plugins. I'm not sure that's what is best for the project. However, I could just be confused. Am I misreading it?

TheJackiMonster commented 9 months ago

I don't understand the question. How else would plugins work?

  1. You install Manuskript to your system.
  2. You install a plugin to your system.
  3. Manuskript loads the plugin.

If you don't want to install anything system-wide, you create a custom virtual environment, install the plugin in there and start Manuskript from within the virtual environment.

TheShadowOfHassen commented 9 months ago

Most programs with plugins (like Rhythmbox) have something like a .manuskript/plugins folder in the local folder, which they check for plugins, and you can enable them on the plugin page. I like this way; it's simpler to code and it'd be easier for plugin developers.

However, with the way you suggested, @TheJackiMonster, it looks like it checks the modules you install (such as via pip or through a package manager), and I'm not sure that's the best way to do it.

I want to know if I understood the documentation, and I'm also curious to know why you think that way's better.

TheJackiMonster commented 9 months ago

Well, as you described. You can easily install the plugins via package management or pip. So there's an easy way for distribution. Most developers will have no issues adopting that mechanism because it's pretty much common standard. Everything required for loading the plugins is already implemented into Python and users don't need to look for hidden-by-default directories to move plugin archives or folders manually.

I don't know but this sounds a lot better to me.

TheShadowOfHassen commented 9 months ago

However, it would also be easy to include a module installer with manuskript, and just point it to GitHub repos. Also with using pip, how would Windows users running the executable install plugins?

My biggest reason is that, while other people might have no problems with it, handling Python packages is something I am in no way familiar with, and trying to learn it while implementing a plugin system sounds a little over my head.

TheJackiMonster commented 9 months ago

Also with using pip, how would Windows users running the executable install plugins?

For example with pip. There are also multiple GUIs for pip, specifically made for Windows.

Also how would it be trivial to include a custom module installer while doing the same with existing python package structure would not be? Reinventing the wheel is not necessary here and I don't think it would be any easier as well.

TheShadowOfHassen commented 9 months ago

For example with pip. There are also multiple GUIs for pip, specifically made for Windows.

Do the self-contained .exes for Manuskript come with pip?

TheJackiMonster commented 9 months ago

If that would be necessary.

TheShadowOfHassen commented 9 months ago

And the packages would be installed in the standard module location? Is that standard for Python programs? I haven't seen any that do it that way, and I'm not sure it would be best. What if we want to use different languages for a plugin?

I do think that a custom module setup would be easier, and I already have most of the work done. However, if you think the other way is better, that's fine. I'll try to help, but you'll have to do most of the work. I don't know enough about Python's packaging systems to feel comfortable spearheading an implementation. However, once it's set up, I'll write the module for spell check.

TheJackiMonster commented 9 months ago

Pretty much most python programs install via their own package management (for example pip) or external package management (from the operating system as most Linux distributions handle that). As far as I know macOS supports Python and using pip officially too already (so users wouldn't need homebrew for example) and for Windows there are multiple options (but in worst case WSL works too).

If you want to support multiple languages, we shouldn't build on any import logic of script files from Python anyway. But I would recommend that developers write their plugin API in Python and handle communication or bindings to a different language in there. That should be straightforward and it's easier to support long term.

TheShadowOfHassen commented 9 months ago

Alright, so if pypi it is, then you'll at least have to work on the core importing. I can let you have my UI work already if you want.

TheJackiMonster commented 8 months ago

@TheShadowOfHassen I've now implemented the code necessary to load plugins via namespace. So all plugins need to be a module under the namespace manuskript.plugin to be loaded initially. Then they can be implemented via a Plugin class in that module from which an instance will be created during loading.

I've not added API calls yet. But they can be added to the AbstractPlugin class as methods. Because all Plugin classes need to inherit from that.

By the way it works for plugins installed system-wide in locations like /usr/lib/python3.X/site-packages/, user-wide in ~/.local/lib/python3.X/site-packages/ or even locally in the directory of the application. For example I've added a TestPlugin in the commit to the repository which works as well.
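For readers unfamiliar with namespace-based discovery, the general technique from the Python packaging guide can be sketched as follows. This is not the code from the commit mentioned above; the function name `load_plugins` and the error handling are assumptions, and the only conventions taken from the thread are the `manuskript.plugin` namespace and a `Plugin` class in each module.

```python
import importlib
import pkgutil


def load_plugins(namespace_name):
    """Import every module under a namespace package and instantiate its Plugin."""
    namespace = importlib.import_module(namespace_name)
    plugins = {}
    # iter_modules walks every directory that contributes to the namespace,
    # whether it is system-wide, user-wide or next to the application.
    for info in pkgutil.iter_modules(namespace.__path__, namespace.__name__ + "."):
        try:
            module = importlib.import_module(info.name)
            plugins[info.name] = module.Plugin()
        except Exception as error:
            # A broken plugin must not prevent the others from loading.
            print(f"Could not load plugin {info.name}: {error}")
    return plugins


# Assumed usage inside the application:
# plugins = load_plugins("manuskript.plugin")
```

Because the lookup goes through `namespace.__path__`, the same function covers all three install locations listed above without any extra configuration.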

Edit: However unsure whether loading of Python code will work dynamically with a Windows binary from PyInstaller. So that might still need testing. Otherwise we just need to include plugins for Windows.

TheShadowOfHassen commented 8 months ago

Thanks, @TheJackiMonster. It looks exactly like what's needed. When I have time, I'll be able to start adding some API calls. Not spell check immediately, but some simple things.

TheJackiMonster commented 8 months ago

It probably makes sense to add a mechanism for plugins to subscribe to specific events. Then they can decide on their own whether they bring in a spellchecker, exporter, importer or something else. If we have things like a spellchecker which we definitely want to have as plugins, we can add simplified events for those, so plugins for such use cases are easier to implement.

TheShadowOfHassen commented 8 months ago

So I started making a random plugin here to start exposing some API: https://github.com/TheShadowOfHassen/manuskript/tree/plugin-gtk

But for some reason, manuskript won't load the new plugin. Any ideas @TheJackiMonster ?

TheJackiMonster commented 8 months ago

@TheShadowOfHassen The import in the __init__.py of the plugin is wrong. It should be: manuskript.plugin.guess_number.guessNumber. Also the plugin class needs to be aliased as "Plugin" in the __init__.py.

I should probably put in some debug output for the exception handling when trying to import such plugins. My thought was that this is output only a logger should see but currently there are no users for the Gtk port anyway. ^^'

TheShadowOfHassen commented 8 months ago

I might not have understood the message, but I did what you said, and it doesn't look like it's loading. My print debug in the __init__.py isn't showing up.

TheJackiMonster commented 8 months ago

You can merge the change for debug output.

TheShadowOfHassen commented 8 months ago

That helped a ton!

I have now created a guess the number game plugin right here: https://github.com/TheShadowOfHassen/manuskript/tree/plugin-gtk

In it, I've worked on a way we can add UI elements. The plugin kinds are read when setting up the main window and are added to the stack.

Something similar could be added to let them be a pop-up window called in the drop-down menus (like frequency analyzer.) I also set up the UI for a plugin manager. None of the buttons do anything. I figured those could wait until the rest of the settings are implemented (so we can save which plugins are enabled or not)

I think the branch should be merged so we have the UI and the other additions, but the plugin probably shouldn't be shipped when the Gtk port is finished. In the meantime, though, it will be useful to help test other plugin features. (I want to add a high score when we add saving.)

I'll work on other plugins; however, I don't think I can add more API until more of Manuskript's features are added, like saving and loading plus a working editor.

Also, @TheJackiMonster do you think I should keep a wiki page with the plugin API? I think it would be smart to write all the functions out so plugin developers don't have to search the code to see how it works.

TheJackiMonster commented 8 months ago

I think adding elements to the UI needs to have far more regulation in plugins. Because simply adding widgets to the main application with direct access can not be reversed and neither is the order reproducible. For example you could have menus or toolbars with a specific order for your needs, training muscle memory. But after you add a new plugin, it gets inserted somewhere in between because the order of plugins loading might differ, killing your muscle memory completely.

Also if plugins have direct access to the main UI they could also remove or change things. That's not automatically bad but in that case it needs to be reversible at least. So that users can easily toggle plugins on and off, deciding what to use. Dragging script files in directories around while restarting the application over and over again should not be the way to do it.
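One possible shape for such regulation, sketched under the assumption that plugins never touch widgets directly but register entries with a registry owned by the application. The class `UiRegistry` and its methods are hypothetical. The layout depends only on a user-defined order and on plugin names, never on load order, and disabling a plugin removes everything it contributed.

```python
class UiRegistry:
    """Hypothetical registry through which plugins contribute UI entries."""

    def __init__(self, user_order=()):
        self.user_order = list(user_order)  # plugin names in the user's order
        self._entries = {}                  # plugin name -> list of labels

    def add(self, plugin_name, label):
        self._entries.setdefault(plugin_name, []).append(label)

    def remove_plugin(self, plugin_name):
        # Disabling a plugin takes away everything it contributed.
        self._entries.pop(plugin_name, None)

    def layout(self):
        def sort_key(name):
            if name in self.user_order:
                return (0, self.user_order.index(name), name)
            return (1, 0, name)  # unknown plugins go last, alphabetically

        return [
            label
            for name in sorted(self._entries, key=sort_key)
            for label in self._entries[name]
        ]


registry = UiRegistry(user_order=["spellcheck", "statistics"])
# Plugins may load in any order without changing the resulting layout.
registry.add("guess_number", "Guess the Number")
registry.add("statistics", "Word Statistics")
registry.add("spellcheck", "Spelling")
before = registry.layout()
registry.remove_plugin("guess_number")
after = registry.layout()
```

The application would rebuild its menus or stack pages from `layout()` whenever a plugin is toggled, so no restart or file shuffling is needed.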

obw commented 8 months ago

@TheJackiMonster you are absolutely right!

We need more than a simple Plugin-Loading mechanism.

Clearly defined and documented APIs. I have written things like that (in PHP).

Config

Init

Events (https://en.wikipedia.org/wiki/Event_(computing))

I think more will be useful later, but also this collection should be enough for the start.

We also must think about what the EventClass and ObserverClass should look like. The other thing, later on, is how to make everything run smoothly for the plugin developers (asynchronously and so on!)

I also think this will only be as useful as the documentation is good!

regards

TheShadowOfHassen commented 8 months ago

Both problems @TheJackiMonster describes (UI changes by plugins that cannot be reversed, and an unpredictable order of elements) can be easily fixed once the other settings are up and working. I can just add new settings to change the order of the stack items and allow people to toggle plugins on and off (which I was planning on doing anyway).

@obw I'm starting to work on an API, however I'm starting with writing plugins with complete control, and then I'll abstract and sandbox parts as I go.

obw commented 8 months ago

@TheJackiMonster as we so nicely say in our mother tongue: for whatever reason, Python and I are on a war footing!

I think you will make a usable solution! I'm sorry that I can only give advice from my work with PHP-based projects.

And @TheJackiMonster thanks for all your hard work!

TheShadowOfHassen commented 8 months ago

Update: I added a bit of code that makes a plugin add its stack at the very end by default. And I abstracted the entire "New Stack" into a "New Stack" plugin. When we get settings to work, I'll add a pane that allows people to change the order of their stacks.