minad / cape

🦸cape.el - Completion At Point Extensions
GNU General Public License v3.0
603 stars 22 forks source link

cape-char.el: Better implementation of translation hash function #91

Closed hedyhli closed 1 year ago

hedyhli commented 1 year ago

My solution for getting a translation hash without parsing the help output, using quail, which (partly) closes #90.

Related issues and PRs: #61 #73

Implementation

Changes outline

Notes

toggle
  • ~~The `regexp` argument has been removed since I don't see any difference in the result -- seems to work during initial testing~~ (reverted in [`8a58051`](https://github.com/minad/cape/pull/91/commits/8a5805143e5b84855b7be9b196224e657c8b90b9) to keep hash table filtered and its size small)

  • `(if (string= (char-to-string value-char) value-str)`: `string-width` exists, and it's probably correct to say that all "characters" that lose its original string form when converted from string to char and back to string, have string-width of 2. However there are strings that satisfy the latter, but still retain the original string when converted into char and back to string. Hence I decided to use the more verbose check for explicitness

  • `quail-build-decode-map`, the function that does the quintessential work in this whole PR, does accept the "KEY" argument, which actually applies filtering so the resulting decode-map only includes those with names beginning with KEY. (ie: you can pass KEY as ":" for emoji and ones with the name not beginning with ":" would be excluded.) However I've decided to stick to the original regexp argument, since we're looping over each name-value cons cell anyway, and `string-match` isn't that slow (plus the translation function is only called ones during initial definition of each char capf). Furthermore using regexp allows extensions to the functionality in the future where we wouldn't be only limited to prefixes, if ever.

  • ~~Using `(nth 2 quail-current-package)` rather than `(quail-map)` (the two do the same thing)~~ ~~This probably has the disadvantage of having maintenance if quail ever changes their format of `quail-current-package` data, but 1) its format seems unlikely to change (I think!), and 2) it avoids another reference to a symbol that the warning may say is undefined (again, not a strong argument)~~ Now using `(quail-map)`, see [`884ad0d`](https://github.com/minad/cape/pull/91/commits/884ad0d83c3190f579f7b9570a9a85f8ab691cba)

  • There is (I think) a kind of unfixable effect of allowing both strings and chars in the hash table (especially due to some "characters" not being able to be represented as a character type). In the `cape-char--define` macro where the `*docsig` function is defined, we use functions that act strictly on characters to fetch meta information on the character of completion candidates. If the value from gethash is a string, then it can't be fully represented as a character, so one could say the resulting `get-char-code-property` is incorrect? I did a quick web-search and couldn't find a suitable solution in getting char code property for these special "characters" ```elisp (get-char-code-property (string-to-char "🇦🇨") 'name) ; => "REGIONAL INDICATOR SYMBOL A" (char-code-property-description 'general-category (get-char-code-property (string-to-char "🇦🇨") 'general-category)) ; => "Symbol, Other" ``` I think it's somewhat informative 🤷

Caveats/Limitations

TODO

Demo

toggle
emacs 29: image
emacs 28: image
emacs 29: image
emacs 28: With autoloaded `cape-char--define` and user's custom `(cape-char--define TeX "TeX" ?\\)` call in the config Screenshot 2023-09-12 at 20 18 37
emacs 29: ```elisp ;; With "math-symbol-lists" (cape-char--define math "math" ?\\) (add-to-list 'completion-at-point-functions #'cape-math) ``` image
hedyhli commented 1 year ago

Oops, just realized putting unrelated commits in a single PR is a bad idea

minad commented 1 year ago

Oops, just realized putting unrelated commits in a single PR is a bad idea

Yes, please restrict the change to the introduction of the quail-based macro. I am sorry for interfering with your work about the emojis, but I wanted to quickly try if it already works with the current mechanism.

Please note https://github.com/minad/cape#contributions. Have you signed the FSF copyright papers or are you willing to do so?

minad commented 1 year ago

Thanks, this looks really good! I am happy to merge this if copyright permits. I'll add a few remarks to the code.

hedyhli commented 1 year ago

For cape-emoji, the package fetching from minad:main does not seem to have support for :flag-*: emoji.

After implementing my change for the first TODO item, the flag emoji are now available in the completion menu, but when accepted they do not get expanded to the correct flag emoji.

toggle screenshot image

It seems like in elisp converting to-and-fro char and string for these flag emoji is losing information:

(char-to-string (string-to-char "🇦🇨"))
; => "🇦"

Possible solutions:

  1. Store strings instead of chars in the hash table. Update cape-char--define to not convert char to string, but in generating the "docsig" convert string to char to use char-code-property functions. this could possibly fix the issue shown in the screenshot, but docsig might not work.
  2. Proceed as-is: expanding :flag-ac: and get "🇦" (they also won't show up correctly the "annotation" part of completion menu)
  3. Still implement this but don't put values in hash table if (char-to-string (string-to-char value)) != value: this would mean no support for :flag-*: but who knows maybe some other input-method decode-map has items with ["something"] and that something can be supported
  4. Don't implement this at all: ignore all key-value pairs where the value is a vector. This would keep the behaviour of cape-emoji not having :flag-*: items. (as well as any other input-method that possibly also return a vector of 1 item)

Thoughts?


Please note https://github.com/minad/cape#contributions. Have you signed the FSF copyright papers or are you willing to do so?

I'll send the email later today, thanks!

minad commented 1 year ago

Emojis are not represented as single characters but as multiple characters, which are combined together. I suggest we continue to store characters as integers in the hash table when the string-to-char conversion is not losing information, in order to keep the hash table small. Otherwise we store strings.

hedyhli commented 1 year ago

Suggested changes from previous review and code related TODOs have been implemented, with updated screenshot. I'll be marking PR as ready for review after testing it as a local package some time tomorrow.

hedyhli commented 1 year ago

Doesn't work unless I remove the eval-when-compile "Symbol's function definition is void". Not sure how it's possible that the function becomes undefined when I've only changed the function body... Otherwise (without eval-when-compile) I've tested it to work perfectly on my machine(™️) (fixed in 2945693, see commit message), save for a few warnings on quail-* symbols undefined (884ad0d).

If I don't add the completion functions to completion-at-point-functions list, running them interactively gives a list of completions, but the completions are completely detached from my keyboard events and I can't close the popup unless I run it interactively again. As with above this was already an issue before this PR, and it (as well as the problem from above) is probably due to my setup, which I'll investigate afterwards if this PR works for you.

Reviews appreciated!

minad commented 1 year ago

Thanks, I will give this a thorough test soon. Can you please let me know when you completed the FSF copyright assignment, such that I know when merging is possible.

minad commented 1 year ago

On 9/12/23 01:59, hedyhli wrote:

Indeed they are very short, but since they're also needed in the translation hash function, the number of references would be 2 and 3 respectively. See while I was writing the code I found having to do |if (stringp (char-to-string ...) ...| verbose and repetitive. So I extracted it to functions to make the code more concise (easier to read), and the changes to |cape-char--define| minimal.

Yes, I prefer to have them inlined, since we end up with more concise byte code. You don't have to bother with such unimportant details though

Generally I like the style of having separate and reusable utilities - I believe the Cape code base shows that. So it is a bit of a judgement call if having something separate is justified.

Did you consider contributing ensure-string and ensure-character upstream to subr.el? It would be a nice addition in symmetry with ensure-list. But I am not sure if the string<->char conversion occurs often in the Emacs code base. If these functions were in subr.el I would certainly want to use them via a Compat backport. :)

But sure I can replace all references for these to the |if| full form if you insist. Otherwise since we're going to remove |eval-when-compile| I guess they should stay as functions rather than macros to be more useful as public functions.

I wouldn't want to make all the helper functions public. I think we may only end up with cape-char--define as a public macro. But for me, having something public is not really important. Right now the macro cannot even be used with the private name due to the eval-when-compile. If we remove that, which is possible because of your vastly simpler implementation, the macro can be used, which is a significant improvement over the status quo. And on top we get the handling of multi-character symbols - I am really looking forward to having this here!

minad commented 1 year ago

On 9/12/23 02:30, hedyhli wrote:

Sorry I don't fully understand. Is this what you have in mind?

  • (let ((capf (intern (format "cape-%s" name)))
  • (let* ((capf (intern (format "cape-%s" name))) (prefix-required (intern (format "cape-%s-prefix-required" name)))
  • (hash (intern (format "cape--%s-hash" name))) (ann (intern (format "cape--%s-annotation" name))) (docsig (intern (format "cape--%s-docsig" name))) (exit (intern (format "cape--%s-exit" name))) (properties (intern (format "cape--%s-properties" name)))
  • (thing-re (concat (regexp-opt (mapcar #'char-to-string prefix)) "[^ \n\t]*" )))
  • (thing-re (concat (regexp-opt (mapcar #'char-to-string prefix)) "[^ \n\t]*" ))
  • (hash (cape-char--translation-hash
  • method
  • (concat "\" (regexp-opt (mapcar #'char-to-string prefix)))))) (progn
  • (defvar ,hash (cape-char--translation-hash
  • ,method
  • ,(concat "\`" (regexp-opt (mapcar #'char-to-string prefix)))))

Not exactly, but close. The idea is to have both a hash-var symbol and a hash-val hash which is then stored as (defvar ,hash-var ,hash-val "...").

It works but the hash wouldn't be publicly defined, and references to it later on in other functions in|progn| would be referencing a fully expanded form of the hash, right? ... Which is extremely long?

Yes, this would lead to an unacceptable duplication of the literal hash. I think in recent times the bytecompiler got more intelligent to deduplicate literals, but of course we shouldn't do this.

hedyhli commented 1 year ago

Yes, I prefer to have them inlined, since we end up with more concise byte code. [...]

You do have a point with the resulting byte code! I've applied the change and it turned out to be less verbose than I anticipated 😅

Did you consider contributing ensure-string and ensure-character upstream to subr.el? It would be a nice addition in symmetry with ensure-list. But I am not sure if the string<->char conversion occurs often in the Emacs code base. If these functions were in subr.el I would certainly want to use them via a Compat backport. :)

Good suggestion. I don't think I'm a heavy-enough user for it but I'll consider it when I observe a growing need.


Not exactly, but close. The idea is to have both a hash-var symbol and a hash-val hash which is then stored as (defvar ,hash-var ,hash-val "...").

I hope my next commit is what you're suggesting (because I'm not entire sure); If it isn't I'll force-push the change.

hedyhli commented 1 year ago

A note regarding having cape-char--define public (see last two screenshots in PR desc):

At the moment if I was to use cape-char--define for my own quail package by including it in the :config of use-package cape, at the time of loading the macro doesn't seem to be defined yet.

Just to clarify, this PR is only about the translation hash function; making sure the user can define their own cape-char capfs is out of scope right? Should another autoload be added?

minad commented 1 year ago

Just to clarify, this PR is only about the translation hash function

Yes, let's finish this first. I think the only thing missing here for the merge is your go regarding the FSF assignment.

hedyhli commented 1 year ago

I sent the email 3 days ago (and haven't heard back yet)

minad commented 1 year ago

On 9/13/23 09:25, hedyhli wrote:

I sent the email 3 days ago (and haven't heard back yet)

Any news on the assignment?

hedyhli commented 1 year ago

I've emailed them back the signed document around two days ago and currently waiting on their part.

minad commented 1 year ago

You haven't heard back so far? Maybe write a gentle reminder and ask if your mail got lost? Thanks!

hedyhli commented 1 year ago

I haven't indeed 🤔, I've just sent a follow-up email

hedyhli commented 1 year ago

Was just informed that the assignment is now complete :)

minad commented 11 months ago

I forgot to say thanks for the addition. Thank you! The quail support is much better than before. Does the given functionality work well enough for you or do we have to add something more? Regarding #61, cape-char--define should be available now such that it can also be used in user configurations to define custom char Capfs.