protesilaos / denote

Simple notes for Emacs with an efficient file-naming scheme
https://protesilaos.com/emacs/denote
GNU General Public License v3.0

Sluggify accented titles #328

Open maikol-solis opened 2 months ago

maikol-solis commented 2 months ago

Hi, I want to share this snippet in case someone finds it useful. It is essentially the same function that org-roam uses (https://github.com/org-roam/org-roam/blob/8667e441876cd2583fbf7282a65796ea149f0e5f/org-roam-node.el#L170).

It changes titles like

"Un título en español"

to

un-titulo-en-espanol

This is the code.

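(require 'cl-lib)        ; cl-flet*
(require 'seq)           ; sequence helpers
(require 'ucs-normalize) ; string-glyph-decompose, string-glyph-compose

(defun denote--slug-accents (text)
  "Return a slug of TEXT with accents removed."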
  (let ((slug-trim-chars '(;; Combining Diacritical Marks https://www.unicode.org/charts/PDF/U0300.pdf
                           768 ; U+0300 COMBINING GRAVE ACCENT
                           769 ; U+0301 COMBINING ACUTE ACCENT
                           770 ; U+0302 COMBINING CIRCUMFLEX ACCENT
                           771 ; U+0303 COMBINING TILDE
                           772 ; U+0304 COMBINING MACRON
                           774 ; U+0306 COMBINING BREVE
                           775 ; U+0307 COMBINING DOT ABOVE
                           776 ; U+0308 COMBINING DIAERESIS
                           777 ; U+0309 COMBINING HOOK ABOVE
                           778 ; U+030A COMBINING RING ABOVE
                           779 ; U+030B COMBINING DOUBLE ACUTE ACCENT
                           780 ; U+030C COMBINING CARON
                           795 ; U+031B COMBINING HORN
                           803 ; U+0323 COMBINING DOT BELOW
                           804 ; U+0324 COMBINING DIAERESIS BELOW
                           805 ; U+0325 COMBINING RING BELOW
                           807 ; U+0327 COMBINING CEDILLA
                           813 ; U+032D COMBINING CIRCUMFLEX ACCENT BELOW
                           814 ; U+032E COMBINING BREVE BELOW
                           816 ; U+0330 COMBINING TILDE BELOW
                           817 ; U+0331 COMBINING MACRON BELOW
                           )))
    (cl-flet* ((nonspacing-mark-p (char) (memq char slug-trim-chars))
               (strip-nonspacing-marks (s) (string-glyph-compose
                                            (apply #'string
                                                   (seq-remove #'nonspacing-mark-p
                                                               (string-glyph-decompose s)))))
               (cl-replace (text pair) (replace-regexp-in-string (car pair) (cdr pair) text)))
      (let* ((pairs `(("[^[:alnum:][:digit:]]" . "-") ;; convert anything not alphanumeric
                      ("--*" . "-")                   ;; collapse consecutive hyphens
                      ("^-" . "")                     ;; remove leading hyphen
                      ("-$" . "")))                   ;; remove trailing hyphen
             ;; Apply each replacement pair in order, starting from the accent-free text.
             (slug (seq-reduce #'cl-replace pairs (strip-nonspacing-marks text))))
        (downcase slug)))))

(defun my/denote-sluggify-title (str)
  "Make STR an appropriate slug for title."
  (denote--slug-accents (downcase (denote--slug-hyphenate (denote--slug-no-punct str)))))

(setq denote-file-name-slug-functions
      '((title . my/denote-sluggify-title)
        (signature . denote-sluggify-signature)
        (keywords . denote-sluggify-keywords)))

protesilaos commented 2 months ago

Thank you @maikol-solis! I think this is worth experimenting with. Maybe we want to remove accents from the slug, though I wonder what sort of edge cases we might encounter.

If there are others who find this useful, we can include it in the Denote manual.
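
One way to probe the edge cases mentioned above is sketched below. It is not part of Denote or org-roam: `my/char-decomposes-p` and `my/slug-strip-marks` are hypothetical names, and the second function swaps the fixed list of code points for a test on the Unicode general category `Mn` (nonspacing mark), which is a different technique from the snippet posted in this thread. Two things it may help illustrate: letters such as ø, ß and ł have no canonical decomposition, so no accent-stripping approach of this kind changes them; and the fixed list above does not contain U+0328 COMBINING OGONEK, so a letter like ą would keep its mark there, whereas the `Mn` test drops it.

```elisp
(require 'seq)
(require 'ucs-normalize)

(defun my/char-decomposes-p (char)
  "Return non-nil if CHAR has a canonical (NFD) decomposition.
Hypothetical helper for exploring edge cases; not part of Denote."
  (let ((s (string char)))
    (not (string= s (ucs-normalize-NFD-string s)))))

(defun my/slug-strip-marks (text)
  "Return TEXT as a lowercase, hyphenated slug without nonspacing marks.
Hypothetical variant: removes every character whose Unicode general
category is Mn instead of consulting a fixed list of code points."
  (let* ((decomposed (ucs-normalize-NFD-string text))
         ;; Drop combining marks that decomposition split off.
         (stripped (apply #'string
                          (seq-remove
                           (lambda (char)
                             (eq (get-char-code-property char 'general-category)
                                 'Mn))
                           decomposed)))
         (composed (ucs-normalize-NFC-string stripped))
         ;; Turn each run of non-alphanumeric characters into one hyphen.
         (hyphenated (replace-regexp-in-string "[^[:alnum:]]+" "-" composed)))
    ;; Trim a leading or trailing hyphen, then lowercase.
    (downcase (replace-regexp-in-string "\\`-\\|-\\'" "" hyphenated))))

(my/char-decomposes-p ?é) ; => t   (e + COMBINING ACUTE ACCENT)
(my/char-decomposes-p ?ø) ; => nil (atomic letter, left unchanged)
(my/slug-strip-marks "Un título en español") ; => "un-titulo-en-espanol"
```

Whether removing every nonspacing mark is desirable depends on the language: for scripts where the marks carry meaning, a fixed list like the one in the original snippet may be the safer choice.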

nobiot commented 2 months ago

For what it's worth, a relevant discussion took place a while ago on the SourceHut mailing list, with a suggested code change, here.

I do not know if the code suggested then is still applicable -- there seems to have been a lot of refactoring around file name creation.

Just for your information.

maikol-solis commented 2 months ago

> Thank you @maikol-solis! I think this is worth experimenting with. Maybe we want to remove accents from the slug, though I wonder what sort of edge cases we might encounter.

This function has been in the org-roam repo for a while, so maybe some of those edge cases are already fixed or at least reported.

> If there are others who find this useful, we can include it in the Denote manual.

It's a good start.

The main reason I use this function is to have some consistency when I write Spanish titles or keywords. It could be helpful for people who write in other Latin languages, such as French, Italian, Portuguese, etc.

maikol-solis commented 2 months ago

> For what it's worth, a relevant discussion took place a while ago on the SourceHut mailing list, with a suggested code change, here.

> I do not know if the code suggested then is still applicable -- there seems to have been a lot of refactoring around file name creation.

Thanks for the pointer.