protesilaos / denote

Simple notes for Emacs with an efficient file-naming scheme
https://protesilaos.com/emacs/denote
GNU General Public License v3.0

Only use ASCII in file names #420

Closed protesilaos closed 1 month ago

protesilaos commented 1 month ago

I just created a new branch that introduces the user option denote-only-ascii-in-file-names. When it is set to a non-nil value, it makes Denote only keep ASCII characters in file names. Here is the idea:

(let ((denote-only-ascii-in-file-names nil))
  (denote--slug-no-punct "There are no-ASCII ： characters ｜ here"))
;; => "There are no-ASCII ： characters ｜ here"

(let ((denote-only-ascii-in-file-names t))
  (denote--slug-no-punct "There are no-ASCII ： characters ｜ here"))
;; => "There are no-ASCII   characters   here"

I have used this locally for a while, but I am concerned it may be too aggressive as a default. So the user option currently defaults to nil.

If anybody tries this, let me know what you think.
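
To show the idea in isolation, here is a minimal sketch of an ASCII-only filter. The name my/slug-keep-only-ascii is hypothetical and this is not necessarily how the branch implements it. It assumes that each non-ASCII character is replaced by a single space, which matches the output shown above.

```emacs-lisp
;; Hypothetical helper for illustration; not Denote's own code.
(defun my/slug-keep-only-ascii (str)
  "Return STR with every non-ASCII character replaced by a space."
  ;; [:ascii:] matches character codes 0 through 127.
  (replace-regexp-in-string "[^[:ascii:]]" " " str))

(my/slug-keep-only-ascii "naïve café")
;; => "na ve caf "
```

The leftover spaces would then be handled by whatever hyphenation step runs afterwards.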

jeanphilippegg commented 1 month ago

I think the idea is good!

However, I think we should encourage users to stick to denote-file-name-slug-functions to customize their sluggification. Also, some users may be used to having accents in their file names. So it is good that it is not active by default.

As such, I would keep your function denote--slug-keep-only-ascii and mention that it is available to those that want to use it in their sluggification (with denote-file-name-slug-functions).


To add details to the answer above:

I would stick to denote-file-name-slug-functions as the only way to customize the sluggification. It may not bring much complexity to introduce one new user option that modifies the default sluggification. However, there are many ways a user may want to customize the sluggification.

Examples:

I think it is good to provide functions that do the above if they do not already exist in Emacs. And we should have a good default sluggification. But beyond that, I would let the user customize denote-file-name-slug-functions. Adding user options to configure the default for all features above may be confusing.

So we would keep denote--slug-keep-only-ascii and a user can use it in the same way as denote--slug-no-punct, denote--slug-hyphenate and downcase. The default denote-sluggify-title is a good example of how to compose them to get the desired sluggification.

Then, we can eventually also add denote--slug-camelcase, denote--slug-remove-accents, denote--slug-remove-spaces, etc.
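
As a rough sketch of what an accent-removing helper could look like (the name my/slug-remove-accents and the approach are assumptions, not an existing Denote function): decompose the string with the built-in ucs-normalize-NFD-string, then drop the combining diacritical marks.

```emacs-lisp
(require 'ucs-normalize)

;; Hypothetical helper for illustration; not part of Denote.
(defun my/slug-remove-accents (str)
  "Return STR with combining diacritical marks removed."
  (replace-regexp-in-string
   ;; U+0300 to U+036F is the Combining Diacritical Marks block.
   "[\u0300-\u036F]" ""
   ;; NFD splits e.g. "é" into "e" followed by U+0301.
   (ucs-normalize-NFD-string str)))

(my/slug-remove-accents "Café déjà vu")
;; => "Cafe deja vu"
```

This only covers characters that decompose into a base letter plus a combining mark; letters such as "ø" or "ß" would pass through unchanged.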

What do you say?

protesilaos commented 1 month ago

> As such, I would keep your function denote--slug-keep-only-ascii and mention that it is available to those that want to use it in their sluggification (with denote-file-name-slug-functions).

Yes, this is better. I will update it accordingly and then merge it into main.

> Then, we can eventually also add denote--slug-camelcase, denote--slug-remove-accents, denote--slug-remove-spaces, etc.

Sure! Those should all be public functions that we will document in the manual as well. Plus examples of how to use them.

jeanphilippegg commented 1 month ago

You seem to have mixed denote-sluggify-title with denote--slug-no-punct in the manual.

Here is how a user would use the new denote-slug-keep-only-ascii:

(defun my/denote-sluggify-title (str)
  (downcase (denote--slug-hyphenate (denote--slug-no-punct (denote-slug-keep-only-ascii str)))))

(defun my/denote-sluggify-signature (str)
  (downcase (denote--slug-put-equals (denote--slug-no-punct-for-signature (denote-slug-keep-only-ascii str) "-+"))))

(defun my/denote-sluggify-keyword (str)
  (downcase
   (replace-regexp-in-string
    "-" ""
    (denote--slug-hyphenate (denote--slug-no-punct (denote-slug-keep-only-ascii str))))))

;; Use `setq' here: `defcustom' declares the option and requires a docstring.
(setq denote-file-name-slug-functions
      '((title . my/denote-sluggify-title)
        (signature . my/denote-sluggify-signature)
        (keyword . my/denote-sluggify-keyword)))

To have a simpler example, you could also customize only the title sluggification.
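
A title-only variant might look like the sketch below. It assumes that components left out of denote-file-name-slug-functions fall back to Denote's default sluggification, and it reuses the default denote-sluggify-title rather than composing the internal helpers. The name my/denote-sluggify-title-ascii is hypothetical.

```emacs-lisp
;; Sketch only: strip non-ASCII characters, then apply the default
;; title sluggification.
(defun my/denote-sluggify-title-ascii (str)
  "Sluggify STR as a title after dropping non-ASCII characters."
  (denote-sluggify-title (denote-slug-keep-only-ascii str)))

;; Assumption: signature and keyword are omitted, so they keep
;; their default behavior.
(setq denote-file-name-slug-functions
      '((title . my/denote-sluggify-title-ascii)))
```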

Another good example of how to customize denote-file-name-slug-functions is shown in #328.


Also, you added a comment over denote--slug-remove-dot-characters, denote--replace-consecutive-tokens and denote--trim-right-token-characters to make them public after 3.1.0, but they should not be used by users. We execute them unconditionally to enforce the hard rules of file names, and users cannot opt out of them. We can make them public, but it is not necessary and users should not use them.

protesilaos commented 1 month ago

From: Jean-Philippe Gagné Guay
Date: Tue, 3 Sep 2024 17:49:31 -0700

> You seem to have mixed denote-sluggify-title with denote--slug-no-punct in the manual.

Oh, I most definitely did!

> Here is how a user would use the new denote-slug-keep-only-ascii:
>
> (defun my/denote-sluggify-title (str)
>   (downcase (denote--slug-hyphenate (denote--slug-no-punct (denote-slug-keep-only-ascii str)))))
>
> (defun my/denote-sluggify-signature (str)
>   (downcase (denote--slug-put-equals (denote--slug-no-punct-for-signature (denote-slug-keep-only-ascii str) "-+"))))
>
> (defun my/denote-sluggify-keyword (str)
>   (downcase
>    (replace-regexp-in-string
>     "-" ""
>     (denote--slug-hyphenate (denote--slug-no-punct (denote-slug-keep-only-ascii str))))))
>
> (setq denote-file-name-slug-functions
>       '((title . my/denote-sluggify-title)
>         (signature . my/denote-sluggify-signature)
>         (keyword . my/denote-sluggify-keyword)))

I added this, thank you!

> To have a simpler example, you could also customize only the title sluggification.

I think it is okay to have it for everything, so that people can just copy-paste. Eventually we can have sections like this one with other common variations.

> Another good example of how to customize denote-file-name-slug-functions is shown in #328.

Yes, this is something worth checking as well. We can have a function to deal with accents in denote.el and then users will write their functions on top.

> Also, you added a comment over denote--slug-remove-dot-characters, denote--replace-consecutive-tokens and denote--trim-right-token-characters to make them public after 3.1.0, but they should not be used by users. We execute them unconditionally to enforce the hard rules of file names, and users cannot opt out of them. We can make them public, but it is not necessary and users should not use them.

I just commented them all to make the area of work easier to spot. Though, yes, we only want to expose what users will be using.

-- Protesilaos Stavrou https://protesilaos.com

jeanphilippegg commented 1 month ago

Fine with me!

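Since we are at it, I think there are a few things we can get rid of if we want to expose this to users.

  • Remove denote--slug-no-punct-for-signature.
  • Deprecate denote-excluded-punctuation-extra-regexp that is currently modifying the default sluggification. Users would customize denote-file-name-slug-functions instead.
  • Remove the extra-characters parameter of denote--slug-no-punct. We do not use it.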

Instead of two "complex" denote--slug-no-punct and denote--slug-no-punct-for-signature, we would end up with a single one-liner denote--slug-no-punct. This would be more manageable for a user to customize! And this would be done without changing the current sluggification.

(I believe I am the one who introduced denote--slug-no-punct-for-signature in the first place. If I remember correctly, I separated it from denote--slug-no-punct because it simplified the code at that moment and it was an internal function anyway. But now, we want to expose them to users and we have denote-file-name-slug-functions to customize the sluggification.)
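
To give a sense of the "single one-liner" shape, here is a sketch under stated assumptions. The names my/slug-punctuation-regexp and my/slug-no-punct are hypothetical, and the character set is only illustrative; Denote's actual excluded-punctuation regexp may differ.

```emacs-lisp
;; Hypothetical character set for illustration only.
(defvar my/slug-punctuation-regexp "[][{}!@#$%^&*()=+'\"?,.|;:~`/]+"
  "Regexp matching runs of punctuation to strip from a slug.")

;; With no extra-characters parameter and no signature-specific
;; variant, the function reduces to one call.
(defun my/slug-no-punct (str)
  "Return STR with characters matching `my/slug-punctuation-regexp' removed."
  (replace-regexp-in-string my/slug-punctuation-regexp "" str))

(my/slug-no-punct "Hello, world! (draft)")
;; => "Hello world draft"
```

A user who wants a different character set for signatures could then write a separate small function and register it in denote-file-name-slug-functions.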

I could make a pull request to show you how this can be achieved over the next weekend. Do you want me to wait, since 3.1.0 has just been released?

protesilaos commented 1 month ago

From: Jean-Philippe Gagné Guay
Date: Wed, 4 Sep 2024 19:36:05 -0700

[... 3 lines elided]

> Since we are at it, I think there are a few things we can get rid of if we want to expose this to users.
>
>   • Remove denote--slug-no-punct-for-signature.
>   • Deprecate denote-excluded-punctuation-extra-regexp that is currently modifying the default sluggification. Users would customize denote-file-name-slug-functions instead.
>   • Remove the extra-characters parameter of denote--slug-no-punct. We do not use it.
>
> Instead of two "complex" denote--slug-no-punct and denote--slug-no-punct-for-signature, we would end up with a single one-liner denote--slug-no-punct. This would be more manageable for a user to customize! And this would be done without changing the current sluggification.
>
> (I believe I am the one who introduced denote--slug-no-punct-for-signature in the first place. If I remember correctly, I separated it from denote--slug-no-punct because it simplified the code at that moment and it was an internal function anyway. But now, we want to expose them to users and we have denote-file-name-slug-functions to customize the sluggification.)

Sounds good!

> I could make a pull request to show you how this can be achieved over the next weekend. Do you want me to wait, since 3.1.0 has just been released?

Let's wait for a couple of weeks, just to be sure. I think things are stable, but there may be something we missed.


protesilaos commented 1 month ago

We are done with the ASCII part. Let's check the rest in the coming week. I think there are no major issues with the latest release.