protesilaos / denote

Simple notes for Emacs with an efficient file-naming scheme
https://protesilaos.com/emacs/denote
GNU General Public License v3.0

Title as first element causes shell problems due to leading `--` #396

Open mankoff opened 1 month ago

mankoff commented 1 month ago

I'd like to use this (setq denote-file-name-components-order '(title signature keywords)) as my file name scheme. I note that if identifier is not included or not first, it adds @@ as a field separator, but does not include this if it is the first field.

I suggest when title is the first field, it should also drop the -- separator. Currently, files are named, for example, --foo@@20240717T222108.org, which is a problem for a lot of bash shell commands.

Is it possible to remove the leading -- ?

mankoff commented 1 month ago

I note a comment here https://github.com/protesilaos/denote/issues/361#issuecomment-2119237094 says

I don't see a way around this. If you have a file "some-file@@20240505T111111.org", is "some-file" a title, a keyword or a signature? We could allow one of the component to drop its delimiter if it is in the first position, but this cannot be done for all components at the same time.

But it seems like you could determine what 'some-file' is from the variable denote-file-name-components-order. But even without that introspection, I think I'm asking if the feature suggested above, drop delimiter from first position, exists.

mankoff commented 1 month ago

And one more comment. In #332 all of the examples of title-first have no leading --. It makes me think I'm doing something wrong, but I have not found any mention of this in the manual.

protesilaos commented 1 month ago

From: Ken Mankoff @.***> Date: Wed, 17 Jul 2024 22:24:53 -0700

I'd like to use this (setq denote-file-name-components-order '(title signature keywords)) as my file name scheme. I note that if identifier is not included or not first, it adds @@ as a field separator, but does not include this if it is the first field.

Indeed, the identifier gets a delimiter when it is not the first component. We did this to still make it possible to find it directly.

I suggest when title is the first field, it should also drop the -- separator. Currently, files are named, for example, --foo@@20240717T222108.org, which is a problem for a lot of bash shell commands.

Does it work if you quote the file names? I guess it can still be tricky.

Is it possible to remove the leading -- ?

We had discussed this during the development of this feature. I think it makes sense, though we have to consider how best to do it. What if someone wants the signature to be first: should it also drop its delimiter? And, if so, how do we disambiguate them in a scenario like this:

SIGNATURE@@DATE.org
TITLE@@DATE.org

Inside of Emacs we can rely on the denote-file-name-components-order to determine the current preference, but even then we cannot know what the previous preference was and if any files were created using that one.

For shell scripts, this gets trickier because the file name alone does not tell us what the component is and then we need some other heuristic.
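To make the shell-script point concrete, here is a minimal sketch of a classifier that looks only at the file name. The function name `first_component` is hypothetical; the delimiters are the ones mentioned in this thread (`--` for the title, `==` for the signature, `__` for the keywords). Once the first component drops its delimiter, the script has nothing left to decide with.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name the first component of a Denote-style file name
# by looking only at how the name starts.
first_component() {
  case ${1##*/} in              # ${1##*/} strips any directory part
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]*)
         echo identifier ;;     # identifier in first position has no delimiter
    --*) echo title ;;
    ==*) echo signature ;;
    __*) echo keywords ;;
    *)   echo ambiguous ;;      # no delimiter: title or signature? cannot tell
  esac
}

first_component '--foo@@20240717T222108.org'   # prints "title"
first_component 'foo@@20240717T222108.org'     # prints "ambiguous"
```

The last branch is where TITLE@@DATE.org and SIGNATURE@@DATE.org both end up, which is the disambiguation problem described above.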

I would personally be inclined to make it so that the TITLE is the one that can drop its delimiter, though I know there are people who would like the SIGNATURE to do that, to get Luhmann-style names.

Overall, I am open to ideas. If we can have an elegant solution, I am all for it.

-- Protesilaos Stavrou https://protesilaos.com

mankoff commented 1 month ago

Does it work if you quote the file names?

No. Both "bar" and 'bar' produce --bar.

we cannot know what the previous preference was and if any files were created using that one.

You stress the importance of never changing it enough that one option would be to leave it to the user to adjust files if they change order. Or provide a convenience function to assist in renaming. If the old and new orders are provided, I think this is trivial. If I'm missing something about the complexity here, I vote for adding support for only TITLE. Seems fairly elegant to me.

For shell scripts, this gets trickier because the file name alone does not tell us what the component is and then we need some other heuristic.

I'm not even trying to do anything that complicated at the shell. Just grep fails.

protesilaos commented 1 month ago

From: Ken Mankoff @.***> Date: Wed, 17 Jul 2024 22:59:39 -0700

Does it work if you quote the file names?

No. Both "bar" and 'bar' produce --bar.

Indeed, I tested it now. For relative paths, the "./" prefix seems to work:

$ grep -E "test" ./--testing-the-merge@@20240626T212048__denote_hello_one_testing.org

we cannot know what the previous preference was and if any files were created using that one.

You stress the importance of never changing it enough that one option would be to leave it to the user to adjust files if they change order. Or provide a convenience function to assist in renaming. If the old and new orders are provided, I think this is trivial. If I'm missing something about the complexity here, I vote for adding support for only TITLE. Seems fairly elegant to me.

One thing I forgot to mention is how all this affects the Denote functions that read the file name to find the relevant component. Whatever decision we make here will need to be reflected there, so hopefully we do not make it too complex.

For shell scripts, this gets trickier because the file name alone does not tell us what the component is and then we need some other heuristic.

I'm not even trying to do anything that complicated at the shell. Just grep fails.

Hopefully the "./" is an option for you. Otherwise, you have to rely on absolute file system paths.

-- Protesilaos Stavrou https://protesilaos.com

mankoff commented 1 month ago

Does it work if you quote the file names?

I (obviously) misunderstood the question. I tried to quote the title when creating the note, not the filename when accessing.

Yes, most (all?) bash commands have options to work with leading -. Either a \ or -- to denote end-of-args, quotes, leading path elements, etc.
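A minimal sketch of two of these options, the "./" path prefix and the `--` end-of-options marker, using grep in a scratch directory. The file name and its contents are made up for illustration. Quotes and a backslash are left out because the shell removes them before the command sees its arguments, so on their own they do not change how the name is parsed.

```shell
#!/usr/bin/env bash
# Scratch directory so the demonstration touches nothing real.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

file='--foo@@20240717T222108.org'
printf 'a note about foo\n' > "./$file"

# Bare name: grep tries to parse the name as a long option.
bare_status=0
grep -c foo "$file" </dev/null >/dev/null 2>&1 || bare_status=$?

# Option 1: a "./" prefix, so the argument no longer starts with "-".
prefixed=$(grep -c foo "./$file")

# Option 2: "--" ends option parsing; what follows is pattern and file.
terminated=$(grep -c -- foo "$file")

echo "bare name, exit status: $bare_status"
echo "with ./ prefix:         $prefixed matching line(s)"
echo "with -- marker:         $terminated matching line(s)"

# Clean up the scratch directory.
cd / && rm -rf -- "$workdir"
```

The "./" form has the advantage of working even with commands that do not recognise `--`.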

One thing I forgot to mention is how all this affects the Denote functions that read the file name to find the relevant component. Whatever decision we make here will need to be reflected there, so hopefully we do not make it too complex.

Yes.

I mostly work in Emacs where this is only an aesthetic issue. It just seems... inelegant to see the leading -- in a dired listing and elsewhere in Emacs, and it adds complication outside of Emacs. But it does work as-is.

It's very good software. Thank you.

MirkoHernandez commented 1 month ago

I'm not familiar with all the technicalities of the naming convention but here is a suggestion that could help solve this (or related) issues. The approach is to create the more complex regular expressions using rx from basic patterns like denote-id-regexp.

(setq file "20240802T184947--example-file-name__keyword.org")
(setq file2 "--example-file-name__keyword@@20240802T184947.org")
(setq file3 "example-file-name__keyword@@20240802T184947.org")

;; denote-title-regexp recreated in rx.
(setq test-denote-title-regexp
      (rx (seq (literal "--")
               (group (regexp "[^.]*?"))
               (or (regexp "==.*")
                   (regexp "__.*")
                   (seq (literal "@@")
                        (regexp denote-id-regexp))))))

;; Version that also captures a title in the first position,
;; with or without its leading "--".
(setq test-denote-title-regexp2
      (rx (or (seq (zero-or-one (literal "--"))
                   (group (regexp "[^.]*?"))
                   (zero-or-one (regexp "==.*"))
                   (zero-or-one (regexp "__.*"))
                   (seq (literal "@@")
                        (regexp denote-id-regexp)))
              (seq (literal "--")
                   (group-n 1 (regexp "[^.]*?"))
                   (or (regexp "==.*")
                       (regexp "__.*")
                       (seq (literal "@@")
                            (regexp denote-id-regexp)))))))

(and (string-match test-denote-title-regexp2 file3)
 (match-string-no-properties 1 file3))

(and (string-match test-denote-title-regexp2 file2)
 (match-string-no-properties 1 file2))

(and (string-match test-denote-title-regexp2 file)
 (match-string-no-properties 1 file))

;; `benchmark-run' is a macro, so it times the form itself; a plain call
;; to the `benchmark' function would evaluate the form once before timing.
;; It returns a list (ELAPSED GC-COUNT GC-ELAPSED).

;; same performance between denote-title-regexp and rx version
(benchmark-run 10000
  (and (string-match test-denote-title-regexp file)
       (match-string-no-properties 1 file)))

(benchmark-run 10000
  (and (string-match denote-title-regexp file)
       (match-string-no-properties 1 file)))

;; No noticeable performance difference for the regex that captures the leading title
(benchmark-run 10000
  (and (string-match test-denote-title-regexp2 file3)
       (match-string-no-properties 1 file3)))

protesilaos commented 1 month ago

From: Mirko Hernández @.***> Date: Fri, 2 Aug 2024 16:21:40 -0700

I'm not familiar with all the technicalities of the naming convention but here is a suggestion that could help solve this (or related) issues. The approach is to create the more complex regular expressions using rx from basic patterns like denote-id-regexp.

[... 64 lines elided]

I have not tried this yet. Can you tell me what difference it makes? I am asking because I cannot tell just by looking at the code and I am not familiar with 'rx' (it is a big macro with its own language).

From what I understand, 'rx' is a way to write regular expressions in a more Lispy way than some long string. But the end result should always be the same, right?

-- Protesilaos Stavrou https://protesilaos.com

MirkoHernandez commented 1 month ago

From what I understand, 'rx' is a way to write regular expressions in a more Lispy way than some long string. But the end result should always be the same, right?

Yes, exactly.

I have not tried this yet. Can you tell me what difference it makes?

It allows the composition of regular expressions from basic patterns. Since the new denote file name convention allows many combinations of valid file names I thought it would be useful to specify these using rx.

A secondary benefit is that many different regular expressions can be benchmarked programmatically.

protesilaos commented 4 weeks ago

From: Mirko Hernández @.***> Date: Mon, 5 Aug 2024 10:15:56 -0700

From what I understand, 'rx' is a way to write regular expressions in a more Lispy way than some long string. But the end result should always be the same, right?

Yes, exactly.

Good to know!

I have not tried this yet. Can you tell me what difference it makes?

It allows the composition of regular expressions from basic patterns. Since the new denote file name convention allows many combinations of valid file names I thought it would be useful to specify these using rx.

Indeed. Then we can also write more tests for it.

A secondary benefit is that many different regular expressions can be benchmarked programmatically.

This is a nice extra.

Now the blocker is that I must learn 'rx'...


On the point of this issue though, 'rx' will not change the status quo, meaning that users will still need to escape a leading "-" in file names.

-- Protesilaos Stavrou https://protesilaos.com

MirkoHernandez commented 4 weeks ago

On the point of this issue though, 'rx' will not change the status quo, meaning that users will still need to escape a leading "-" in file names.

A clarification on the rx example. It makes it easy to specify "conditions" in regular expressions. The following example matches 3 possible positions for the title (leading title, leading '--' then the title, '--' and title after some other construct). This would have to be repeated for the other components, although basic patterns could be written as variables ("==.*", "__.*").

I'm not understanding why the regular expression approach is not enough. Let's say there is a leading signature, then the title will have a leading '--'; if there is a leading title, the signature will have a leading '=='. It seems to me that a complex regex can match all these examples.

(setq test-denote-title-regexp2
      (rx (or (seq (zero-or-one (literal "--"))
                   (group (regexp "[^.]*?"))
                   (zero-or-one (regexp "==.*"))
                   (zero-or-one (regexp "__.*"))
                   (seq (literal "@@")
                        (regexp denote-id-regexp)))
              (seq (literal "--")
                   (group-n 1 (regexp "[^.]*?"))
                   (or (regexp "==.*")
                       (regexp "__.*")
                       (seq (literal "@@")
                            (regexp denote-id-regexp)))))))
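
For scripts outside Emacs, roughly the same three-position match could be approximated in bash, with the basic patterns kept in variables and composed into larger ones. This is only a sketch under assumptions: the names `denote_title`, `id_re`, `title_re`, `sig_re`, `kw_re`, `leading_re` and `anywhere_re` are hypothetical, and because POSIX extended regular expressions have no non-greedy operator, the title is assumed never to contain ".", "=", "_" or "@".

```shell
#!/usr/bin/env bash
# Basic patterns (POSIX ERE), kept in variables so they can be composed.
id_re='[0-9]{8}T[0-9]{6}'
title_re='[^-.=_@][^.=_@]*'   # must not start with "-", stops at any delimiter
sig_re='==[^._@]+'
kw_re='__[^.=@]+'

# Title in first position, with or without "--"; the identifier follows as "@@ID".
leading_re="^(--)?(${title_re})(${sig_re})?(${kw_re})?@@${id_re}"
# "--TITLE" after some other component.
anywhere_re="--(${title_re})"

# Hypothetical helper: print the title of a Denote-style file name.
denote_title() {
  local name=${1##*/}          # strip any directory part
  if [[ $name =~ $leading_re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  elif [[ $name =~ $anywhere_re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

for f in \
  'example-file-name__keyword@@20240802T184947.org' \
  '--example-file-name__keyword@@20240802T184947.org' \
  '20240802T184947--example-file-name__keyword.org'
do
  denote_title "$f"            # each prints: example-file-name
done
```

An undelimited first component is always reported as the title here, so a name like SIGNATURE@@DATE.org would be misread. The sketch does not resolve the ambiguity raised earlier in the thread.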