jgm / pandoc

Universal markup converter
https://pandoc.org

chunkedhtml – option to create chunks without leading numbers in file names #8581

Closed (PlainMartin closed 1 year ago)

PlainMartin commented 1 year ago

Describe your proposed improvement and the problem it solves.

It would be great if chunkedhtml could (optionally) create HTML files / chunks without leading numbers.

I realize these numbers are useful in many situations / use cases. But they lead to changing file names when new chapters are added to a source. “6-summary.html” becomes “7-summary.html” when a new chapter is added upstream, and when the project is republished, earlier bookmarks will return an error.

Such an option would allow for “stable” file names, where # Summary will always be rendered (and bookmarkable) as “summary.html”.

(I realize this may lead to naming conflicts if a document has multiple chapters / sections with the same heading. Maybe this could be resolved using the same mechanism used for ID creation?)
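
A minimal sketch of how such conflicts could be resolved, assuming a suffix scheme like the one pandoc's manual describes for duplicate heading identifiers (the first keeps its name, later ones get `-1`, `-2`, and so on). The names `uniqueName` and `uniqueNames` are hypothetical and are not part of pandoc:

```haskell
-- Hypothetical helper, not pandoc's code: pick the first candidate among
-- base, base-1, base-2, ... that is not already taken.
uniqueName :: [String] -> String -> String
uniqueName taken base =
  firstFree (base : [base ++ "-" ++ show n | n <- [1 :: Int ..]])
  where
    firstFree (c:cs)
      | c `elem` taken = firstFree cs
      | otherwise      = c
    firstFree []       = base  -- unreachable: the candidate list is infinite

-- Walk the identifiers in document order, remembering the names already
-- handed out so that later duplicates receive a numeric suffix.
uniqueNames :: [String] -> [String]
uniqueNames = go []
  where
    go _ [] = []
    go taken (x:xs) =
      let name = uniqueName taken x
      in name : go (name : taken) xs
```

Under these assumptions, `uniqueNames ["summary", "intro", "summary"]` evaluates to `["summary", "intro", "summary-1"]`, so the first `# Summary` would still map to `summary.html`.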

Describe alternatives you’ve considered.

An alternative would be to strip the leading numbers from the file names (e.g. with a shell script) and from all internal links in the document set (e.g. with Python and Beautiful Soup), which is obviously “messy” and error-prone.
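
For illustration, the renaming half of that workaround could look roughly like the sketch below. It assumes the numeric prefix consists only of digits and dots followed by a hyphen (as in “6-summary.html”); `stripLeadingNumber` is a hypothetical name, and the same mapping would still have to be applied to every internal link:

```haskell
import Data.Char (isDigit)

-- Hypothetical post-processing helper: drop a leading section number
-- such as "6-" or "1.2-" from a file name. Names without such a prefix
-- are returned unchanged.
stripLeadingNumber :: FilePath -> FilePath
stripLeadingNumber name =
  case span isNumberChar name of
    (num, '-' : rest) | any isDigit num -> rest
    _                                   -> name
  where
    isNumberChar ch = isDigit ch || ch == '.'
```

With this definition, `stripLeadingNumber "6-summary.html"` yields `"summary.html"`, while `"index.html"` is left as it is. The sketch does nothing about name collisions, which is part of why this route is fragile.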

jgm commented 1 year ago

> But they lead to changing file names when new chapters are added to a source. “6-summary.html” becomes “7-summary.html” when a new chapter is added upstream, and when the project is republished, earlier bookmarks will return an error.

That's a good point. Our chunk generator has a configurable template which is currently set to "%s-%i.html". (section number = %s, identifier = %i)

We could either:

  1. Change this to just "%i.html", so the section number is omitted. (I think I just included the number so that the files would sort correctly in a directory listing, but that may not be too important.)
  2. Allow the user to override the default template using a command-line argument.

jgm commented 1 year ago

Here are the possibilities for path templates, as documented in the Chunks module:

-- | A 'PathTemplate' is a FilePath in which certain codes
-- will be substituted with information from a 'Chunk'.
-- @%n@ will be replaced with the chunk number
-- (padded with leading 0s to 3 digits),
-- @%s@ with the section number of the heading,
-- @%h@ with the (stringified) heading text,
-- @%i@ with the section identifier.
-- For example, @"section-%s-%i.html"@ might be resolved to
-- @"section-1.2-introduction.html"@.
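
To make the substitution rules above concrete, here is a rough sketch of how such a template might be resolved. It is not the Chunks module's implementation: the `ChunkInfo` record and `resolveTemplate` function are hypothetical stand-ins, and treating a missing section number as an empty string is an assumption.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for the chunk fields the template codes refer to;
-- this is not pandoc's actual 'Chunk' type.
data ChunkInfo = ChunkInfo
  { chunkNumber   :: Int           -- position of the chunk in the output
  , sectionNumber :: Maybe String  -- e.g. Just "1.2"; Nothing if unnumbered
  , headingText   :: String        -- stringified heading text
  , identifier    :: String        -- section identifier
  }

-- Pad a number with leading zeros to at least three digits.
pad3 :: Int -> String
pad3 n = replicate (3 - length s) '0' ++ s
  where s = show n

-- Replace %n, %s, %h and %i; every other character is copied unchanged.
resolveTemplate :: String -> ChunkInfo -> FilePath
resolveTemplate templ c = go templ
  where
    go ('%':'n':rest) = pad3 (chunkNumber c) ++ go rest
    go ('%':'s':rest) = fromMaybe "" (sectionNumber c) ++ go rest  -- assumption: empty if unnumbered
    go ('%':'h':rest) = headingText c ++ go rest
    go ('%':'i':rest) = identifier c ++ go rest
    go (x:rest)       = x : go rest
    go []             = []
```

For a chunk with section number "1.2" and identifier "introduction", `resolveTemplate "section-%s-%i.html"` gives `"section-1.2-introduction.html"`, and `resolveTemplate "%i.html"` gives the stable name `"introduction.html"`.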

PlainMartin commented 1 year ago

Both (omitting or a command line argument) would work for me. 😊

chunkedhtml is brand new (and it’s absolutely wonderful – I have been waiting for something like this for years). I don’t know how others would use it, and which defaults make sense for the majority of users.

Personally, I’d vote for “default to non-numbered files, with a command line option for numbering” – because there are definitely document / project types where it would be nice to see numbered, sortable chapter names in a folder (novels come to mind).

jgm commented 1 year ago

Hopefully others will comment here on what they'd find most useful.

tarleb commented 1 year ago

My vote is for an additional CLI parameter.

dm413 commented 1 year ago

For my uses, I don't care how you name the files, as long as the links inside the document take you to the correct file. I can see the advantages of having non-numbered filenames if you have external links into the documentation set (for example bookmarks as @PlainMartin suggests). But for self-contained document formats, such as epub, htmlhelp, etc., the links are internal and the filename is pretty much "an implementation detail". I have seen documents that use GUIDs for filenames!

If you provide a command-line option, either numbered vs. non-numbered or one specifying the filename template, then it might be useful to make the template in use available to Lua code, perhaps in PANDOC_WRITER_OPTIONS.

torstenberg commented 1 year ago

I vote for a template for the filenames. Would be great!