makeglossaries with abbreviations/acronyms

atticus-sullivan commented 1 year ago

--makeglossaries doesn't work with other extensions than .glo/.glg/.gls. Therefore, it can't be used for acronyms or other glossaries.

I'd work on this, but not right now. Just wanted to file this "bug/feature".

I'd have to read the glossaries documentation more carefully if there are some common extensions (I think so). Then maybe we should activate these by default and allow the user to add more extensions to it.

atticus-sullivan commented 1 year ago

Seems that there are some common ones:

| Type | log | output | input |
|---|---|---|---|
| acronyms | `.alg` | `.acr` | `.acn` |
| symbols | `.slg` | `.sls` | `.slo` |
| numbers | `.nlg` | `.nls` | `.nlo` |
| index | `.ilg` | `.ind` | `.idx` |

With the following file types:

log: not relevant, only for logging
output: maybe relevant, output of xindy/makeindex
input: relevant, input of xindy/makeindex

At least that's what I found in the documentation of the glossaries package (and glossaries-extra, but I didn't find any additional ones there).

So maybe it would be worth adding these extensions by default as well. In addition, we should somehow enable the user to add their own extensions or extension pairs.
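A minimal sketch of what "defaults plus user additions" could look like. The table layout and the names `default_glossaries` and `merge_glossaries` are assumptions for illustration, not existing cluttex code; the extensions are the ones from the table above plus the currently supported .glo/.glg/.gls.

```lua
-- Hypothetical defaults: extensions from the table above, plus the main glossary.
local default_glossaries = {
  main     = { log = "glg", out = "gls", inp = "glo" },
  acronyms = { log = "alg", out = "acr", inp = "acn" },
  symbols  = { log = "slg", out = "sls", inp = "slo" },
  numbers  = { log = "nlg", out = "nls", inp = "nlo" },
  index    = { log = "ilg", out = "ind", inp = "idx" },
}

-- Returns a new table holding the defaults, with user entries added on top.
-- A user entry with the same name replaces the default one.
local function merge_glossaries(user)
  local merged = {}
  for name, exts in pairs(default_glossaries) do merged[name] = exts end
  for name, exts in pairs(user or {}) do merged[name] = exts end
  return merged
end

-- Example: the "notation" glossary and its extensions are made up.
local all = merge_glossaries({ notation = { log = "xlg", out = "xls", inp = "xlo" } })
print(all.acronyms.inp, all.notation.inp)
```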

I've seen that there are three/four possibilities for building the glossary. Currently, options 1-3 (TeX-driven, makeindex and xindy) can be used easily with cluttex; options 2 and 3 are really easy with makeindex(-lite).

Some additional thoughts:

Option 4 (bib2gls) isn't supported right now (as far as I understand it, the intermediate data is stored in the .aux file, which makes it pretty hard for us to detect whether to run bib2gls or not).
With the introduction of additional file extensions to watch, using makeindex or xindy (or any other command which has to be run once per file extension) directly as the --makeglossaries command becomes hard. We'd need to either use some pattern (provided by the user when configuring) into which we can insert the corresponding filenames, or specify one command per file extension (or lose the option to configure the command that flexibly).

Maybe it would be better not to add all extensions by default, so that the current behavior stays the same if nothing is changed. If the user specifies additional extensions, then multiple commands have to be specified (the question is still how).
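To make the "pattern" idea concrete, here is a rough sketch. The placeholder syntax `%{log}`, `%{out}`, `%{inp}` and the function name `expand_command` are invented for this example; nothing like it exists in cluttex as far as this thread shows.

```lua
-- Expands placeholders of the (hypothetical) form %{key} in a command pattern.
-- Unknown keys are left untouched, because gsub keeps the original match
-- when the replacement function returns nil.
local function expand_command(pattern, files)
  return (pattern:gsub("%%{(%w+)}", function(key)
    return files[key]
  end))
end

local cmd = expand_command(
  "makeindex -t %{log} -o %{out} %{inp}",
  { log = "main.alg", out = "main.acr", inp = "main.acn" }
)
print(cmd)
```

With one pattern and a list of file triples, the same command template could be reused for every glossary, which avoids having to specify one full command per extension.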

So I'm still unsure how to do this the nice way.

atticus-sullivan commented 1 year ago

Currently I'm planning to implement something like this:

The interface will take formatted strings like `type[makeindex/xindy]:log:output:input:path-to-command:command args[optional]` via a new command-line argument. This keeps things backwards compatible: --makeglossaries would be an incompatible (?) orthogonal option, as I plan on using makeindex and/or xindy directly. This argument will push tables like

-- store struct
local s = {
  type = "",
  log = "",
  out = "", -- infer from log (replace last char by 's')
  inp = "", -- infer from log (replace last char by 'o')
  path = "", -- try PATH
  args = "", -- infer from log, out and inp
  cmd  = "" -- built at last
}

to a list of glossary configurations (support for multiple glossaries). We then just watch for changes in the respective inp files; if one has changed, we run the generated cmd to build the respective out file.
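A rough sketch of the "watch inp, run cmd" step, assuming that change detection is done by comparing file contents between runs. How cluttex actually tracks file changes is not covered here; `read_file`, `commands_to_run` and the `snapshots` table are hypothetical names.

```lua
-- Reads a whole file; returns nil if it does not exist (yet).
local function read_file(path)
  local f = io.open(path, "rb")
  if not f then return nil end
  local content = f:read("*a")
  f:close()
  return content
end

-- configs:   list of tables with at least the fields inp and cmd
-- snapshots: maps an input file name to the content seen on the previous run
-- Returns the commands whose input file changed since the last call
-- and updates the snapshots accordingly.
local function commands_to_run(configs, snapshots)
  local cmds = {}
  for _, cfg in ipairs(configs) do
    local content = read_file(cfg.inp)
    if content ~= nil and content ~= snapshots[cfg.inp] then
      snapshots[cfg.inp] = content
      cmds[#cmds + 1] = cfg.cmd
    end
  end
  return cmds
end
```

Calling `commands_to_run` after each TeX run would then yield only the indexer commands that actually need to be executed.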

Quick demo code of the parsing

```lua
local function split(str)
  local ret = {}
  local i = 1
  str = str..":"
  while true do
    local char = "\\"
    local s,e = i,i
    while char == "\\" do
      _,e = str:find("[^:]-:", i)
      if not e then break end
      char = str:sub(e-1, e-1)
      i = e+1
    end
    if not e then break end
    table.insert(ret, str:sub(s,e-1):gsub("\\:", ":").."")
  end
  return ret
end

print(table.concat(split("alg:acr:acn"), ", "))
print(table.concat(split("slg:sls:slo"), ", "))
print(table.concat(split("nlg:nls:nlo"), ", "))
print(table.concat(split("ilg:ind:idx"), ", "))

print("\nUSAGE: 'type[makeindex/xindy]:log:output:input:path-to-command:command args[optional]'\n")

-- -- store struct
-- local s = {
--   type = "",
--   log = "",
--   out = "", -- infer from log (replace last char by 's')
--   inp = "", -- infer from log (replace last char by 'o')
--   path = "", -- try PATH
--   args = "", -- infer from log, out and inp
--   cmd  = "" -- built at last
-- }

local function loadCfg(str)
  local s = split(str)
  assert(#s >= 2 and #s <= 6, "wrong input")

  local ret = {}

  ret.type = s[1]
  assert(ret.type == "makeindex" or ret.type == "xindy" or (#s == 6 and s[5] ~= ""))

  ret.log = s[2]

  if #s >= 3 and s[3] ~= "" then
    ret.out = s[3]
  else
    ret.out = ret.log:sub(1,-2).."s"
  end

  if #s >= 4 and s[4] ~= "" then
    ret.inp = s[4]
  else
    ret.inp = ret.log:sub(1,-2).."o"
  end

  if #s >= 5 and s[5] ~= "" then
    ret.path = s[5]
  else
    ret.path = ret.type
  end

  if #s >= 6 then
    ret.args = s[6]
  else
    if ret.type == "makeindex" then
      ret.args = ("-t '%s' -o '%s' '%s'"):format(ret.log, ret.out, ret.inp) -- TODO shell escaping
    elseif ret.type == "xindy" then
      ret.args = ("-t '%s' -o '%s' '%s'"):format(ret.log, ret.out, ret.inp) -- TODO shell escaping
    else
      error("invalid state")
    end
  end

  -- build command
  ret.cmd = ret.path.." "..ret.args

  return ret
end

local function printCfg(c)
  print(("type: %s, log: %s, in: %s, out: %s, path: %s, args: %s, cmd: \"%s\""):format(c.type, c.log, c.inp, c.out, c.path, c.args, c.cmd))
end

print("makeindex:main.ilg:main.ind:main.idx =>")
local c = loadCfg("makeindex:main.ilg:main.ind:main.idx")
printCfg(c)
```
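For the two "TODO shell escaping" spots in the demo, one possible approach on POSIX shells is to wrap each file name in single quotes and escape embedded single quotes. This is only a sketch: `shell_quote` is a name made up here, it does not cover Windows `cmd.exe` quoting, and cluttex may already ship its own helper for this.

```lua
-- Quotes a string for a POSIX shell: wrap it in single quotes and replace
-- every embedded ' by '\'' (close quote, escaped quote, reopen quote).
local function shell_quote(s)
  return "'" .. s:gsub("'", "'\\''") .. "'"
end

-- Builds the default argument string used in the demo, with quoting applied.
local function default_args(log, out, inp)
  return ("-t %s -o %s %s"):format(shell_quote(log), shell_quote(out), shell_quote(inp))
end

print(default_args("main.ilg", "main.ind", "main.idx"))
```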

As I'm not sure what we are currently parsing from the aux file and how to extract the bib2gls information, I'll refrain from implementing bib2gls-specific stuff for now.

The current plan is to give the user every possibility: adjusting the basename of the glossary files (the full filename has to be given, not just the extension), adding arbitrary options to the called executable, and using an executable that is not contained in PATH.

Even if the type is unknown, it suffices to provide commandline arguments and the path to the executable and the filenames (so that we can watch for changes) to get it working (at least that's what is planned).

If there are any thoughts on the plan I outlined so far just let me know (here in this issue).

minoki commented 1 year ago

I'm looking at the .aux file, and there are lines like `\@newglossary{main}{glg}{gls}{glo}`. I wonder if they could be parsed to detect input/output files. What do you think?
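A minimal sketch of how such lines could be picked out of the .aux contents with a Lua pattern. The function name `parse_newglossary` is hypothetical, and the field names follow this thread's terminology (log, then output and input of makeindex/xindy), matching the argument order in the line quoted above.

```lua
-- Extracts every \@newglossary{name}{log}{out}{inp} entry from the text of
-- an .aux file. Returns a list of tables, in order of appearance.
local function parse_newglossary(aux_text)
  local found = {}
  local pattern = "\\@newglossary%s*{([^}]*)}{([^}]*)}{([^}]*)}{([^}]*)}"
  for name, log, out, inp in aux_text:gmatch(pattern) do
    found[#found + 1] = { name = name, log = log, out = out, inp = inp }
  end
  return found
end

-- Sample .aux content, using the two lines mentioned in this thread.
local sample_aux = [[
\relax
\@newglossary{main}{glg}{gls}{glo}
\@newglossary{acronym}{alg}{acr}{acn}
]]

for _, g in ipairs(parse_newglossary(sample_aux)) do
  print(g.name, g.log, g.out, g.inp)
end
```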

atticus-sullivan commented 1 year ago

Yes, you're right; in my case it is `\@newglossary{acronym}{alg}{acr}{acn}`. I just checked how the option is currently passed (in #15): `--glossaries="makeindex:main.gls:main.glo:main.glg"`. That one is more configurable in terms of whether to use makeindex or xindy. I'm not completely sure whether this configuration, which may be a bit more complex but is also more flexible, is worth it. I think people who figure out how to set up additional manual glossaries can also find this option (but there might be some who just copy-pasted the glossary setup from tex.stackexchange). Maybe we can go on, check the .aux file for the common ones like glo/acn and display a hint on how to set these things up.

But when I think about it, that's almost as complex as using the information from the .aux file directly. So we might just use the information from the .aux file, and if the user specified something else for that combination of input, output and log files, we can use that instead.

Any thoughts on which part of the code such parsing could take place in? (IIRC we're already parsing the .aux file at some point.)
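One way this "aux information first, user override on top" rule could look, as a sketch only. The names `file_key` and `resolve` are hypothetical, and the config tables are assumed to carry full file names in `inp`, `out` and `log`.

```lua
-- Identifies a glossary configuration by its combination of files.
local function file_key(cfg)
  return table.concat({ cfg.inp, cfg.out, cfg.log }, ":")
end

-- detected: configurations derived from the .aux file
-- user:     configurations given explicitly by the user
-- A user entry replaces a detected entry with the same file combination;
-- user entries without a detected counterpart are appended at the end.
local function resolve(detected, user)
  local overrides, used = {}, {}
  for _, cfg in ipairs(user) do overrides[file_key(cfg)] = cfg end
  local result = {}
  for _, cfg in ipairs(detected) do
    local k = file_key(cfg)
    result[#result + 1] = overrides[k] or cfg
    used[k] = true
  end
  for _, cfg in ipairs(user) do
    if not used[file_key(cfg)] then result[#result + 1] = cfg end
  end
  return result
end
```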

minoki / cluttex

makeglossaries with abbreviations/acronyms #14