Inputter options - Githubissues

Omikhleia commented 2 years ago

The \include command is implemented as follows:

  self:registerCommand("include", function (options, content)
    if options.src then
      SILE.processFile(options.src, options.format)
    else
      SILE.processString(content[1], options.format)
    end
  end, "Includes a content file for processing.")

(I am quoting, on current master at this date, that is after the nice changes added by @alerque, but the remark below also applies to earlier versions of it AFAIK).

Would it be possible for it to accept extra options, passed to the underlying inputter's process method (which currently only takes the content data doc), which could then use them to affect its parsing logic?

I am foreseeing at least two use cases for this.

For XML, ability to enforce a namespace/prefix on tags. I have some documents which have tags corresponding to SILE commands, but used for other purposes. It was workable by saving the original command to something else, but a nice option could be to enforce some sort of namespacing. For instance \include[src=myfile.xml, prefix=myscheme] and all tags would be prefixed (e.g.) with myscheme:, so that I can separate those XML tags from SILE commands, avoiding clashes (and implementing them all under that appropriate naming scheme). Say, e.g. <comment>xxx</comment> would become \myscheme:comment{xxx} and I don't have to temporarily switch/restore the usual \comment.
For Markdown, there are many flavors of it (read: extensions), without obvious ways to know in advance, and this could be a way to select some of them, e.g. \include[src=myfile.md, smart=false, startnum=false]
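As a rough illustration of the XML prefix idea, here is a minimal sketch that renames every command in a parsed tree. The node shape (a table with a command field and its children in the array part) and the helper name prefixCommands are assumptions made for this example, not a description of what the XML inputter actually does.

```lua
-- Hypothetical helper: apply a namespace prefix to every command in a tree.
-- Assumed node shape: { command = "name", child1, child2, ... }
local function prefixCommands(node, prefix)
  if type(node) ~= "table" then
    return node -- plain strings (text content) are left untouched
  end
  if node.command then
    node.command = prefix .. ":" .. node.command
  end
  for i = 1, #node do
    node[i] = prefixCommands(node[i], prefix)
  end
  return node
end

-- <comment>xxx</comment> represented as a tiny tree
local tree = { command = "comment", "xxx" }
prefixCommands(tree, "myscheme")
print(tree.command) -- myscheme:comment
```

With something like this applied after parsing, a document's comment tag would dispatch to a myscheme:comment command and leave the usual \comment alone.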

This is just a low-priority "convenience" remark in passing, none of this is absolutely necessary: in the Markdown cases, these are mostly Pandoc-like extensions that seldom affect the writer's intentions, AFAIK; in the XML case, it is rather easy, as noted, to work around it (though a bit of a challenge sometimes), and in the worst case, preprocessing the input with an XSLT stylesheet (or other solutions) is quite doable too.

Omikhleia commented 2 years ago

Oh, and another slightly related use case for Markdown: I haven't thought about it a lot, but I am not sure how to control the use of the metadata (e.g. a YAML block). In some cases, I might want to consider the document to be kind of "standalone" (i.e. propagating some metadata such as the author, etc.); in other cases, when including chapters in some more general document, I would just have it skipped. There could be other solutions here, however, that do not need changing \include.

alerque commented 2 years ago

Yes, we can figure out something. I have already bumped into a similar need too. One interesting note is that this isn't just a need for \include but also something we'll need to be able to pass from the CLI. Markdown especially is going to need help because it isn't always possible to detect the different possible flavors of Markdown. Even with my initial idea of having multiple Markdown inputters using different tech behind the scenes (e.g. a markdown vs. a commonmark) we will still need to pass options such as in your include example. We can load an inputter with -r inputters.commonmark and set class options with -O papersize=a6, but we don't have a way from the CLI to set inputter options.

I suppose one way for the CLI to handle it would be to have a setting or method that a chunk of Lua code could access from an -e <code> evaluation, but that doesn't feel right.

Back to \include: since we don't know all the options any given inputter may need, might it be fair just to pass an options table rather than adding more arguments? Any arguments we don't use for all processors (i.e. anything other than src and format) could be stuffed in a table and passed through as a third argument.
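A minimal sketch of that pass-through idea, runnable on its own. The SILE table below is a local stand-in, and the three-argument processFile signature, splitOptions and include are hypothetical names for illustration; nothing here is confirmed as the eventual API.

```lua
-- Stand-in for the real SILE table so the sketch runs by itself.
local SILE = {}
local received = {}

-- Hypothetical signature: a third argument carries inputter-specific options.
function SILE.processFile(src, format, options)
  received = { src = src, format = format, options = options }
end

-- Split \include options into the arguments every processor understands
-- (src, format) and a pass-through table holding everything else.
local function splitOptions(options)
  local extra = {}
  for key, value in pairs(options) do
    if key ~= "src" and key ~= "format" then
      extra[key] = value
    end
  end
  return options.src, options.format, extra
end

local function include(options)
  local src, format, extra = splitOptions(options)
  SILE.processFile(src, format, extra)
end

-- Roughly what \include[src=myfile.md, smart=false, startnum=false] would pass
include({ src = "myfile.md", smart = "false", startnum = "false" })
print(received.src, received.options.smart)
```

Filtering out src and format is optional in this sketch; passing the whole table through unchanged would work just as well if inputters ignore keys they do not know.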

Also do these args need to reach the :parse() methods of the inputter or just the :process() methods?

Omikhleia commented 2 years ago

> Also do these args need to reach the :parse() methods of the inputter or just the :process() methods?

Both, it seems: only process() is what the previous SILE.readFile() invoked (and your new functions too), but it then calls the parser, which does its stuff. In my current (old-way) code for Markdown, the parse function was not even public yet (it was a local in process()), but this is where the options would mostly be used.

It does seem to me, too, that an option table would be the way to go, as we cannot tell which options the inputter/parser may support. (I am not even sure we'd have to filter out the options.src and options.format, they wouldn't cause much harm)

alerque commented 2 years ago

> Both, it seems: only process() as it is what previous SILE.readFile() invoked - and your new functions too, but then it calls the parser which does its stuff. In my current (old-way) code for markdown, the parse function was not even public yet (it was a local in process()), but this is where the options would mostly be used.

If you look at the PR I sent to your fork, there are two parse functions in the inputter: one private with the callouts to the markdown AST writer, and one public that goes with the SILE inputter module (that calls the private one with the right data). The :parse() method needs to be public because there are some places that use the AST without :process()ing it. This includes tests, the content detection type snooping, and some fancy hacks I saw in other people's packages. Calling :process() usually just calls :parse() and then SILE.process() on the output, but it isn't always quite that simple.

The SIL inputter also has a similar private parse function used in the public :parse() method.
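The layering described above (a private parser, a public :parse() that wraps it, and a :process() that parses and then processes) could be sketched as follows. The inputter here is a toy that splits text into lines; the uppercase option, parseLines and the processTree callback are invented for illustration and do not reflect the real Markdown or SIL inputters.

```lua
-- Toy inputter; all names and shapes are assumptions for illustration only.
local inputter = {}
inputter.__index = inputter

-- Private parser: the only place that understands the input format,
-- and therefore where most options end up being consulted.
local function parseLines(doc, options)
  local tree = {}
  for line in doc:gmatch("[^\n]+") do
    local text = options.uppercase and line:upper() or line
    tree[#tree + 1] = text
  end
  return tree
end

function inputter.new(options)
  return setmetatable({ options = options or {} }, inputter)
end

-- Public :parse() returns the tree without processing it (handy in tests).
function inputter:parse(doc)
  return parseLines(doc, self.options)
end

-- Public :process() parses, then hands the tree to a processing step.
function inputter:process(doc, processTree)
  local tree = self:parse(doc)
  return processTree(tree)
end

local md = inputter.new({ uppercase = true })
local tree = md:parse("hello\nworld")
local count = md:process("a\nb\nc", function (t) return #t end)
print(tree[1], #tree, count)
```

Storing the options on the inputter instance is one way to make them reach both methods without changing either signature.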

alerque commented 2 years ago

I started looking at it as this is closely related to refactoring in #1482. I see several ways to get the job done but I'm a little puzzled about what the best ergonomics would be. We need something that works both from the CLI and programmatically when loading SILE as a library.

Right now we have --use to load up and initialize a module (class, package, inputter, whatever). Also we have --options which passes key=value pairs as a table to the document class.

We also had an API for passing arguments for packages (used by autodoc, masters, twoside, and maybe others) but no way to pass these from either declarative markup or the CLI. It's now really obvious where to add them for declarative markup, but the CLI is less obvious.

We need a way to pass options to (at the least) inputters, classes, and packages that are specified via --use from the CLI. I can foresee possible needs for other module types too (e.g. outputters), so something generic would be nice.

Also note inputters are not necessarily limited to one-per-document, and packages definitely are not. Classes are pretty much locked to one-per-document.

Without thinking too hard about the implementation, from an end-user ergonomics standpoint, how should the CLI pass options to inputters and packages that are not part of the document's declarative markup?

alerque commented 2 years ago

It's been so long since I started using it, I forgot the --options argument is new since the v0.13.x series and not yet released. That gives us the flexibility to change it still without officially breaking anything (my projects notwithstanding).

The vast majority of usage I see for this is for document class options (e.g. sile -O papersize=a6 foo.sil to override the document paper size), but if we need to pass options to other module types maybe we should come up with something more generic before that is released.

alerque commented 2 years ago

One more thing, the evaluate option already gives one route, but that doesn't seem very ergonomic:

$ sile -e 'SILE.use("inputters.foo", { bar = "baz" })' doc.sil

Not terrible, but the whole song and dance to escape Lua code from the CLI does not feel like a nice UX to present to most end users for what may be a common usage if inputters start proliferating.

alerque commented 2 years ago

I'm still a little stuck on how to keep the CLI on parity here: the Lua API is pretty easy (SILE.use("module", { foo = "bar" })) and the declarative markup is workable enough (\use[module=module,foo=bar]), but what should a CLI invocation look like?

I'm considering parsing it with LPEG and allowing an options input format similar to our SIL options:

$ sile -u module[foo=bar] document.xml

Alternatively I guess I could just make sure we have a way to evaluate SIL in the same way we already have for Lua with -e:

$ sile -s '\use[module=module,foo=bar]' document.xml

That gets kind of messy though, because it is not clear how to handle the master document vs. preamble material when the preamble might have inputter options. I don't think I want to go back down that rabbit hole.

alerque commented 2 years ago

As of the current PR I'm going with the former. Examples from this issue might look like this if done from the CLI:

$ sile -u inputter.markdown[smart=false,startnum=false] foo.md
$ sile -u inputter.xml[prefix=myscheme] foo.xml
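To make the module[key=value,...] form concrete, here is a small sketch of how such an argument could be split into a module name and an options table. It deliberately uses plain Lua string patterns rather than LPEG so that it runs without dependencies, and parseUseSpec is a hypothetical name; the actual PR may parse this quite differently. Quoting and nested brackets are not handled.

```lua
-- Parse "module[key=value,...]" into a module name and an options table.
-- Plain string patterns stand in for an LPEG grammar in this sketch.
local function parseUseSpec(spec)
  local name, body = spec:match("^([^%[]+)%[(.*)%]$")
  if not name then
    return spec, {} -- no bracketed options given
  end
  local options = {}
  for key, value in body:gmatch("([^=,]+)=([^,]*)") do
    options[key] = value -- values stay strings; the module interprets them
  end
  return name, options
end

local name, options = parseUseSpec("inputter.markdown[smart=false,startnum=false]")
print(name, options.smart, options.startnum)
```

Note that depending on the shell, the square brackets may need quoting on a real command line.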

sile-typesetter / sile

Inputter options #1486