sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License
1.65k stars 98 forks source link

De-conflate script & require #1385

Closed alerque closed 2 years ago

alerque commented 2 years ago

On the Lua side of this we are fine, but on the SILE \script side we have conflated two different operations: require and dofile are fundamentally different and we should have both. So far we keep using \script[src=package/foo] as if running a script was the same as loading a library, but it is not. Loading a library happens once and then even running it again will have (or should have) no effect. On the other hand a document should be able to execute an external script more than once and have it run all over again.

I think we need to unscramble this and make the \script[src=...] always re-execute the external code (previously loaded or not) and add a new \require or \package or something to that effect that runs SILE.require().

alerque commented 2 years ago

c.f. #1373

Omikhleia commented 2 years ago

On the same vein of ideas, we might perhaps want some way to know whether a package is loaded (rather, e.g. than testing if one of its command exists)?

alerque commented 2 years ago

Technically that is already possible using Lua idoms: the package.loaded table has a list of everything that it has loaded and that include SILE packages.

./sile -e 'SU.dump(pl.tablex.keys(package.loaded));os.exit()'

For example you could test to see if a package is loaded:

-- truthy in anything derived from the plain class:
package.loaded["packages/bidi"]

-- falsey out of the gate
package.loaded["packages/rebox"]

-- truthy after required
SILE.require("packages/rebox")
package.loaded["packages/rebox"]

This has several pitfalls though:

  1. Partly because of the mixups described in #1373 we've been conflating Lua module names (which use . as a generic separator that gets converted to platform specific path separators on use) with file system relative paths. These really should be the former, but if you look at the dump above you'll see lots of the latter. If you use this it will break when that conflation gets cleaned up.

  2. It has everything and there is no real differentiation between a Lua library that is just available in the namespace and a SILE package that got loaded and made modifications tho SILE or registered commands in the class or whatever.

  3. A module being loaded (and even loaded into the class with class:loadpackage()) does not equate to it being active. As an example bidi may be turned off or on at any given time, but once it is loaded it will always show up in the Lua memory of loaded packages. Some packages should probably provide their own API for querying their enabled/disabled state.

All that to say "yes", I agree we should probably have a standardized API for querying "is X package loaded in the current document class", and possibly also a standardized API for querying a loaded package to ask whether it is active/inactive/etc.

alerque commented 2 years ago

I think dealing with \script{} is kind of hopeless. Anything we do with it other than leave its identity crisis in place will be huge breakage for little gain.

Instead starting in #1373 I started adding much more explicit commands and options that do much more predictable things. For example \lua[src=path] assumes you want to run Lua code, not just load it, while \lua[require=module] assumes you want to load it without necessarily running it (Lua itself isn't supper clear on this topic, but at least this mode just passes that mess off on Lua and accepts its verdict). If it happens upon a valid SILE module running it might be the same as initializing it, but that's an implementation detail of processing Lua code not the SILE command.

An upcoming \load-package{<module>} proposal should quietly replace most usage of \script[src=packages/<module>].

Currently no plan to deprecate \script, but

Omikhleia commented 2 years ago

\load-package{<module>} sounds "long" and contrived. A simple \package (or \usepackage as in LaTeX) would do it. I always liked the src=file part. Contents between { ... } are a bit messy compared to options, and this is in line with plenty of similar cases (\image[src=...], \svg, \include...). To me, it feels more natural.

alerque commented 2 years ago

The src= is problematic because it isn't a source path. The \svg[src=] is a path, the \script[src=] is not always, in fact most usages are not a path. Even more so for the more explicit commands I've added that don't try to do both.

I wouldn't mind \usepackage, but \load-package matches the API inferface class:loadPackage() better and nicely matches a potential \unload-package (something LaTeX does not have a corollary for but we are pretty close to being able to implement). \package would work for me too, but what would be the reverse action?

Omikhleia commented 2 years ago

the \script[src=] is not always (a path), in fact most usages are not a path. Either I misunderstand or there's something I have not used (yet).

\script[src=packages/yolo]

That's a path to packages/yolo.lua in most cases, no (ok, one could argue about the extension part, but otherwise it's still kind of a path). Or are there cases with both src and a content argument, or some other usage I'm overlooking?

As for package unloading, the problem I have with it is that it is not something I expect to use frequently. I mean, the only use case might be documentation (where we may want to load a package, demonstrate, unload it), but for 99% of real usages (producing content), this will not occur. I also fear that the pattern could be abused e.g. I unload a package that was a dependency of some other package, it will likely have hard-to-debug side effects.

In other terms, I am not sure package unloading should even be exposed as a SIL command. It's something that could stay internal (e.g. autodoc could use it), but not exposed in the wild... Also, outside documentation, it doesn't seem to me it addresses any obvious end-user need so far.

alerque commented 2 years ago

That's a path to packages/yolo.lua in most cases

No it isn't. It's a module name that will get processed though a long string of fallbacks including Lua's package.paths and SILE's. That difference is important and will become more important as 3rd party packages start getting easier to distribute and hence more involved.

Omikhleia commented 2 years ago

Then...

\use[module=packages/yolo]
...
\unuse[module=packages/yolo]
Omikhleia commented 2 years ago

(Or even \use[package=packages/yolo] -- if I am allowed to have packages not necessarily in a packages subfolder...

EDIT: or even \require ... \unrequire (I didn't know the latter to exist, but according to Wiktionary, it does at least since 1922. "I am unrequiring the use of packages", lol.)

alerque commented 2 years ago

It this point I'm headed toward forcing things like "packages/" in the path for the more automatic functions. I expect 3rd party packages to be able to add more than just strictly packages: they can add classes or inputters or shapers or typesetters or whatever. Not having some structure to those things would make the internal interfaces very complex and brittle. Assuming if we want to load a shaper that it will be found in a shaper.x module name makes it much simpler and predictable (and testable). If you wanted to circumvent this you could easily do so by loading the file yourself manually (but anything SILE.required() would detect what it is).

Having this distinction will also make it easier to do things like test if a package is loaded, if we are in a certain class, which shaper is active, etc.

Why does \use[module=package.foo] win out over \load-package{foo} in your mind? You complained my naming was along, but my proposal is shorter than yours and I think more explicit at the same time. Not being rude here, just trying to think through what the pros/cons of interface naming is here.

alerque commented 2 years ago

P.S. I thought about \require but that quickly starts getting messy with different Lua versions, reserved words, etc. Also it makes it easy to conflate fundamentally different tasks which is exactly what I'm trying to get away from.

Omikhleia commented 2 years ago

Sometimes we have things with colons, e.g. converters:register. Sometimes with dashes, e.g. show-counters Sometimes with both font:add-fallback (and it comes from the font-fallback page, so it's not even the "namespace", or that should have been font-fallback:add maybe...). Sometimes with none of those, e.g. loadbibliography, showframe.... Oh well... ^^ Matters of taste. I am not fond of load-package' and strings in the command argument (which is a tree, so I bet it's takingcontent[1], but that thing is already so weird, I'd be afraid of generalizing it that much... While options do not have this problem, have cool existing things for them (such asSU.required(...)` etc.) that all feel "consistent").

I don't want to be rude either, but I don't get what's so important for end-users to invoke \script[src=packages/lists] or \use[package=lists] (hey I made it even shorter)...

alerque commented 2 years ago

I don't get what's so important for end-users [...]

The importance here is that the user needs some control over whether what they are loading is loaded using the SILE package path and a module loader or directly from a file system path, and also whether what they are running is a loader (as in SILE.require() that should be loaded once and then left alone no matter how many times it might be asked for, and a script that might be included multiple times and need to be executed each time. Is is impossible to overload \script[src=...] to accommodate guessing all 4 scenarios. It inevitably is wrong some of the time and you can't have it both ways (e.g. you can't prefer document local paths & always executing and have the package loader be idempotent and find 3rd party package installations in a sensible order).

I've had trouble with my own projects because it was guessing wrong and had to override SILE internals. I shouldn't have had to do this, hence why I am looking for more explicit alternatives to cover at least some of the 4 scenarios with more targeted tooling.

alerque commented 2 years ago

I'm pretty convinced of the need for more specific commands that one multi-purpose \script. Whatever we name the loader, I am not 100% set on using the content to pass the name, but I think it is better because then we can pass through 100% of the options to packages/classes/whatever that can handle being passed options.

Since I expect to be able to load more than strictly packages (e.g. shapers or typesetters or outputters, perhaps \use is better than \load-package.

While not currently implemented by any packages, using the content field to pass a module name keeps our options open to pass options like this:

\use[bleed=6mm,offset=2mm]{cropmarks}
Omikhleia commented 2 years ago

While not currently implemented by any packages

You said it all here. ^^

alerque commented 2 years ago

While not currently implemented by any packages

You said it all here. ^^

If I had, I was wrong.

I was initially going to say I didn't understand how packages could be expected to take advantage of a feature we hadn't given them access too yet, but it turns out even that would have been wrong. We did expose this and some packages did implement it—one example is the autodoc package which accepts theme values only on initialization. The caveat was we didn't expose any way to pass these values from SIL markup. It was reachable via Lua if you loaded the package yourself, but not if you loaded it from declarative document markup.


The remaining naming choice I think we have to make is whether to have separate loaders for different module types or one common one.

% Option 1
\use{packages.foo}
\use{shapers.bar}

% Option 2
\use-package{foo}
\use-shaper{bar}

Of course in both these "use" could be "load" or something else as well, but the principle of having one generic command vs. multiple special purpose ones is the question here in my mind.

alerque commented 2 years ago

To answer my own question, after playing around with both "option 1" has a significant advantage of not locking users into the file layout we picked and allowing them to load a package from something other than a pakcages/?.lua;packages/?/init.lua path. I think that outweighs most other factors I had in mind one way or another. Internally we'll stick to the expected directory structures, but there isn't a need to limit 3rd party layouts when we can detect the module type on load anyway.

Omikhleia commented 2 years ago

One more thinking between \whatever{content} vs. \whatever[param=value].

There are already cases where we conflate the two of them. E.g. while giving a thought to the indexer, I frowned at \index{key} -- Which in this case doesn't even use content[1] but even extracts the nnodes to build string -- in my above example, the key would end up being "foo lol doh" -- I do feel there's something wrong there. EDIT. Another example of content used as parameter is \xmltricks{tag1 tag2 ...} ^^

Both are workable, but one feels more natural to me (and that's not the one using the content).

alerque commented 2 years ago

Very good point. Thanks for taking the time to try to convince me.

We can't immediately fix the existing instances without more breaking changes, but heading towards a future where content is something that has a chance to end up in the output stream makes some sense to me. At the very least we should probably be using contentToString() on any content we do consume like that.

I'll have another go at this with that in mind. It will make it a little more awkward to separate and pass arguments to modules we load up, but maybe that's a reasonable price to pay. I'm still sorting through options for how to read Lua input too. So far we have the ability for a Lua script to:

Right now the \script command tries to guess between all of them. It will get some of them wrong some of the time, especially in the matter of path handling. That's part of what we are de-conflating here.

The new \use will only really be targeted at the first of those cases.

Without immediately breaking every project out there I think we need to steer people away from the last of those for most use cases. There are valid use cases for it and for the foreseeable future it will still be an option, but the more people try to do with it that could be done with one of the other mechanisms the harder is going to be to get them upgraded when SILE is fully modular and well enough compartmentalized to start swapping in modules in other languages (C, Rust, etc.). I want it like it is now so that any component may be replaced dynamically from Lua, but with each component being self contained with known inputs and outputs. Once we stop relying on so many globals and side effects ourselves we can reasonably start swapping components.

Avoiding globals and side effects will also make it much easier to tease out bugs. It is VERY difficult right now to even reason about or inspect what the typesetter is doing to figure out our long standing push back bugs because there are so many side effects going on it defies inspection.

alerque commented 2 years ago

Current thought:

Some commands might optionally take either (but not both) of these, but the path handling expectation should be stricter than it is now (i.e. no automatic fallback to the other if one fails).

Omikhleia commented 2 years ago

At the very least we should probably be using contentToString() on any content we do consume like that.

That's a part of the problem, which I tried to discuss earlier (on the issue related to TOC and PDF bookmarks), but things sometimes need time to mature (and it also took me time, either, to understand these):

The contentToString approach is very likely wrong, because it acts at AST level, and has no idea (yet) what will be the output, without recursing (but it really can't !)

Take, e.g. a Mardown title:

# Why _italics_ are important?

This would, likely end up as some equivalent of \chapter{Why \em{italics} are important}. So far so good.

Now, one expects to be able to have a link to that section. What Pandoc and others offer is an implicit hash such as "why-italics-are-important". Say we want this, i.e. \chapter[marker=why-italics-are-important]{Why \em{italics} are important}. Yes, when cross-references eventually make it, we might need to leverage our book class this way (... is the picture clearer, here? i.e. regarding book class expectations...) Anyhow, how can we build this?

The contentToString approach would currently yield Why are important = Fail. The "nnode to text" approach as fixed in ToC/PDF bookmarks would yield Why italics are important = Close to it. Just replace the spaces with dashes, are here you are. (EDIT: and lower-case, I think)

alerque commented 2 years ago

The current iteration in #1482 has \use[module=packages.foo]. It is also possible to pass content (is passthrough mode) that is assumed to be a Lua module itself making it possible to source any sort of module inline in a single stand alone file if you really wanted to. I also rewrote the CLI args introduced in #1373 from --require to --use to match. The idea there is to distance this action from Lua's require() that just loads code. For out modules we need to load them but also initialize them connecting up namespaces, etc.

Also I introduced SILE.use() as a Lua API matching the declarative \use command. I did not yet deprecate SILE.require() but I don't see a future for that—beyond figuring out how best to shim it to help people migrate. It's also hard to deprecate it while it is already overloaded with some funky path guessing that is being deprecated, so it might take a few cycles to phase out.