typelevel / Laika

Site and E-book Generator and Customizable Text Markup Transformer for sbt, Scala and Scala.js
https://typelevel.org/Laika/
Apache License 2.0

Add support for HTML site search #493

armanbilge closed 1 year ago

armanbilge commented 1 year ago

The topic of search came up briefly in https://github.com/typelevel/Laika/issues/360#issuecomment-1399563317 and I'd like to spin it into a proper feature request.

@valencik and @samspills have been working hard on protosearch and friends, a pure Scala search library. There are some demos, such as an http4s docs search embedded in an HTML page using a JS-exported API.

I've only been following along in awe; unfortunately I'm not deeply versed in how it all works 😅

The high-level idea of protosearch is to:

  1. support indexing on the JVM
  2. support querying in the browser via JS (or one day WASM)

For Laika integration + deployment, one idea we've had is the following:

  1. Indexing is implemented as a Laika renderer that produces an index artifact. This can be deployed as part of the site (similar to how e.g. epub / pdfs are included in the site).

  2. protosearch publishes a JS library to NPM and is available via CDNs. This JS library can load an index file and run queries on it.

  3. A search page is added to the site which uses a small bit of JS to glue together a search bar, the protosearch.js library, and index file.
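
To make the three steps above concrete, here is a minimal sketch of the "build an index artifact, ship it, query it in the browser" flow. Everything in it is an assumption for illustration: the `SearchIndex` shape, `buildIndex`, and `search` are hypothetical names and do not reflect protosearch's actual index format or API. In the proposal the indexing step would run on the JVM as a Laika renderer; it is written in TypeScript here only so the whole round trip fits in one block.

```typescript
// Hypothetical index format (NOT the protosearch format):
// each term maps to the paths of the pages that contain it.
interface SearchIndex {
  postings: Record<string, string[]>;
}

// Split text into lowercase alphanumeric terms.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Stand-in for step 1: the build-time indexing step.
function buildIndex(pages: Record<string, string>): SearchIndex {
  const postings: Record<string, string[]> = {};
  Object.keys(pages).forEach((path) => {
    tokenize(pages[path]).forEach((term) => {
      if (!hasOwn(postings, term)) postings[term] = [];
      if (postings[term].indexOf(path) === -1) postings[term].push(path);
    });
  });
  return { postings };
}

// Stand-in for step 2: query a loaded index.
// A page matches only if it contains every query term.
function search(index: SearchIndex, query: string): string[] {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  return terms
    .map((t) => (hasOwn(index.postings, t) ? index.postings[t] : []))
    .reduce((acc, paths) => acc.filter((p) => paths.indexOf(p) !== -1));
}

// Round trip: serialize the index as a deployable artifact, then load it
// back the way the glue script on a search page might after fetching it.
const pages = {
  "/intro.html": "Laika is a site and e-book generator",
  "/search.html": "Search the generated site",
};
const artifact: string = JSON.stringify(buildIndex(pages));
const loaded: SearchIndex = JSON.parse(artifact);
const hits: string[] = search(loaded, "generated site");
```

For step 3, the glue script would presumably fetch the artifact once on page load and call something like `search` on each input event of the search bar. A real index would also need ranking, stemming, and a more compact encoding than plain JSON.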

This doesn't necessarily have to be something in the Laika core repository and there may be good reasons it shouldn't be. But it could exist as a separate project/plugin that is brought in by sbt-typelevel-site.

Discuss! 😇

jenshalm commented 1 year ago

Of course having a search module for Laika would be exciting news. I guess we all agree that this functionality is Laika's most significant feature gap (by far, in my view; it's pretty complete otherwise). So we don't need to discuss whether such a module is desirable and can focus on the specifics, which are mostly a) who is doing the work, b) what the best design approaches are, and c) where it is going to live.

Regarding a) I'll be unable to do the actual coding in the near future, but I am happy to join design discussions, answer questions and do PR reviews. And regarding b) we should move the discussion to whatever we decide for c).

Therefore, I'll focus on the question of where that module should live. I think it has, in many ways, similar characteristics to sbt-typelevel in that it is a glue module between two projects, in this case Laika and protosearch. Such a module can generally live in three places: inside either of the two existing projects, or in a completely separate project. The latter has more administrative overhead, but also a few advantages: it is decoupled from the release cycles of both underlying projects and can publish new releases whenever either side does.

Of those three options, I really feel that one should be avoided for this particular scenario, at least in the beginning: making it a part of this repo right from the start. The main reason is that the two projects would be in completely different phases of their lifecycles: while Laika is getting close to doing a 1.0 release, sealing its APIs and promising long-term binary compatibility, the search module will initially be in a much more exploratory phase and will most likely benefit from the option to make binary-breaking changes whenever necessary. As a general rule of thumb I would prefer to avoid any runtime dependency on a library in the 0.x range (http4s is one unfortunate exception here).

This is also pretty much aligned with the general design idea behind Laika, where the expansive extension APIs were always meant to support 3rd party modules. The only reason that has not happened so far is that until fairly recently, Laika's core had frequent and sometimes quite disruptive changes itself. For example, there was no reason to develop the Helium theme separately and promote it later, so I simply included it in the 0.16 release. But with Laika reaching 1.x this really changes and by now I would probably develop a completely new module that needs to mature over time in a separate repo myself.

That leaves us with two options that I am equally fine with: either the search module lives somewhere else permanently, or it just starts its life there and gets promoted to the Laika main project much later. In the latter case my criteria for promotion would be roughly these (they apply on a general level, not just to the search functionality):

  a) the module has moved beyond its 0.x lifetime and aims at binary compatibility (in this case both the Helium theme extension and the underlying search library).
  b) the module has had several iterations of improvements based on user feedback and actual adoption.
  c) the authors and users of the module would at some point prefer to see it integrated into the main Laika releases.
  d) the authors are open to potential API adjustments in case the module has unnecessary differences from Laika's own naming conventions and API patterns.

Regarding documentation, we could add pointers to the search module docs in both Laika and sbt-typelevel.

Btw. if you want to prepare for a promotion scenario straight away, you can use the laika.search package prefix; I can guarantee that I won't use it in the meantime.

If you don't mind I would convert this issue into a discussion, as issues are meant to lead to actual work in this repo.