armanbilge commented 1 year ago

This issue is spun out of https://github.com/planet42/Laika/issues/281#issue-1220788257.

Postpone support for Scala.js in laika-io

This option has been mentioned in the original request. It is, however, beyond the scope of the 0.19 roadmap. Any future consideration would require that someone opens a ticket for this and is able to generate significant interest in the community for having this module on Scala.js.

The goals of this issue are to propose the broader idea, evaluate if it is even within Laika's scope, and see if there is community interest.

Proposal

The idea is really two separate but (IMO) related things.

cross-publish laika-io for Scala.js (targeting Node.js) and Scala Native
create a CLI application

The motivation for a CLI is so that Laika can be used as a standalone site generator à la Jekyll. @valencik and I recently discussed this in context of the typelevel.org website. I'm not entirely sure if this is possible, since I've really only used it via sbt: specifically, if there is enough configuration flexibility via HOCON/templates, without resorting to Scala code.

Assuming a CLI is a feasible and attractive feature, then cross-building laika-io to Scala.js and Native becomes very interesting.

Scala Native CLI would be a self-contained lightweight binary with instant startup (vs a JVM application)
A Scala.js CLI published to NPM would be extremely easy to install and run from virtually everywhere

If publishing a CLI application is out-of-scope for Laika, then I think cross-building laika-io for JS/Native is less valuable. The core is already cross-published for JS (and maybe Native someday) which, if desired, can be used directly with fs2-io without going through laika-io.

jenshalm commented 1 year ago

Many thanks for authoring this proposal. While I generally think this presents an interesting and exciting opportunity, I also have quite a number of doubts, and they are less about the feasibility of this effort (I really cannot judge feasibility of anything related to Scala Native), but more about a) the scope of work that I would see as mandatory to make this an attractive offering for which I'll try to provide some context below and b) the fact that a CLI application is not a frequently requested feature at the moment.

Since this comment turned out rather long, I've split it into three main sections that address A) the target audience you intend to reach with the proposal, B) the scope of work I'd see as necessary to produce an attractive offering and C) the question of development and maintenance, specifically in the context of the other work happening in the coming months.

Target Audience

The proposal so far discusses whether this is feasible and what advantages CLI applications would have (I think they are pretty obvious), but an even more critical question for me is: who would actually benefit, and how many? That expands the question from "is it feasible" to "is it worth the effort". I personally expect the effort to be high, and I'll explain why in the section about scope below. Regarding the target audience I see primarily two variants, each of which comes with its own downsides.

The first option would be to still mostly target the Scala community. I don't feel like this would be attractive though. This would mean we target a subset of an audience that is already not that large to begin with. Usage, feedback and participation have somewhat increased since the 0.18 release and the Typelevel adoption, but given that Laika has a fairly unique position as one of very few pure-JVM options, I still feel it's not as high as it could be. And within the user base there is a clear dominance of JVM users vs. Scala.js users, and within the former a clear dominance of plugin usage. As an example, here are some Sonatype download statistics for November, with 0.18.2 and 0.19.0 combined to ignore aspects like migration speed: around 11,000 for the plugin, around just 150 for laika-core_2.13 for the JVM and around just 10 (!!!) for laika-core_2.13 for Scala.js. I don't have meaningful numbers for Scala 3, as the statistics seem to be distorted by some 3rd party integration I am not aware of: the numbers are unusually high and almost identical for JVM and Scala.js, which is very unlikely to reflect real usage. The numbers for Scala 2.12 are in turn distorted by the plugin usage. With those numbers I am currently very reluctant to even think about increasing the maintenance burden by supporting Scala.js in additional modules. Native is a different story, but even here we can see that within the existing user community there is no significant number of requests for such functionality at the moment.

This leaves us with the second option, which is to target a wider audience and not just the Scala community. Existing usage patterns and download statistics are less relevant for this discussion as we would seek to attract a completely new group of users. This feels generally more attractive to me, but has its own downsides: we would enter a large and crowded arena, with tools like Jekyll, Hugo, Docusaurus and many more. All of these have a much larger user base, a larger number of maintainers and 3rd party extensions and in some cases also a larger feature set. We would need to really think about how and where we would announce and market such a solution and what the selling points would be that could potentially tempt users to switch. When I started with Laika I was very well aware of the crowded nature of this arena and deliberately chose not to compete in it. If others feel like it is worth it, I am happy to chime in and contribute, but I am currently not considering leading such an effort (more on that under Development & Maintenance).

Scope

The second important discussion is the scope of the work. An option that would be very unattractive in my view is to simply support all functionality we get mostly for free by adding those new target platforms and a bit of bootstrap logic for the CLI app itself, while stripping off all modules that don't work on Native out of the box. The huge downsides of that would be twofold. Firstly, the documentation, which already needs to distinguish between library API and sbt plugin, would be fragmented further by many features not being supported on Native or Node, or being supported in a very different way. Secondly, competing with any comparable tool becomes even harder when we only offer a small subset of functionality. I would want to avoid increasing the maintenance burden by cross-publishing when we realistically would only reach a small audience.

For this reason I feel like it's appropriate to hijack this proposal a bit for a wider discussion of the Laika road map and I hope you don't mind the wall of text that will follow. In my view aiming for feature parity with what the sbt plugin offers plus filling functional gaps compared to competing native apps in this space would be the only viable approach. Which turns this idea into a fairly ambitious endeavour. I would actually love to change the title of this proposal from "Publish..." to "Develop..." if you guys don't mind as that would more accurately reflect the scope of the work. I haven't thought about every module and feature in detail, but I'll list a few things that spontaneously come to mind just as an example for the challenge that would lie ahead.

Helium Configuration

In contrast to the configuration in Laika Core, which supports both HOCON and programmatic configuration for (I think) all features out of the box, that is currently not the case for Helium. I think it is feasible to add that support, but it's a non-trivial amount of work. The DX of the syntax for this vast set of options would need to be carefully considered and HOCON decoders would need to be written (which is a lot of boilerplate as Laika does not support macros like some JSON libraries do; this might change when everyone is on Scala 3, but until then I'd prefer not to implement those twice for Scala 2 and 3).
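To give a feel for the kind of boilerplate meant here, the following is a minimal sketch of a hand-written decoder. Everything in it is hypothetical: `ThemeColors`, its fields and the flat `Map[String, String]` standing in for a parsed configuration are invented for illustration and are not Laika's or Helium's actual API.

```scala
object DecoderSketch {

  // Hypothetical settings type; the field names are invented for illustration.
  final case class ThemeColors(primary: String, secondary: String, darkMode: Boolean)

  type Result[A] = Either[String, A]

  // Stand-in for a parsed configuration: a flat map from key to raw string value.
  private def required(config: Map[String, String], key: String): Result[String] =
    config.get(key).toRight(s"missing key: $key")

  private def boolean(config: Map[String, String], key: String): Result[Boolean] =
    required(config, key).flatMap {
      case "true"  => Right(true)
      case "false" => Right(false)
      case other   => Left(s"not a boolean for $key: $other")
    }

  // Without macro derivation every field is read and validated by hand,
  // and this has to be repeated for each configurable type.
  def decodeColors(config: Map[String, String]): Result[ThemeColors] =
    for {
      primary   <- required(config, "primary")
      secondary <- required(config, "secondary")
      darkMode  <- boolean(config, "darkMode")
    } yield ThemeColors(primary, secondary, darkMode)
}
```

Even for three fields the decoder is longer than the type it decodes, which is why doing this for a large set of options is a noticeable amount of work.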

PDF Support

As you might be aware, PDF support in Laika is based on the Apache FOP project which in turn is based on the XSL-FO format which appears to be a legacy format right now. I would have preferred to avoid the format, but there are no comparable libraries for the JVM and not requiring any external tools was a primary goal from the beginning. Apache FOP will never work on Scala.js nor Scala Native, so PDF support would need to be re-implemented from scratch based on tooling available for the relevant platforms. One option would be to add a LaTeX renderer to Laika (a non-trivial effort on its own) and then rely on all the existing tooling around that format (the result would probably look even better than what Apache FOP produces). Another option would be to use HTML as the interim format, but that could not simply use Laika's existing renderer as it would need to deal with features only supported for PDF (manual page breaks, footnotes, configurable bookmarks and typographic controls like preventing headlines on the bottom of the page) and I am not sure whether PDF tooling accepting HTML as input format would support all these. You might feel tempted to say: let's skip PDF support then, but that brings the other downsides listed above: an inconsistent feature set and offering less than competitors.

EPUB Support

This might be significantly easier than PDF support, but I ultimately don't know. The big advantage here is that everything non-portable is concentrated in a single Laika class that depends on the JDK's ZipOutputStream (EPUB is essentially ZIP plus XML metadata). Whatever alternative we end up using needs to support a way to specify the compression per file (EPUB requires the first file to be written uncompressed).
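The per-file compression requirement can be sketched with the JDK classes mentioned above. This is an illustration only, not Laika's actual implementation: the object name `EpubZipSketch` and the `write` helper are assumptions, and the sketch only shows the uncompressed first entry, not a complete EPUB container.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.US_ASCII
import java.util.zip.{CRC32, ZipEntry, ZipOutputStream}

object EpubZipSketch {

  // The EPUB container format expects this content in the first entry.
  private val mimeType: Array[Byte] = "application/epub+zip".getBytes(US_ASCII)

  /** Writes an archive with an uncompressed `mimetype` entry first,
    * followed by the given entries using deflate compression.
    */
  def write(entries: Seq[(String, Array[Byte])]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val zip    = new ZipOutputStream(buffer)
    try {
      // STORED entries need size and CRC set before putNextEntry is called.
      val crc = new CRC32()
      crc.update(mimeType)
      val first = new ZipEntry("mimetype")
      first.setMethod(ZipEntry.STORED)
      first.setSize(mimeType.length.toLong)
      first.setCompressedSize(mimeType.length.toLong)
      first.setCrc(crc.getValue)
      zip.putNextEntry(first)
      zip.write(mimeType)
      zip.closeEntry()

      // All remaining entries are compressed as usual.
      entries.foreach { case (path, content) =>
        val entry = new ZipEntry(path)
        entry.setMethod(ZipEntry.DEFLATED)
        zip.putNextEntry(entry)
        zip.write(content)
        zip.closeEntry()
      }
    } finally zip.close()
    buffer.toByteArray
  }
}
```

Any replacement for Node.js or Native would need an equivalent of `setMethod` per entry, which is the capability to look for when evaluating alternatives.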

Extensibility

To my knowledge, most alternatives in this space support a way of installing 3rd-party extensions. On top of that, the large set of extension points of Laika is an aspect that distinguishes Laika from some other solutions, so the idea of producing a CLI app that closes this door feels like stripping off a large portion of what makes Laika attractive. To achieve feature parity with the JVM version and enable community plugins, there would need to be a way to compile and load extensions. The downside, apart from the associated complexity is that the language would be Scala. When trying to target communities outside of Scala, this might be a showstopper for some. If a CLI application would see any significant adoption at all, I can already sense that some might start requesting Java APIs. But given that most extension points center around writing partial functions for AST nodes which are case classes, I'm not sure this could be added in any elegant way.
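To illustrate why this style of extension point is hard to expose elegantly outside of Scala, here is a minimal sketch of a rewrite rule written as a partial function over case classes. The miniature AST (`Text`, `Emphasized`, `Link`) and the `rewrite` helper are hypothetical and deliberately simplified; they are not Laika's actual node types or API.

```scala
object RewriteRuleSketch {

  // Hypothetical miniature AST; these are NOT Laika's actual node types.
  sealed trait Span
  final case class Text(content: String) extends Span
  final case class Emphasized(content: String) extends Span
  final case class Link(target: String, label: String) extends Span

  // An extension is a partial function that only matches the nodes it cares about.
  type RewriteRule = PartialFunction[Span, Span]

  // Example rule: force plain-HTTP links to HTTPS; other nodes are not matched.
  val httpsOnly: RewriteRule = {
    case Link(target, label) if target.startsWith("http://") =>
      Link("https://" + target.stripPrefix("http://"), label)
  }

  // Applies the rule where it is defined and leaves every other node unchanged.
  def rewrite(spans: List[Span], rule: RewriteRule): List[Span] =
    spans.map(span => rule.applyOrElse(span, (s: Span) => s))
}
```

The rule relies on pattern matching with a guard and on case class extraction, both of which have no direct counterpart in a Java API.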

HOCON

This is another piece in the puzzle where we offer something alien when targeting developers beyond Scala. I'm fairly sure that at some point people would ask for YAML support and that would be very difficult to integrate in Laika. Wrapping a YAML parser is the easy step, but we would need to translate the internal format of the result to Laika's internal HOCON format to preserve the hierarchical nature of Laika's configuration with its cross-reference support. YAML is also not a good fit for some use cases like directive attributes (a format with significant whitespace does not work well in an inline scenario).

CommonMark

Another gap to competitors that most users and contributors are probably not aware of is that Laika still builds on the original, classic Markdown definition. But today, in most cases where people talk about Markdown they actually mean CommonMark. To provide some context for those who did not follow these developments, the CommonMark spec was a joint effort to add detail to many of the vague or incomplete syntax descriptions of the original Markdown definition as well as a comprehensive test suite. Unfortunately the original author of Markdown did not agree to the name Markdown being used, hence the new name CommonMark. But in practice everyone still calls this Markdown. For Laika the plan is to integrate the CommonMark test suite of 600+ tests to match what other tools offer by now. Most users won't notice the difference, as using Markdown with GitHubFlavor today will get you very close to what is now called CommonMark and the tests which will be red will most likely center around more esoteric edge cases. But I still feel this is an area where catching up to comparable toolkits is necessary and the work for this is currently scheduled for the 1.1 time frame.

Syntax Highlighting

This is another area where some catching up would be required when targeting a wider audience. Laika's built-in highlighter still lacks support for some of the major languages like C++, C#, Ruby, Rust, Go. Options here would be: a) adding those highlighters to the integrated support, which is the ideal solution as it makes the syntax available as AST nodes and thus opens it for customizations and makes it work for all output formats, b) offering some JavaScript solution out of the box, which would have the downside of not working for PDF, or c) integrating with some native solution, for which I have not done any research yet.

Search

This really hasn't been on my radar so far, and it's also not getting requested frequently, but many alternative solutions have some support for searching. This is relevant for HTML output only, as EPUB and PDF come with search functionality in the respective readers. I did some brief research a few years ago, but did not find any attractive solution. Most JavaScript search engines I could find were either very limited in functionality or no longer actively maintained or, in most cases, both.

Parallel Execution

The transformation time for Laika benefits from the fact that there is only a very brief bottleneck in the middle of the transformation, where the AST processing happens in a single thread (to resolve cross-references, produce tables of contents, etc.). The first and last phases of the process (parsing and rendering together with I/O) happen for all documents in parallel. I don't know whether parallel execution can be kept for Node or Native, but if not I could imagine that sequential execution might result in performance that does not match that of other native tools.
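The three-phase shape described above can be sketched as follows. This is a simplified illustration and not Laika's implementation: it uses standard-library Futures purely to keep the example self-contained, and the `Parsed` and `Resolved` types as well as the `parse` and `render` functions are hypothetical stand-ins.

```scala
import scala.concurrent.{ExecutionContext, Future}

object PipelineSketch {

  // Hypothetical stand-ins for parsed and resolved documents, not Laika types.
  final case class Parsed(path: String, title: String)
  final case class Resolved(path: String, title: String, toc: List[String])

  // Toy parser: treats the first line as a "# Title" headline.
  def parse(path: String, source: String): Parsed =
    Parsed(path, source.linesIterator.take(1).mkString.stripPrefix("# "))

  def render(doc: Resolved): String =
    s"<h1>${doc.title}</h1><nav>${doc.toc.mkString(", ")}</nav>"

  def transform(inputs: List[(String, String)])(implicit ec: ExecutionContext): Future[List[String]] =
    for {
      // Phase 1: documents are parsed independently, so this can run in parallel.
      parsed <- Future.traverse(inputs) { case (path, source) => Future(parse(path, source)) }
      // Phase 2: the brief sequential step that needs to see all documents at once.
      toc      = parsed.map(_.title)
      resolved = parsed.map(p => Resolved(p.path, p.title, toc))
      // Phase 3: rendering is independent per document again.
      rendered <- Future.traverse(resolved)(doc => Future(render(doc)))
    } yield rendered
}
```

Only the middle phase depends on the full set of documents, so the degree of parallelism available in phases 1 and 3 is what a single-threaded runtime would lose.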

Development & Maintenance

The third and final topic I'd like to discuss is how such a solution would be developed and maintained. I hope that I could convince you with my thoughts above that providing some sort of minimal POC would most likely just increase the maintenance burden for little gain and probably little adoption in the community. But if we agree to aim a bit higher the next question is who would be doing this work and when.

At this point in time I personally have no capacity to increase the scope of the work I do with Laika. In the coming months my focus will be the 1.0 milestone series and a bit of 0.19 maintenance. For 1.0 my sole focus is improving binary compatibility. There will be a series of PRs for each package focusing on two areas: reducing the surface of the public API, which is currently way too large, and replacing all case classes with other constructs (apart from those forming the AST, which can stay for a number of reasons).

Given all these constraints and the fact that at this point in time we don't know how much interest in the community we can generate for this effort and that the amount of work is significant for producing a viable product, I'm somewhat sceptical about the idea at least for the coming months. But this view could change in the future if several criteria around the Laika project and this proposal change over time:

increased adoption and participation
one or more additional admin-level maintainers who can merge PRs and cut releases
a really significant number of requests for a CLI application coming from the community
a shared vision for such an application (mine is primarily achieving feature parity, otherwise I'm quite open-minded)
someone other than me stepping up to lead such an effort with me acting merely as contributor and reviewer
filling functional gaps to competitors before increasing the maintenance burden by cross-publishing
defer cross-publishing to later phases of the 1.0 milestone series to keep the maintenance branch aligned as long as possible (unless this becomes a blocker for actual CLI development commencing)
not cross-publish 1.0 at all if these criteria are not met around the time we get close to the first RC

Finally, while the proposal addresses both Native and Node, I think the criteria for committing to this effort have to be met for each of them individually, and doing one of them should not automatically imply doing the other.

I'm happy to keep this open for a long time, to allow for further discussion, collecting ideas, and as a place where users can provide feedback. So if anyone has interest in a CLI application or would like to contribute to the development, please chime in and describe your use case. What would you do with it and why would you prefer it over any of the other native or Node.js toolkits which are available today?

armanbilge commented 1 year ago

Firstly, thank you for taking the time to consider this and write such a detailed response ❤️ Besides the specific points of this proposal, it is really great to get insight into where things are, and where things are headed.

Overall, I agree with your analysis above, both on the audience and effort required (and changed "publish" to "develop" :)

I'm going to read through a second time and respond to a few points below.

The big advantage here is that everything non-portable is concentrated in a single Laika class that depends on the JDK's ZipOutputStream (EPUB is essentially ZIP plus XML metadata)

Ideally this would lean on fs2.compression which already cross-compiles on JVM/JS/Native. Unfortunately fs2.compression currently only supports Gzip and Inflate/Deflate, but adding Zip would be a good (and requested) feature anyway.

Search: This really hasn't been on my radar so far, and it's also not getting requested frequently, but many alternative solutions have some support for searching.

Well, we are getting plenty of requests downstream, for searchable project documentation :) @valencik is currently exploring this arena.

I don't know whether parallel execution can be kept for Node or Native, but if not I could imagine that sequential execution might result in performance that does not match that of other native tools.

Parallel execution can definitely be kept on all platforms. Applications and libraries that use combinators from Cats Effect etc. for concurrency and parallelism Just Work™ with identical semantics.

On Node.js, I/O tasks can run in parallel, but there is only one thread for compute tasks.

Scala Native 0.4 is currently single-threaded, but the 0.5 series in development already supports multi-threading. This is expected to land sometime in the next few months-ish. Suffice it to say, it will match the JVM on all counts.

But even if others step in and do all work without the need for me to contribute or review (I'm currently unable to even review PRs for Scala Native as my knowledge about it is exactly zero)

I'm always happy to review platform- and build-related PRs. I know it doesn't really address the issue you raise here, just saying :)

one or more additional admin-level maintainers who can merge PRs and cut releases

Along these lines, I'd love to see Laika become a Typelevel-org project, if this is something that interests you. This would secure multiple admins, who can maintain (or delegate).

ideally not starting to cross-publish during the 1.0 milestone series when there is a maintenance branch

In my experience cross-building libraries, it is generally much better to figure out the cross-building before 1.0 and compatibility constraints kick in, since dealing with (even subtle) platform differences becomes a lot more painful. In fact, knowing that you are working towards 1.0 was one of my motivations for opening this discussion now, before it was too late :)

The intent is not to invalidate your point but better illustrate the interactions of constraints. Basically, establishing an API with long-term binary-compatibility is somewhat coupled to cross-platforming, so if both are goals they should not be done separately.

jenshalm commented 1 year ago

Thank you for providing all these pointers. I'll comment on two of them here.

one or more additional admin-level maintainers who can merge PRs and cut releases

Along these lines, I'd love to see Laika become a Typelevel-org project, if this is something that interests you. This would secure multiple admins, who can maintain (or delegate).

Happy to discuss this if there is interest. I just have no idea what this means in practical terms, so someone would need to brief me a bit on this... 🙂

ideally not starting to cross-publish during the 1.0 milestone series when there is a maintenance branch

In my experience cross-building libraries, it is generally much better to figure out the cross-building before 1.0 and compatibility constraints kick in, since dealing with (even subtle) platform differences becomes a lot more painful. In fact, knowing that you are working towards 1.0 was one of my motivations for opening this discussion now, before it was too late :)

The intent is not to invalidate your point but better illustrate the interactions of constraints. Basically, establishing an API with long-term binary-compatibility is somewhat coupled to cross-platforming, so if both are goals they should not be done separately.

EDIT: I rephrased the last bullet and added another one, I think I wasn't clear enough. Sorry for the confusion.

armanbilge commented 1 year ago

I just have no idea what this means in practical terms, so someone would need to brief me a bit on this...

@valencik can expand more, but by "org level" project I mean moving the project under the github.com/typelevel org (and possibly publishing under the org.typelevel groupid). IMO Laika was already a great candidate for membership, and now that it is core infrastructure it makes even more sense. Since you are looking for more admin-level maintainers, this seems like it could be a good fit for you as well :)

not cross-publish 1.0 at all if these criteria are not met around the time we get close to the first RC

Yup, we are very much in agreement on this :) without a CLI I just don't see a compelling use case for laika-io on Native/JS, although users should certainly chime in if they do!

jenshalm commented 1 year ago

I just have no idea what this means in practical terms, so someone would need to brief me a bit on this...

@valencik can expand more, but by "org level" project I mean moving the project under the github.com/typelevel org (and possibly publishing under the org.typelevel groupid). IMO Laika was already a great candidate for membership, and now that it is core infrastructure it makes even more sense. Since you are looking for more admin-level maintainers, this seems like it could be a good fit for you as well :)

I think that might be a good step to consider. Changing the group ID for 1.0 would also be ok; that's a step where users will not be caught by surprise when major changes are made. Shall we move the conversation somewhere else? It does not really belong in this ticket.

valencik commented 1 year ago

Hey @jenshalm, yeah, let's move this discussion elsewhere. The right place is likely an issue on the governance repo: https://github.com/typelevel/governance. I am collecting some thoughts from the other steering committee members on a non-legalese description of what practically happens when a project joins as either an organization project or affiliate project. I hope to have more to share shortly.

So feel free to create an issue on the governance repo. You can create a blank issue, you don't have to follow the project submission template for this discussion. Or, once I have some more feedback from the team I'll create one and tag you :)

valencik commented 1 year ago

Hey @jenshalm, just wanted to briefly follow up to say that we have some description of the practical points on joining Typelevel available here: https://github.com/typelevel/governance/#why-join-typelevel

And either that repo or Discord would be a fine place to start a further discussion if you have any more questions :)

jenshalm commented 1 year ago

Thank you for the pointers, I'll probably write up a little proposal sometime in the coming weeks when I return to spending a bit more time on this project.

typelevel / Laika

Develop a Laika CLI for Node.js and Native #360