informalsystems / themis-tracer

A tool for managing complex contexts for developing critical systems
Apache License 2.0
4 stars 0 forks source link

Drop dependency on pandoc #33

Open shonfeder opened 3 years ago

shonfeder commented 3 years ago

As reported on #65, this has renewed urgency, since the pandoc_ast library is no longer compatible with current versions of pandoc, and isn't apparently actively maintained with the current state of pandoc.

I should note: @thanethomson advocated for putting in the time to avoid this brittle dependency chain from the start, but I (incorrectly) argued for postponing the cleaner implementation for the sake of moving quickly. In retrospect, I think that would have been the right call at the time. Fortunately it's not too late to rectify :)

thanethomson commented 3 years ago

No, I'd say moving quickly was the right decision at the time to depend on pandoc :grin: The PoC did what we expected and needed it to do at the time, and now it seems like it's time to move on to an implementation that better suits our current situation.

What're your thoughts, @shonfeder, on how to go about doing this?

shonfeder commented 3 years ago

Perhaps you're right now (as you were then ;).

Here's my current thinking:

Options

  1. Use comrak.
  2. Extend pulldown-commonmark to support definition lists (as per https://github.com/raphlinus/pulldown-cmark/issues/67).
  3. Write a home-rolled extractor that doesn't parse full markdown but only extracts logical units (just the shape of definition lists we need).

Preliminary Evaluation

  1. comrak
    • Pros:
      • The library is stable and has support for parsing out definition lists.
    • Cons:
      • The support for def lists is limited: it does not fully support PHP Extra syntax, let alone Pandoc's extension. It would require rewiring all existing tagged specs to add a newline between the term and the definition, but even then, it lacks support for, e.g., block formatting inside of definitions (it doesn't render block quotes inside of definition lists correctly, for instance).
      • The API for dealing with it's representation of markdown is quite cumbersome (see https://docs.rs/comrak/0.9.1/comrak/index.html). I am worried that the time saved in a having a drop-in placement will be paid out in the friction of trying to extract elements from its representation.
  2. pulldown-commonmark
    • Pros:
      • The most popular and stable markdown library in the Rust ecosystem. (1.1k stars to comrak's 442)
      • Used in mdBook, which we are planning to use for generating the site to consolidate units (see #22). This would simplify our dependencies and likely improve interop.
      • An elegant and simple interface for transformations and extractions. (See https://github.com/raphlinus/pulldown-cmark#why-a-pull-parser)
      • Provides easy access to the source map, making it trivial for us to extract the line number location for each tag (we'll want this for pinpointing location on non-linkified specs).
    • Cons:
      • No support for definition lists at the moment.
  3. Home-rolled extraction of logical units
    • Pros:
      • might be quick to implement (tho I'm doubtful on this, given the potential gotchas).
    • Cons:
      • Shortsighted: We'll need to parse markdown anyhow to deal with linkification (see #61) and will likely want to do deeper analysis of specifications or extend the places logical units can appear before long.
      • Fragmentary: it will give us a hardly functioning fragment of Markdown, and provide no benefit to the wider ecosystem or to the robustness of this tool.

IMO, (2) is a pretty clear winner here. In addition to providing the most robust answer to our needs, it would be a non-trivial contribution too the rust ecosystem (judging from comments on the issue).

Scoping (2)

Time Estimate

I estimate a time cost of 2 to 5 days (leaving ample room for unexpected complications).

It is important to note the following: for the sake of unblocking development here, it suffices to implement support for definition lists in a fork we can pin. So even if there is a delay in getting the feature merged, we can begin taking advantage of the feature as soon as the minim functionality is in place. This derisks any concern about responsiveness from the maintainers (tho they have already proven to be quite responsive).

shonfeder commented 3 years ago

I just thought of an easy stop-gap solution, which doesn't drop the pandoc dep, but will enable dropping pandoc_ast and allow me to fix the immediate problem without the risk of blowing the deadline for my deliverables this quarter:

I'll plan on addressing (2) in Q2, if possible. But go with this option for an immediate fix.