Feature request: reStructuredText / Sphinx frontend for Haddock

jml commented 7 years ago

I come from a Python background, where I've grown to love Sphinx and its associated primary markup format, reStructuredText.

I'd really like to be able to use it to write my Haskell API docs. Haddock is great, but its markup syntax is idiosyncratic, and I'm always consulting the reference material.

Also, I notice that Haddock generates its own documentation with Sphinx.

prikhi commented 7 years ago

I would love this as well, but I think the proper way to implement this would be to add Haskell as a Sphinx Domain (maybe with domaintools?) and then write something similar to the autodoc or sphinx-js extensions to pull the docstrings from source code. It's probably easiest to use haskell-src-exts and send it to Python via JSON (it seems parser-helper provides ToJSON instances for an older version of haskell-src-exts).

jml commented 7 years ago

@prikhi That sounds like a useful approach, and would be an improvement, I think.

However, if a package were to use rST markup and this means of generating HTML documentation, it would still render very poorly on Hackage, which is where most people go to browse Haskell API documentation.

hvr commented 7 years ago

I'm quoting here the comment I made on #618 as the suggestion to add support for other markups is a recurring wish:

...it may be useful to review why attempts to add support for Markdown have failed in the past (see https://github.com/haskell/haddock/issues/244#issuecomment-261019664) and whether rST would have a better chance than Markdown.

jml commented 7 years ago

… which I still strongly agree with. It would be foolish to commence work on this without learning from what has gone before, and without some sort of support from the people actually maintaining Haddock today.

I have at least skimmed the blog post claiming Markdown in Haddock cannot happen—although it deserves more thorough review.

My sense is that:

most of the objections (e.g. backticks, headings) are specific to Markdown, and are addressed by reST
many of the remaining objections are about features of Haddock that are less commonly used (e.g. LaTeX backend)

I'm not 100% sure this is a fair reading. As I said, it deserves a more thorough review than I've given it.

One of the developers of Read the Docs makes similar objections re Markdown, which is why I believe reST would be a better choice.

Although I do genuinely hope to be able to write my Haskell API documentation with reST markup, at the very least, I believe the output of this ticket should be a FAQ entry which condenses the discussion from the mailing list and blog posts into a paragraph or two.

ghost commented 7 years ago

I'm not sure how thorough such a review should be, but I'll give it a shot. I'll start with an interesting point in the conclusion from Mateusz Kowalczyk's blog post Why Markdown in Haddock will not happen:

Why is this the case? It seemed like such a good idea to a large amount of people when proposals were initially being presented. Even if you didn't like Markdown, there were plenty of other calls for reST and Wiki syntax. It was going to be great: people don't have to learn Haddock syntax and can concentrate on writing code more. Why can't we have things like horizontal rules or inline HTML? I think the first sentence in the Markdown documentation after the introduction explains it pretty well: "Markdown's syntax is intended for one purpose: to be used as a format for writing for the web.". As it turns out, Haddock is not 'the web'.

Unlike Markdown, reStructuredText was and is primarily designed for technical documentation. It's part of the docutils package for Python and its popularity is mainly due to prolific use in docstrings and PEPs. Markdown could be used for RFCs and PEPs, but as demonstrated in many other cases, API documentation is a very different domain. reST is explicitly designed for this domain.

I'll continue with a bunch of questions that I extracted from Kowalczyk's first e-mail and blog post, in particular the ones where Markdown failed.

Does it have a specification?

From the first e-mail:

There are issues with using Markdown even before we attempt to use it for Haskell documentation:

There exists no formal specification or semantics. It would seem that a significant number of Markdown parsers are created by reverse engineering an already existing parser. This is bad because we end up propagating the bugs and workarounds around ambiguity that the original parser has.

As a follow-up to the previous point, the (vanilla) Markdown is ambiguous and there is nothing to resolve it. As Richard A. O'Keefe pointed out, there exist situations where it's not possible to infer the semantics of Markdown from its official implementation and the result is parser/writer-specific [6].

This is a significant issue with Markdown. There are quite a few implementations, and due to the lack of a specification, each implementation is as valid as the other. The result is that there are tons of quirks, and no non-trivial Markdown document looks the same after processing it with two different processors. Even GitHub Flavoured Markdown has its own quirks (line breaks are always rendered), and they're different from BitBucket's quirks (nested lists must be indented deeper than you think). This is incredibly annoying and makes Markdown unreliable, as pointed out in the blog post:

In fact, it’s incredibly easy to see that this [the introduction of emphasis markup, ed.] causes problems just with the Markdown syntax, without even beginning to worry about the Haddock side of things!

reStructuredText, on the other hand, has very rigorous rules for parsing emphasis markup, and seems to leave very little up for guessing or debate.

Does it support Literate Haskell?

Markdown doesn't support Literate Haskell as-is. Bird tracks would be interpreted as block quotes, and \begin{code} would not be recognised as a fence for code.

reStructuredText doesn't support Literate Haskell as-is either. \begin{code} would cause similar problems as in Markdown. Bird tracks could be used in reST; literal blocks may either be indented or "quoted", and the "greater than" symbol is among the many characters that may be used for the quoted style. It would however be quite a hackish solution, as the quoting characters are considered to be part of the literal text from the point of view of the document generator.

The alternative would be to introduce reStructuredText directives for these code and spec blocks. This would require that the implementation of an unlit program includes a parser for the logical structure of reST.

Does it play nice with the syntax of Haskell?

Markdown has a huge problem with how it treats headers, even when you're not considering the context of using it for technical documentation. But MacFarlane gives the prime example of why Markdown's headers can (and will) clash with Haskell's syntax:

module MyModule
{-
# Introduction

This is my module
-}
where
import System.Environment

main = getArgs >>= print

When this is compiled with the C preprocessor option on, the line starting with # is read as a preprocessor directive, which causes errors. You don't have many options in Markdown: you can have two levels of headers without number signs, but after that, you have to use number signs.

reStructuredText only supports one style for headings and has simpler rules for determining which heading level you're on. If we implement reST in Haddock, we should also formulate a suggested convention for heading styles (but not that one) that avoids the use of problematic characters.

A problem that exists with both markup systems is that the backtick is used to bracket either monospaced text (Markdown) or interpreted text (reST), while it also has a meaning in Haskell (to use a variable as an operator):

{-|
Expresses interest.

Note that the expression `i \`like\` you` may cause unexpected side-effects.
-}
like :: Who -> Whom -> Whomst'd've
like = magic

Haddock's syntax kind of avoids this awkwardness by using @ for this purpose.

Is it easy to implement?

Kowalczyk's e-mail refers to two Haskell implementations for parsing Markdown, one more efficient than the other, so I guess that's something Markdown has going for it.

This is where the reStructuredText situation is lacking. The only implementation of a reST parser in Haskell I can find is Pandoc. It's not designed to be a perfect fit for the format itself, and the implementation doesn't look production-ready to me. Judging from the code comments, it also doesn't implement all standardised features, nor does it implement the inline markup rules correctly (but apparently "good enough for most purposes").

However, reST is quite a large specification, so reimplementing from scratch would be quite a task. It would get rid of unspecified docutils-compatibility behaviour, though (such as unspecified roles). It would also be necessary in order to properly access more complex features of reST, such as management of substitution definitions and definitions for roles and directives.

Does it match Haddock?

This question has two parts. First, let's compare how well reST and vanilla Markdown cover Haddock, by taking the DocH data type as our reference.

| `DocH` | reST | Markdown |
| --- | --- | --- |
| `DocEmpty` | :zap: | :zap: |
| `DocAppend` | :zap: | :zap: |
| `DocString` | :zap: | :zap: |
| `DocParagraph` | :heavy_check_mark: | :heavy_check_mark: |
| `DocIdentifier` | :heavy_exclamation_mark: with a custom role | :x: |
| `DocIdentifierUnchecked` | :heavy_exclamation_mark: with a custom role | :x: |
| `DocModule` | :heavy_exclamation_mark: with a custom role | :x: |
| `DocWarning` | :heavy_check_mark: with a specified directive | :x: |
| `DocEmphasis` | :heavy_check_mark: | :heavy_check_mark: |
| `DocMonospaced` | :heavy_check_mark: | :heavy_check_mark: |
| `DocBold` | :heavy_check_mark: | :heavy_check_mark: |
| `DocUnorderedList` | :heavy_check_mark: | :heavy_check_mark: |
| `DocOrderedList` | :heavy_check_mark: | :heavy_check_mark: |
| `DocDefList` | :heavy_check_mark: | :heavy_exclamation_mark: with HTML |
| `DocCodeBlock` | :heavy_check_mark: | :heavy_check_mark: |
| `DocHyperlink` | :heavy_check_mark: | :heavy_check_mark: |
| `DocPic` | :heavy_check_mark: | :heavy_check_mark: |
| `DocMathInline` | :heavy_check_mark: with a specified role | :x: |
| `DocMathDisplay` | :heavy_check_mark: with a specified directive | :x: |
| `DocAName` | :heavy_check_mark: | :heavy_exclamation_mark: with HTML |
| `DocProperty` | :heavy_exclamation_mark: with a custom directive | :x: |
| `DocExamples` | :heavy_check_mark: | :x: |
| `DocHeader` | :heavy_check_mark: | :heavy_check_mark: |
Clearly, reST will need some roles, but that's what it's designed for. You'll often see quite a few Python-specific roles sprinkled throughout docstrings in Python code. This will address one of the most glaring disadvantages of Markdown in the context of technical documentation. Furthermore, reST specifies support for warning blocks, LaTeX-style math support, and doctests, all of which are absent in vanilla Markdown.

Now, we should consider how the document models of reST and vanilla Markdown map to Haddock. As for Markdown, the table above is actually a lie. You can do everything that's possible in HTML in Markdown, and nothing stops you from using a bunch of spans with attributes that distinguish between identifiers and modules. It'll just be a huge pain to write and parse, and arbitrary HTML is kind of meaningless for Haddock. reST has similar problems, however. There are some specified structures (e.g. tables) that can't be applied in Haddock as it is. It's also unfortunate that the reStructuredText Markup Specification requires all directives from the reference implementation (specified in reStructuredText Directives) to be available. It's not entirely clear if it also requires the roles defined in the reference implementation (specified in reStructuredText Interpreted Text Roles) to be always pre-defined. So both of these markup languages are more powerful than Haddock can handle; as Kowalczyk puts it, neither system is "a 1:1 fit for Haddock".
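To make the mapping in the table a bit more concrete, here is a minimal sketch of how a few reST inline constructs might be translated. Everything in it is an assumption for illustration: `Doc` is a cut-down stand-in for the real `DocH` type, `Inline` is a hypothetical parser output, and the role names `hs:id` and `hs:mod` are invented (only `math` is a role that reST itself specifies).

```haskell
-- A cut-down stand-in for Haddock's DocH. The real type lives in
-- haddock-library and has more constructors and type parameters.
data Doc
  = DocEmpty
  | DocAppend Doc Doc
  | DocString String
  | DocEmphasis Doc
  | DocBold Doc
  | DocMonospaced Doc
  | DocIdentifier String
  | DocModule String
  | DocMathInline String
  deriving (Eq, Show)

-- Hypothetical inline nodes that a reST parser might produce.
data Inline
  = Text String
  | Emph [Inline]       -- *text*
  | Strong [Inline]     -- **text**
  | Literal String      -- ``text``
  | Role String String  -- :name:`content`
  deriving (Eq, Show)

-- Translate one inline node. Left reports a role that has no
-- counterpart in this sketch.
inlineToDoc :: Inline -> Either String Doc
inlineToDoc (Text s)    = Right (DocString s)
inlineToDoc (Emph xs)   = DocEmphasis <$> inlinesToDoc xs
inlineToDoc (Strong xs) = DocBold <$> inlinesToDoc xs
inlineToDoc (Literal s) = Right (DocMonospaced (DocString s))
inlineToDoc (Role r c)  = case r of
  "hs:id"  -> Right (DocIdentifier c)  -- assumed custom role
  "hs:mod" -> Right (DocModule c)      -- assumed custom role
  "math"   -> Right (DocMathInline c)  -- role specified by reST
  _        -> Left ("unsupported role: " ++ r)

-- Translate a run of inline nodes, failing on the first unsupported role.
inlinesToDoc :: [Inline] -> Either String Doc
inlinesToDoc xs = foldr DocAppend DocEmpty <$> traverse inlineToDoc xs
```

In GHCi, `inlinesToDoc [Text "see ", Role "hs:id" "map"]` yields a `Right` value, while an unknown role such as `Role "pep" "8"` yields a `Left`. The `Left` case is where the "more powerful than Haddock can handle" problem shows up: the translator has to reject or degrade whatever it cannot map.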

hvr commented 7 years ago

/cc @Fuuzetsu

Fuuzetsu commented 7 years ago

If someone wants to implement it as optional syntax (say, using OPTIONS_HADDOCK), it's OK with me. The main issue is doing it: as you mention, rST is not so easy, and there is a big limitation on dependencies. A solution would be to make it an optional compile-time flag and dependency. Just come up with a somewhat sane mapping of things into Haddock things, or even don't and introduce a new one with a new renderer, as long as the old syntax keeps working and rendering as expected.

jml commented 7 years ago

@Fuuzetsu cool, thanks!

there is a big limitation on dependencies

What's the limitation, specifically?

alexbiehl commented 7 years ago

@jml as Haddock is part of the GHC distribution, you need to restrict yourself to the dependencies shipped with GHC.

As a workaround you could bundle the dependencies' source with Haddock, as we currently do for attoparsec in haddock-library. But note that this imposes a maintenance burden, as we would need to update the dependency code manually.

ygale commented 7 years ago

How hard would it be to abstract the "haddock protocol" so that an arbitrary external program that implements the protocol could be used, similar to the way GHC abstracts the notion of a preprocessor or a literate processor?

Obviously this protocol will be more complex than for a preprocessor. There needs to be a way for the external program to "query" GHC for lookups of symbols and modules (as in the single-quote and double-quote haddock markup).

For example, perhaps the external program would have two modes, an "initial parse" mode and an "output" mode. The top-level syntax for marking comments as haddock comments, such as "-- |" etc., would be standard for all processors. In "initial parse" mode GHC sends the raw haddock comments, and the program responds with a list of symbol lookups it needs. Then GHC calls the program again in "output" mode with the raw haddock comments again plus a map of the symbol lookup results.

With this kind of abstraction, it would be possible to use sphinx itself by writing a thin wrapper. Or other markup systems, for example when Haskell is co-existing with other languages in a specialized IDE.
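A minimal sketch of the two-mode idea above, under loud assumptions: the function names are made up, the "markup" is a toy in which a name is whatever sits between two single quotes, and the lookup results arrive as a plain association list rather than anything GHC actually provides.

```haskell
-- "Initial parse" mode: list the names a raw comment wants resolved.
requestedNames :: String -> [String]
requestedNames s = case break (== '\'') s of
  (_, [])       -> []
  (_, _ : rest) -> case break (== '\'') rest of
    (name, _ : rest') -> name : requestedNames rest'
    (_, [])           -> []  -- unmatched quote: nothing to look up

-- "Output" mode: render the same raw comment again, this time with the
-- lookup results. Resolved names become reST-style links; names without
-- a result stay exactly as they were written.
render :: [(String, String)] -> String -> String
render env s = case break (== '\'') s of
  (before, [])       -> before
  (before, _ : rest) -> case break (== '\'') rest of
    (name, _ : rest') -> before ++ link name ++ render env rest'
    (_, [])           -> before ++ "'" ++ rest
  where
    link name = case lookup name env of
      Just url -> "`" ++ name ++ " <" ++ url ++ ">`_"
      Nothing  -> "'" ++ name ++ "'"
```

The caller would run `requestedNames` over every comment, resolve the names however it likes, and then call `render` with the results. A real processor would need a proper tokenizer, since this toy pairs up any two apostrophes it finds in ordinary prose.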

Fuuzetsu commented 7 years ago

@ygale you just described what Haddock does with the GHC API. There's nothing magic about Haddock; GHC spits out an AST with -- | &c. parsed out, then we parse the actual content and ask GHC to rename (look up) symbols between quotes &c. Presumably getting Sphinx to use the GHC API directly is not viable, however, so a thin wrapper is likely the best solution.

ygale commented 7 years ago

@Fuuzetsu I'm not suggesting re-implementing Haddock. I'm talking about abstracting out of Haddock just the part that renders the markup in comment snippets. The wrapper would be a very thin wrapper. It would only need to deal with a simple flat list of comment texts; not an AST, and not the GHC API. And then yes, it would become possible to use Sphinx itself, or any other markup rendering engine.

harpocrates commented 6 years ago

@Fuuzetsu Could you clarify what you meant in https://github.com/haskell/haddock/issues/570#issuecomment-310937851? Here is what I hope you are saying:

we add a flag reStructuredTextParser or flag markdownParser in the Cabal files, set to False by default
behind this flag, we can add some more dependencies (hopefully still pretty light, but slightly more than GHC's dependencies) that implement some parsing
when compiled with this flag, Haddock would support options (which one could pass in through OPTIONS_HADDOCK) to specify that the documentation is to be parsed not in the usual Haddock format, but as reStructuredText or Markdown instead.

I'm trying to breathe life into this issue because it seems that the primary objection to Haddock that came up in the recent Reddit thread was its non-standard syntax. Hopefully, the above would allow us to experimentally support more formats.

alexbiehl commented 6 years ago

@harpocrates The idea would be this: Next to the haddock markup parser lives a new reST parser. When documenting a module you can use OPTIONS_HADDOCK to activate the reST parser:

{-# OPTIONS_HADDOCK reST #-}
module SomeModule (...) where 
...

This way you can switch to reST on a per-module basis. Of course you would want to also add a flag to haddock which tells it to treat every module with the reST parser.
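The selection logic could look roughly like the sketch below. It is only an illustration: `Markup`, `pragmaOptions` and `chooseMarkup` are hypothetical names, not part of Haddock, and the `reST` option is the one proposed above rather than an existing one. The sketch assumes the pragma body is a comma-separated list of options.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Which parser to run over a module's documentation comments.
data Markup = HaddockMarkup | ReST
  deriving (Eq, Show)

-- Split the comma-separated body of an OPTIONS_HADDOCK pragma and
-- strip surrounding whitespace from each option.
pragmaOptions :: String -> [String]
pragmaOptions = map trim . splitOn ','
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace
    splitOn c s = case break (== c) s of
      (x, [])       -> [x]
      (x, _ : rest) -> x : splitOn c rest

-- The command-line default applies unless the module itself asks for reST.
chooseMarkup :: Markup -> [String] -> Markup
chooseMarkup def opts
  | "reST" `elem` opts = ReST
  | otherwise          = def
```

With this shape, `chooseMarkup HaddockMarkup (pragmaOptions "hide, reST")` picks the reST parser for that one module, while a global flag would simply change the default passed as the first argument.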

harpocrates commented 6 years ago

@alexbiehl I agree with the per-module OPTIONS_HADDOCK part. My question was about the possibility of adding a conditional dependency to Haddock. Something like this in the cabal files:

...

flag reSTParser
    description:     Adds a `reST` option which can be enabled on a per-module basis to
                     write docs in the reStructuredText format. 
    default:         False

...

  build-depends:    {- the existing dependencies - very limited by GHC -}

  if flag(reSTParser)
    build-depends:  {- new dependencies to help with parsing reStructuredText - not
                    compiled by default, and not limited by GHC -}

Writing up new parsers for reStructuredText and Markdown from scratch, just for Haddock, sounds rather difficult. Besides, I'm already imagining the mountain of bugs that would be opened to support every last feature of those formats... That's not what Haddock is supposed to focus on!

alexbiehl commented 6 years ago

I should read more carefully...

I agree on that part. If you can find one which only depends on parsec/attoparsec, we can talk about including it in Haddock directly.

In the long term having different haddock versions with different features isn't much of a help either.

domenkozar commented 5 years ago

This project could be done in a few steps:

Implement Sphinx domain for Haskell (this would be immediately useful for manually writing documentation in Servant/GHC, etc)
Implement reST generation using the Haskell domain that can be included into sphinx projects
Full Sphinx+haddock integration
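As a rough illustration of the second step, here is a minimal sketch that turns an extracted declaration into reST text. The `Decl` record is hypothetical, and `hs:function` is a made-up directive name; a real Haskell domain for Sphinx would define its own directives.

```haskell
-- A documented top-level binding, as an extraction tool might report it.
data Decl = Decl
  { declName :: String
  , declType :: String
  , declDoc  :: [String]  -- documentation body, already written in reST
  }

-- Emit one directive followed by its indented body.
-- "hs:function" is an assumed directive name, not an existing one.
declToReST :: Decl -> String
declToReST d = unlines $
  [ ".. hs:function:: " ++ declName d ++ " :: " ++ declType d
  , ""
  ] ++ map indent (declDoc d)
  where
    indent ""   = ""            -- keep blank lines truly blank
    indent line = "   " ++ line
```

The output of `declToReST` for each declaration could be concatenated into a `.rst` file and included from an existing Sphinx project, which is what makes this step useful before any full integration exists.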

vsoch commented 5 years ago

Huge +1 on this issue. Sphinx would allow for eventual building on Read the Docs, and Markdown (even better!) will render beautifully into Jekyll sites (static on GitHub Pages). I'm not incredibly experienced with Haddock or Haskell, but if there are other ways I can help, I can offer! I'm watching the issue so I'll stay updated.

rowanG077 commented 5 years ago

I would really like this! Is this still on the radar?

theobat commented 5 years ago

Implement Sphinx domain for Haskell (this would be immediately useful for manually writing documentation in Servant/GHC, etc)

I'm working on this here for now if anyone is interested. It's quite hard to think about all that's needed for Haskell, as the language and all its extensions are quite a large API surface. This is just a sketch following the examples of the raw page for js and python.
