Open jml opened 7 years ago
I would love this as well, but I think the proper way to implement this would be to add Haskell as a Sphinx Domain (maybe with domaintools?) and then write something similar to the autodoc or sphinx-js extensions to pull the docstrings from source code. Probably easiest to use haskell-src-exts and send it to python via JSON (seems like parser-helper provides ToJSON instances for an older version of haskell-src-exts).
@prikhi That sounds like a useful approach, and would be an improvement, I think.
However, if a package were to use rST markup and this means of generating HTML documentation, it would still render very poorly on Hackage, which is where most people go to browse Haskell API documentation.
I'm quoting here the comment I made on #618 as the suggestion to add support for other markups is a recurring wish:
...it may be useful to review why attempts to add support for Markdown have failed in the past (see https://github.com/haskell/haddock/issues/244#issuecomment-261019664) and whether rST would have a better chance than Markdown.
… which I still strongly agree with. It would be foolish to commence work on this without learning from what has gone before, and without some sort of support from the people actually maintaining Haddock today.
I have at least skimmed the blog post claiming Markdown in Haddock cannot happen—although it deserves more thorough review.
My sense is that:
I'm not 100% sure this is a fair reading. As I said, it deserves a more thorough review than I've given it.
One of the developers of Read the Docs makes similar objections regarding Markdown, which is why I believe reST would be a better choice.
Although I do genuinely hope to be able to write my Haskell API documentation with reST markup, at the very least, I believe the output of this ticket should be a FAQ entry which condenses the discussion from the mailing list and blog posts into a paragraph or two.
I'm not sure how thorough such a review should be, but I'll give it a shot. I'll start with an interesting point in the conclusion from Mateusz Kowalczyk's blog post Why Markdown in Haddock will not happen:
Why is this the case? It seemed like such a good idea to a large amount of people when proposals were initially being presented. Even if you didn't like Markdown, there were plenty of other calls for reST and Wiki syntax. It was going to be great: people don't have to learn Haddock syntax and can concentrate on writing code more. Why can't we have things like horizontal rules or inline HTML? I think the first sentence in the Markdown documentation after the introduction explains it pretty well: "Markdown's syntax is intended for one purpose: to be used as a format for writing for the web.". As it turns out, Haddock is not 'the web'.
Unlike Markdown, reStructuredText was and is primarily designed for technical documentation. It's part of the docutils package for Python and its popularity is mainly due to prolific use in docstrings and PEPs. Markdown could be used for RFCs and PEPs, but as demonstrated in many other cases, API documentation is a very different domain. reST is explicitly designed for this domain.
I'll continue with a bunch of questions that I extracted from Kowalczyk's first e-mail and blog post, in particular the ones where Markdown failed.
From the first e-mail:
- There are issues with using Markdown even before we attempt to use it for Haskell documentation:
- There exists no formal specification or semantics. It would seem that a significant number of Markdown parsers are creating by reverse engineering an already existing parser. This is bad because we end up propagating the bugs and workarounds around ambiguity that the original parser has.
- As a follow-up to the previous point, the (vanilla) Markdown is ambiguous and there is nothing to resolve it. As Richard A. O'Keefe pointed out, there exist situations where it's not possible to infer the semantics of Markdown from its official implementation and the result is parser/writer-specific [6].
This is a significant issue with Markdown. There are quite a few implementations, and due to the lack of a specification, each implementation is as valid as the other. The result is that there are tons of quirks, and no non-trivial Markdown document looks the same after processing it with two different processors. Even GitHub Flavoured Markdown has its own quirks (line breaks are always rendered), and they're different from BitBucket's quirks (nested lists must be indented deeper than you think). This is incredibly annoying and makes Markdown unreliable, as pointed out in the blog post:
In fact, it’s incredibly easy to see that this [the introduction of emphasis markup, ed.] causes problems just with the Markdown syntax, without even beginning to worry about the Haddock side of things!
reStructuredText, on the other hand, has very rigorous rules for parsing emphasis markup, and seems to leave very little up for guessing or debate.
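To give an idea of what such rules look like, here is a rough Haskell sketch of the context checks for a single `*` (emphasis). It is a simplified reading of the inline markup recognition rules, restricted to ASCII and ignoring `**`, escapes and the quote-matching rule; the function names are made up for this example and are not taken from docutils or Haddock.

```haskell
import Data.Char (isSpace)

-- Characters allowed directly before a start-string (simplified ASCII subset).
startPrecedes :: String
startPrecedes = "'\"([{<-/:"

-- Characters allowed directly after an end-string (simplified ASCII subset).
endFollows :: String
endFollows = "'\")]}>-/:.,;!?\\"

-- Safe indexing: Nothing when the index falls outside the text block.
charAt :: String -> Int -> Maybe Char
charAt s n
  | n < 0 || n >= length s = Nothing
  | otherwise = Just (s !! n)

-- A '*' can open emphasis if it starts the text block or follows whitespace
-- or an allowed punctuation character, and is followed by non-whitespace.
canOpen :: String -> Int -> Bool
canOpen s i =
  charAt s i == Just '*'
    && maybe True (\c -> isSpace c || c `elem` startPrecedes) (charAt s (i - 1))
    && maybe False (not . isSpace) (charAt s (i + 1))

-- A '*' can close emphasis if it follows non-whitespace and is followed by
-- whitespace, an allowed punctuation character, or the end of the text block.
canClose :: String -> Int -> Bool
canClose s i =
  charAt s i == Just '*'
    && maybe False (not . isSpace) (charAt s (i - 1))
    && maybe True (\c -> isSpace c || c `elem` endFollows) (charAt s (i + 1))
```

With these checks, the asterisk in `2 * 3` can neither open nor close emphasis, so it stays a literal character without any guessing.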
Markdown doesn't support Literate Haskell as-is. Bird tracks would be interpreted as block quotes, and \begin{code} would not be recognised as a fence for code.
reStructuredText doesn't support Literate Haskell as-is either. \begin{code} would cause similar problems as in Markdown. Bird tracks could be used in reST; literal blocks may either be indented or "quoted", and the "greater than"-symbol is among the many characters that may be used for the quoted style. It would however be quite a hackish solution, as the quoting characters are considered to be part of the literal text from the point of view of the document generator.
The alternative would be to introduce reStructuredText directives for these code and spec blocks. This would require that the implementation of an unlit program includes a parser for the logical structure of reST.
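As a very rough illustration of the directive-based alternative, the sketch below unlits a file by keeping only the bodies of `.. code::` blocks. It assumes a fixed three-space indentation for the directive body and does not parse the rest of the reST structure at all, which is exactly the part a real unlit would have to get right; `unlitDirective` is a hypothetical name.

```haskell
import Data.Char (isSpace)

-- Keep the bodies of ".. code::" blocks and blank out every other line,
-- so that line numbers in compiler errors still match the source file.
unlitDirective :: [String] -> [String]
unlitDirective = go False
  where
    go _ [] = []
    go inBlock (l : ls)
      | isDirective l           = "" : go True ls      -- the directive line itself
      | inBlock && all isSpace l = "" : go True ls     -- blank lines stay inside the block
      | inBlock && isIndented l = drop 3 l : go True ls -- body line: strip the indentation
      | otherwise               = "" : go False ls     -- prose ends the block

    isDirective l = words l == ["..", "code::"]
    isIndented l = take 3 l == "   "
```

Blanking the prose instead of dropping it is a design choice borrowed from how unlit programs usually behave, so that error locations stay meaningful.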
Markdown has a huge problem with how it treats headers, even when you're not considering the context of using it for technical documentation. But [MacFarlane][MacFarlaneCppExample] gives the prime example of why Markdown's headers can (and will) clash with Haskell's syntax:
module MyModule
{-
# Introduction
This is my module
-}
where
import System.Environment
main = getArgs >>= print
When compiling this with the C preprocessor option on, this will cause errors. You don't have many options in Markdown: you can have two levels of headers, but after that, you'd have to use number signs.
reStructuredText only supports one style for headings and has simpler rules for determining which heading level you're on. If we implement reST in Haddock, we should also formulate a suggested convention for heading styles (but not that one) that avoids the use of problematic characters.
A problem that exists with both markup systems is that the backtick is used to bracket either monospaced text (Markdown) or interpreted text (reST), while it also has a meaning in Haskell (to use a variable as an operator):
{-|
Expresses interest.
Note that the expression `i \`like\` you` may cause unexpected side-effects.
-}
like :: Who -> Whom -> Whomst'd've
like = magic
Haddock's syntax kind of avoids this awkwardness by using @ for this purpose.
Kowalczyk's e-mail refers to two Haskell implementations for parsing Markdown, one more efficient than the other, so I guess that's something Markdown has going for it.
This is where the reStructuredText situation is lacking. The only implementation of a reST parser in Haskell I can find is Pandoc. It's not designed to be a perfect fit for the format itself, and the implementation doesn't look production-ready to me. Judging from the code comments, it also doesn't implement all standardised features, nor does it implement the inline markup rules correctly (but apparently "good enough for most purposes").
However, reST is quite a large specification, so reimplementing from scratch would be quite a task. It would get rid of unspecified docutils-compatibility behaviour, though (such as unspecified roles). It would also be necessary in order to properly access more complex features of reST, such as management of substitution definitions and definitions for roles and directives.
This question has two parts. First, let's compare how well reST and vanilla Markdown cover Haddock, by taking the DocH data type as our reference.
DocH | reST | Markdown |
---|---|---|
DocEmpty | :zap: | :zap: |
DocAppend | :zap: | :zap: |
DocString | :zap: | :zap: |
DocParagraph | :heavy_check_mark: | :heavy_check_mark: |
DocIdentifier | :heavy_exclamation_mark: with a custom role | :x: |
DocIdentifierUnchecked | :heavy_exclamation_mark: with a custom role | :x: |
DocModule | :heavy_exclamation_mark: with a custom role | :x: |
DocWarning | :heavy_check_mark: with a specified directive | :x: |
DocEmphasis | :heavy_check_mark: | :heavy_check_mark: |
DocMonospaced | :heavy_check_mark: | :heavy_check_mark: |
DocBold | :heavy_check_mark: | :heavy_check_mark: |
DocUnorderedList | :heavy_check_mark: | :heavy_check_mark: |
DocOrderedList | :heavy_check_mark: | :heavy_check_mark: |
DocDefList | :heavy_check_mark: | :heavy_exclamation_mark: with HTML |
DocCodeBlock | :heavy_check_mark: | :heavy_check_mark: |
DocHyperlink | :heavy_check_mark: | :heavy_check_mark: |
DocPic | :heavy_check_mark: | :heavy_check_mark: |
DocMathInline | :heavy_check_mark: with a specified role | :x: |
DocMathDisplay | :heavy_check_mark: with a specified directive | :x: |
DocAName | :heavy_check_mark: | :heavy_exclamation_mark: with HTML |
DocProperty | :heavy_exclamation_mark: with a custom directive | :x: |
DocExamples | :heavy_check_mark: | :x: |
DocHeader | :heavy_check_mark: | :heavy_check_mark: |
Clearly, reST will need some roles, but that's what it's designed for. You'll often see quite a few Python-specific roles sprinkled throughout docstrings in Python code. This will address one of the most glaring disadvantages of Markdown in the context of technical documentation. Furthermore, reST specifies support for warning blocks, LaTeX-style math support, and doctests, all of which are absent in vanilla Markdown.
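To make the "custom role" rows a bit more concrete, here is a small Haskell sketch of how interpreted text with a role could be mapped onto DocH-like constructors. The `Doc` type is a cut-down stand-in, not Haddock's actual `DocH` (which has type parameters and many more constructors), and the role names `hs:id` and `hs:mod` are invented for this example; only `math` is a role that reST itself specifies.

```haskell
-- A cut-down stand-in for a few of the DocH constructors in the table.
data Doc
  = DocString String
  | DocIdentifier String
  | DocModule String
  | DocMathInline String
  deriving (Eq, Show)

-- Map a role name and its interpreted text onto a constructor.
fromRole :: String -> String -> Doc
fromRole "hs:id"  txt = DocIdentifier txt  -- hypothetical custom role
fromRole "hs:mod" txt = DocModule txt      -- hypothetical custom role
fromRole "math"   txt = DocMathInline txt  -- role specified by reST
fromRole _        txt = DocString txt      -- unknown roles fall back to plain text

-- Parse one piece of interpreted text of the form :role:`text`.
parseRole :: String -> Maybe Doc
parseRole (':' : rest) =
  case break (== '`') rest of
    (roleColon, '`' : body)
      | not (null roleColon), last roleColon == ':'
      , not (null body), last body == '`' ->
          Just (fromRole (init roleColon) (init body))
    _ -> Nothing
parseRole _ = Nothing
```

For instance, ``parseRole ":hs:id:`Data.Map.lookup`"`` yields `Just (DocIdentifier "Data.Map.lookup")`, while input that is not interpreted text yields `Nothing`.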
Now, we should consider how the document model of reST and vanilla Markdown map to Haddock. As for Markdown, the table above is actually a lie. You can do everything that's possible in HTML in Markdown, and nothing stops you from using a bunch of spans with attributes that distinguish between identifiers and modules. It'll just be a huge pain to write and parse. Arbitrary HTML is kind of meaningless for Haddock. reST has similar problems, however. There are some specified structures (e.g. tables) that can't be applied in Haddock as it is. It's also unfortunate that the reStructuredText Markup Specification requires all directives from the reference implementation (specified in reStructuredText Directives) to be available. It's not entirely clear if it also requires the roles defined in the reference implementation (specified in reStructuredText Interpreted Text Roles) to be always pre-defined. So both of these markup languages are more powerful than Haddock can handle; as Kowalczyk puts it, neither system is "a 1:1 fit for Haddock".
/cc @Fuuzetsu
If someone wants to implement it as optional syntax (say, using OPTIONS_HADDOCK), it's OK with me. The main issue is doing it: as you mention, reST is not so easy, and there is a big limitation on dependencies. A solution would be making it an optional compile-time feature and dependency. Just come up with a somewhat sane mapping of things into Haddock things, or even don't, and introduce a new one with a new renderer, as long as the old syntax keeps working and rendering as expected.
@Fuuzetsu cool, thanks!
there is a big limitation on dependencies
What's the limitation, specifically?
@jml As Haddock is part of the GHC distribution, you need to restrict yourself to dependencies shipped with GHC.
As a workaround you could bundle the dependencies source with haddock as we currently do for attoparsec in haddock-library. But note that this imposes a maintenance burden as we would need to update the dependency code manually.
How hard would it be to abstract the "haddock protocol" so that an arbitrary external program that implements the protocol could be used, similar to the way GHC abstracts the notion of a preprocessor or a literate processor?
Obviously this protocol will be more complex than for a preprocessor. There needs to be a way for the external program to "query" GHC for lookups of symbols and modules (as in the single-quote and double-quote haddock markup).
For example, perhaps the external program would have two modes, an "initial parse" mode and an "output" mode. The top-level syntax for marking comments as haddock comments, such as "-- |" etc., would be standard for all processors. In "initial parse" mode GHC sends the raw haddock comments, and the program responds with a list of symbol lookups it needs. Then GHC calls the program again in "output" mode with the raw haddock comments again plus a map of the symbol lookup results.
With this kind of abstraction, it would be possible to use sphinx itself by writing a thin wrapper. Or other markup systems, for example when Haskell is co-existing with other languages in a specialized IDE.
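The two modes described above could look roughly like the following Haskell sketch, from the point of view of the external program. Everything here is an assumption made for illustration: the function names, the use of single quotes as the only lookup syntax, the association list standing in for the "map of the symbol lookup results", and the HTML output.

```haskell
-- "Initial parse" mode: report every name between single quotes, so the
-- caller knows which symbols to look up.
symbolQueries :: String -> [String]
symbolQueries s =
  case break (== '\'') s of
    (_, _ : rest) ->
      case break (== '\'') rest of
        (name, _ : rest') -> name : symbolQueries rest'
        _                 -> []  -- unmatched quote: nothing more to look up
    _ -> []

-- "Output" mode: render the same comment again, now with the lookup results.
-- Names that were resolved become links; unresolved names stay plain text.
render :: [(String, String)] -> String -> String
render table s =
  case break (== '\'') s of
    (before, _ : rest) ->
      case break (== '\'') rest of
        (name, _ : rest') -> before ++ link name ++ render table rest'
        _                 -> s  -- unmatched quote: leave the text untouched
    _ -> s
  where
    link name = case lookup name table of
      Just target -> "<a href=\"" ++ target ++ "\">" ++ name ++ "</a>"
      Nothing     -> name
```

A wrapper around Sphinx or another renderer would replace the body of `render`, while the query step stays the same for every markup system.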
@ygale you just described what Haddock does with the GHC API. There's nothing magic about Haddock; GHC spits out an AST with -- | &c. parsed out, then we parse the actual content and ask GHC to rename (look up) symbols between quotes &c. Presumably getting Sphinx to use the GHC API directly is not viable, however, so a thin wrapper is likely the best solution.
@Fuuzetsu I'm not suggesting re-implementing Haddock. I'm talking about abstracting out of Haddock just the part that renders the markup in comment snippets. The wrapper would be a very thin wrapper. It would only need to deal with a simple flat list of comment texts; not an AST, and not the GHC API. And then yes, it would become possible to use Sphinx itself, or any other markup rendering engine.
@Fuuzetsu Could you clarify what you meant in https://github.com/haskell/haddock/issues/570#issuecomment-310937851? Here is what I hope you are saying:
- a flag reStructuredTextParser or flag markdownParser in the Cabal files, set to False by default
- a per-module option (OPTIONS_HADDOCK) to specify that the documentation is to be parsed not in the usual Haddock format, but as reStructuredText or Markdown instead.

I'm trying to breathe life into this issue because it seems that the primary objection to Haddock that came up in the recent Reddit thread was its non-standard syntax. Hopefully, the above would allow us to experimentally support more formats.
@harpocrates The idea would be this: Next to the haddock markup parser lives a new reST parser.
When documenting a module you can use OPTIONS_HADDOCK to activate the reST parser:
{-# OPTIONS_HADDOCK reST #-}
module SomeModule (...) where
...
This way you can switch to reST on a per-module basis. Of course you would want to also add a flag to haddock which tells it to treat every module with the reST parser.
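A minimal sketch of the per-module switch, assuming the proposed `reST` option (which Haddock does not have today) and a naive line-based scan instead of GHC's real pragma handling. The names `Markup`, `haddockOptions` and `selectMarkup` are hypothetical.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf, isSuffixOf)

data Markup = HaddockMarkup | ReST
  deriving (Eq, Show)

trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace

splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (x, _ : rest) -> x : splitOn c rest
  (x, [])       -> [x]

-- Extract the comma-separated options from an OPTIONS_HADDOCK pragma line;
-- any other line yields no options.
haddockOptions :: String -> [String]
haddockOptions line
  | prefix `isPrefixOf` l && suffix `isSuffixOf` l =
      filter (not . null) (map trim (splitOn ',' middle))
  | otherwise = []
  where
    l = trim line
    prefix = "{-# OPTIONS_HADDOCK"
    suffix = "#-}"
    middle = take (length l - length prefix - length suffix) (drop (length prefix) l)

-- Pick the reST parser only when a module asks for it; every other module
-- keeps the existing Haddock markup.
selectMarkup :: [String] -> Markup
selectMarkup sourceLines
  | any (elem "reST" . haddockOptions) sourceLines = ReST
  | otherwise = HaddockMarkup
```

Defaulting to `HaddockMarkup` keeps the old syntax working and rendering as before for every module that does not opt in.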
@alexbiehl I agree with the per-module OPTIONS_HADDOCK part. My question was about the possibility of adding a conditional dependency to Haddock. Something like this in the cabal files:
...
flag reSTParser
  description: Adds a `reST` option which can be enabled on a per-module basis to
               write docs in the reStructuredText format.
  default: False
...
  build-depends: {- the existing dependencies - very limited by GHC -}
  if flag(reSTParser)
    build-depends: {- new dependencies to help with parsing reStructuredText - not
                      compiled by default, and not limited by GHC -}
Writing up new parsers for reStructuredText and Markdown from scratch, just for Haddock, sounds rather difficult. Besides I'm already imagining the mountain of bugs that would be opened to support every last feature of those formats... That's not what Haddock is supposed to focus on!
I should read more carefully...
I agree on that part. If you can find one which only depends on parsec/attoparsec we can talk about including in haddock directly.
In the long term having different haddock versions with different features isn't much of a help either.
This project could be done in a few steps:
Huge +1 on this issue, sphinx would allow for eventual building on readthedocs and markdown (even better!) will render beautifully into jekyll sites (static on Github pages). I'm not incredibly experienced with haddock or haskell, but if there are other ways I can help I can offer! I'm watching the issue so I'll keep updated.
I would really like this! Is this still on the radar?
Implement Sphinx domain for Haskell (this would be immediately useful for manually writing documentation in Servant/GHC, etc)
I'm working on this here for now if anyone is interested. It's quite hard to think about all that's needed for Haskell, as the language and all its extensions are quite a large API surface. This is just a sketch following the examples of the raw page for js and python.
I come from a Python background, where I've grown to love Sphinx and its associated primary markup format, reStructuredText.
I'd really like to be able to use it to write my Haskell API docs. Haddock is great, but its markup syntax is idiosyncratic, and I'm always consulting the reference material.
Also, I notice that Haddock generates its own documentation with Sphinx.