haskell / hackage-server

Hackage-Server: A Haskell Package Repository
http://hackage.haskell.org
Other
414 stars 198 forks source link

Render README.md in Package pages #279

Closed bitemyapp closed 9 years ago

bitemyapp commented 9 years ago

https://www.fpcomplete.com/blog/2014/11/stackage-open-source

dcoutts commented 9 years ago

Yep, we should do this. We handle Changelogs already, though could do better.

conklech commented 9 years ago

This looks like a dupe of #262. I'm new to this codebase, but I took a look. The current Changelog support wouldn't really help us that much, if I understand right. Here are my "working notes" and summary of what would need to be done implementation-wise. My assumptions are that (1) the goal is to include the contents of a README file on the main package page; (2) doing so would be automatic, replacing the .cabal file's description field iff a README is in the tarball, as Stackage seems to do; and (3) this is a good idea. All of those assumptions are still very much up for discussion, as far as I can tell. Setting aside assumption (2) might require upstream changes to Cabal.

The package page is rendered by packagePage in Distribution.Server.Pages.Package, based on the data in PackageRender, defined in Distribution.Server.Packages.Render. PackageRender only contains a boolean HasChangeLog; if True, packagePage just inserts a hyperlink. To actually incorporate the contents of a README file into the rendered page, we'd need to thread those contents through PackageRender. That structure is generated via packageRender in Distribution.Server.Features.RecentPackages.

There's a little bit of infrastructure in place and accessible to packageRender for delving into the tarball. It uses packageChangeLog from the PackageContents feature to determine whether or not there's a changelog, which gives it enough information to actually open up the tarball, but it doesn't do so.

The only infrastructure we seem to have that actually reads the contents of files in the tarball seems to be serveTarEntry and serveTarball in Distribution.Server.Util.ServeTar. These wouldn't be directly applicable.

So it seems we'd need, at the very least, the following work:

  1. Implement, either in PackageContentsFeature or elsewhere, a capacity to read the contents of a file in the tarball.
  2. Implement an analog to findChangeLog from Distribution.Server.Packages.ChangeLog.
  3. Determine what data we want in PackageRender. An unparsed Maybe ByteString? Comments in PackageRender suggest that there's some other refactoring pending here, so I'll leave this point open.
  4. Somewhere in the pipeline between doPackageRender and Distribution.Server.Pages.Package, render the markdown/whatever to HTML. In the other issue, the package cheapskate was suggested; one way or another, we'd hopefully farm this out to some other package.
  5. Determine, design-wise, how to incorporate the rendered README into the rest of the package page, and alter packagePage accordingly. If we followed Stackage, we'd just replace the existing description render. Stackage also uses an expander to deal with long docs, which is consistent with their style but not with ours.
cartazio commented 9 years ago

@conklech great! let us know if/how you'd like help or feedback here or on the IRCs

conklech commented 9 years ago

@cartazio - I'm not planning to dive into implementation until there's some consensus that it's a good idea. As currently sketched out, including a README in the .cabal file would have a significant side-effect that it didn't have before, especially if we choose to override the description field like Stackage does.

Also, I really don't want to over-promise my capacity to get this done, much less maintain it. I'll be preparing for the California bar exam starting, well, tomorrow.

cartazio commented 9 years ago

good luck!

oh, i didnt realize we were suggesting putting readme's markdown in cabal files.... nvm then

hvr commented 9 years ago

There's one important issue nobody has mentioned yet: This would cause a regression in the cabal info output. You'd no longer see a meaningful package description there if the description field was to be externalised to a README w/o adapting cabal-install as well.

dcoutts commented 9 years ago

@conklech great. Thanks for looking at this. I also had a quick look at pandoc yesterday and it looks like it ought to be relatively straightforward.

Note that we could also replace our own haddock parser with the pandoc one.

The current Changelog support wouldn't really help us that much, if I understand right.

Right. That just serves it directly as a separate resource as text. Once we've looked at this README issue, we might apply the same thing to the changelog and serve that within the package page in html format.

My assumptions are that (1) the goal is to include the contents of a README file on the main package page;

Yes.

Ideally, we can present this on the page such that we only show the first part of the description, with some expander/more thing to see the whole lot. This is actually already an issue with a few packages that have very long package descriptions, but it'll be a serious issue with README files which are often a lot longer.

(2) doing so would be automatic, replacing the .cabal file's description field iff a README is in the tarball, as Stackage seems to do; and (3) this is a good idea. All of those assumptions are still very much up for discussion, as far as I can tell. Setting aside assumption (2) might require upstream changes to Cabal.

I'm also not totally sure yet if we want it to always replace it. But that's something we can think about in the meantime and it should be easy to change once we've thought it through together.

Perhaps in the longer term we should extend the .cabal format to have a package-description-file: field as an alternative to the package-description field.

The only infrastructure we seem to have that actually reads the contents of files in the tarball seems to be serveTarEntry and serveTarball in Distribution.Server.Util.ServeTar. These wouldn't be directly applicable.

Right, but the code there is what you want to use. Within that code you'll find code to read the contents of individual members of a tar file. That can be pulled out into a top level function and exported. You can also look at the code that we use to find out if the tarball contains a changelog file as a template for finding the readme (e.g. we allow a few variations on the file name).

So it seems we'd need, at the very least, the following work: Implement, either in PackageContentsFeature or elsewhere, a capacity to read the contents of a file in the tarball.

Yep.

Implement an analog to findChangeLog from Distribution.Server.Packages.ChangeLog.

Yep.

Determine what data we want in PackageRender. An unparsed Maybe ByteString? Comments in PackageRender suggest that there's some other refactoring pending here, so I'll leave this point open.

Seems reasonable.

Somewhere in the pipeline between doPackageRender and Distribution.Server.Pages.Package, render the markdown/whatever to HTML. In the other issue, the package cheapskate was suggested; one way or another, we'd hopefully farm this out to some other package.

Personally I would suggest we use pandoc. The reason is that we can also use pandoc to replace the haddock markup parser that we already use (for the existing package description field).

There's some details to make sure we read markdown without arbitrary html, but I think thats possible with the pandoc reader options (based on my quick look at the pandoc API yesterday).

Determine, design-wise, how to incorporate the rendered README into the rest of the package page, and alter packagePage accordingly. If we followed Stackage, we'd just replace the existing description render. Stackage also uses an expander to deal with long docs, which is consistent with their style but not with ours.

Right, as I mention above something to deal with very long descriptions will be necessary, or it'll push everything else down the page.

dcoutts commented 9 years ago

@hvr Yes, I think a package-description-file: is the right thing in the medium term.

One related issue, perhaps this is a good time to start to think about multi-page markdown documents...

conklech commented 9 years ago

Could someone comment on the distinction between doPackageRender and the packageRender field in RecentPackages? At the moment, doPackageRender is basically a pure function wrapped in IO. RecentPackages and PackageCandidates have some duplicate code for doing the relevant IO work: getting the users database and checking for the changelog. The Html feature uses packageRender from RecentPackages to render the main package page.

I propose to factor packageRender out of RecentPackages, incorporate its effects into doPackageRender, and remove the thus-redundant parts of PackageCandidates. I'd then, regarding the present issue, have doPackageRender perform both the file extraction and, eventually, parsing.

Am I missing something?

conklech commented 9 years ago

I think I understand now how the various Features relate to each other, and why doPackageRender doesn't do any IO--it doesn't have access to most of the IO apparatus tied up in Features. The functions to extract files from the tarball should be in a Feature that has access to TarIndexCache and is loaded by both PackageContents and PackageCandidates. There's no such Feature at the moment other than TarIndexCache itself.

conklech commented 9 years ago

Our current aeson==0.6.1.* version pin restricts us to using an ancient version of Pandoc. So this issue more or less depends on #73 and #272.

(We can theoretically compile with pandoc-1.6.0.1, but I'm having a hard time building it in my dev sandbox because of some problem either with the sandbox or with the old version of texmath it uses.)

cartazio commented 9 years ago

why are we considering using pandoc? ... if nothing else, that complicates the license for hackage-server.

Why not cheapskate?

conklech commented 9 years ago

@cartazio - See @dcoutts's long comment yesterday. It would allow us to drop our Haddock parser. I'd add that it's a mature package.

conklech commented 9 years ago

cheapskate also imposes a spurious build dependency on wai (and our friend aeson) because of one of its optional bundled executables. It should be fixable upstream.

cartazio commented 9 years ago

umm, doesnt haddock the library expose its parser now? http://hackage.haskell.org/package/haddock-library-1.1.1/docs/Documentation-Haddock-Parser.html

i think thats a spurious motivation to add pandoc since haddock is now properly librariefied

cartazio commented 9 years ago

I think @jgm (author of both) would agree that cheapskate is a lighter dep :)

conklech commented 9 years ago

By the way, I'm not disagreeing with @cartazio. It doesn't particularly affect me or this issue; both Pandoc and cheapskate provide, in essence, a pure function ByteString -> Html. I suppose the one material difference may be in HTML sanitization. Again, see @dcoutts's 12/22 comment.

conklech commented 9 years ago

I've implemented the basic plumbing -- my points 1-3 from above. See my comments on pull #301. In particular, I haven't even tried to test it.

mpickering commented 9 years ago

The pandoc haddock parser is just a thin wrapper around haddock-library.

As for pandoc vs cheapskate - the cheapskate parser is much more performant than the pandoc parser. Neither at the moment conforms to CommonMark. Pandoc markdown has many extensions which may or may not be desirable, cheapskate is bare-bones.

I've just sent a message to John asking what his plans are regarding the two and will report back what he says.

dcoutts commented 9 years ago

Ok, fair enough. If people think cheapskate really is better, and we can use the haddock lib ok for the other part then fine. There's some advantage to using just one lib for both (sharing common code) but it's not that big a deal.

dcoutts commented 9 years ago

To be clear: I'm not arguing for one or the other, I don't know either well enough. But given that we can use the haddock lib for the haddock stuff, then we can decide between cheapskate and pandoc on their merits.

bgamari commented 9 years ago

@mpickering any news on this?

mpickering commented 9 years ago
I've been spending a lot of time getting libcmark right.
When the API stabilizes, I (or someone else) can write a Haskell
binding for it, and that could be used in a pandoc reader.

Alternatively, we could try for a pure Haskell implementation,
that copied the parsing strategy.  (Cheapskate is in the vicinity,
because I used lots of ideas from cheapskate, but it would need
substantial revisions.)  That would give more safety guarantees,
and maybe be more portable, at the expense of speed.

(Currently libcmark is more that 80 times faster than pandoc,
and it is much more robust -- pandoc's Markdown parser can be gummed
up by some pathological inputs.)

Best
John
snoyberg commented 9 years ago

Is there any progress on this issue? Having to duplicate descriptions across multiple places in different formats continues to be a real maintenance burden.

conklech commented 9 years ago

Not from my end, sorry; I'm out of commission until after the bar exam in July. IIRC the branch I was working on "just needs" pandoc/cheapskate/etc. dropped in, the HTML output spliced into the package page, and of course testing.

On Tue, Feb 24, 2015 at 7:32 AM, Michael Snoyman notifications@github.com wrote:

Is there any progress on this issue? Having to duplicate descriptions across multiple places in different formats continues to be a real maintenance burden.

— Reply to this email directly or view it on GitHub https://github.com/haskell/hackage-server/issues/279#issuecomment-75778251 .

snoyberg commented 9 years ago

Looks like this got picked up as a PR in #333.

dcoutts commented 9 years ago

The remainder of this, to put the readme inline is now done, pending deployment.

cdupont commented 8 years ago

Hello, displaying README in Hackage works great for my package. However, is it possible to display pictures, with a code like below? It doesn't seem to work. The picture is included in the cabal sdist.

![pic](img/mypic.png)