pgxn / pgxn-api

Maintain and serve a REST API to search PGXN mirrors
http://pgxn.org/

Teach Parser to resolve doc links to the .html files they generate #38

Open GeoffMontee opened 8 years ago

GeoffMontee commented 8 years ago

The tds_fdw documentation has a primary README.md file, and that file contains relative links to other .md files. e.g. something like this:

## Installing on CentOS

See [installing tds_fdw on CentOS](InstallCentOS.md).

This works great on GitHub, but the links don't work on PGXN. Instead, you get a 404 error.

It looks like PGXN is translating these pages to html. For example, InstallCentOS.md can be found here.

Since PGXN is converting these .md pages to .html, would it be possible for PGXN to fix the links as well?

This is related to this tds_fdw GitHub issue:

https://github.com/GeoffMontee/tds_fdw/issues/44

theory commented 8 years ago

Hrm. GitHub is a bit of a different beast. It links to the file in the same directory, but you'll note that it resolves to ./blob/master/InstallCentOS.md. But that's because there is a .md file to link to. The thing is, though, that using a .md file in a link doesn't really make much sense in the context of Markdown and its HTML output. You will find that the link won't work on any site that generates HTML from Markdown, unless the server is set up to serve .md files as HTML. Which PGXN is not and likely never will be.

Also, PGXN doesn't do any parsing of documentation files. It outsources that to a Markdown parser. And it supports a lot of formats.

If you changed the link to InstallCentOS.html, it would of course work. But then maybe it would no longer work on GitHub. I'm hard-pressed to come up with a good solution here. :-(

Do note that the home page for your release on PGXN already links to that document directly.

GeoffMontee commented 8 years ago

Thanks for the quick reply, @theory.

I looked over the PGXN documentation, and I see now that the site doesn't create the HTML. It relies on Text::Markup to do that.

I'll have to think about the best way to solve this.

I see that Text::Markup can accept HTML as input too. If a distribution uploaded to PGXN contains both README.md and README.html, will it still convert README.md to HTML, or would PGXN use the HTML file in the distribution instead of the .md file?

theory commented 8 years ago

Hrm. I don't know. The indexing code just grabs the first README it finds. I've no idea whether it would consistently find one or the other. Might be nice to update that code to have an order of preference, I guess. I myself have CPAN modules with both a README and a README.md.
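As an illustration of the "order of preference" idea, here is a minimal Python sketch. It is not the PGXN indexing code: the function name `pick_readme` and the preference order in `PREFERRED_EXTENSIONS` are assumptions made up for this example.

```python
import posixpath

# Hypothetical preference order: Markdown first, then HTML, then a plain README.
PREFERRED_EXTENSIONS = [".md", ".html", ""]


def pick_readme(paths, preferred=PREFERRED_EXTENSIONS):
    """Return the README path with the most preferred extension, or None."""
    candidates = []
    for path in paths:
        root, ext = posixpath.splitext(posixpath.basename(path))
        if root.lower() != "readme":
            continue
        ext = ext.lower()
        # Extensions not in the list rank after all listed ones.
        rank = preferred.index(ext) if ext in preferred else len(preferred)
        candidates.append((rank, path))
    # Sorting on (rank, path) makes the choice deterministic.
    return min(candidates)[1] if candidates else None
```

With a fixed order like this, a distribution that ships both README.md and README.html would always resolve to the same file, whichever one the indexer happens to see first.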

theory commented 4 months ago

Moving to the pgxn-api issues, where all the HTML parsing and formatting happens. We could perhaps have _clean_html_body look at anchor href values and, if they point to a file that the parser has converted to HTML, update the link to point to the .html file instead.
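The link rewriting proposed here could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the pgxn-api implementation: `rewrite_doc_links`, the `converted` set of source paths the parser turned into HTML, and `base_dir` are all hypothetical names, and a regular expression stands in for real HTML parsing.

```python
import posixpath
import re
from urllib.parse import urlsplit, urlunsplit

# Matches the href attribute of an <a> tag (simplified; not a full HTML parser).
HREF_RE = re.compile(r'(<a\b[^>]*?\bhref=)(["\'])(.*?)\2', re.IGNORECASE | re.DOTALL)


def rewrite_doc_links(html, converted, base_dir=""):
    """Point relative links at the .html file when the target doc was converted.

    `converted` is a hypothetical set of distribution-relative source paths
    (e.g. {"InstallCentOS.md"}) that were rendered to HTML.
    """
    def fix(match):
        prefix, quote, href = match.groups()
        parts = urlsplit(href)
        # Leave absolute URLs, mailto: links, and fragment-only links alone.
        if parts.scheme or parts.netloc or not parts.path:
            return match.group(0)
        target = posixpath.normpath(posixpath.join(base_dir, parts.path))
        if target not in converted:
            return match.group(0)
        root, _ext = posixpath.splitext(parts.path)
        new_href = urlunsplit(("", "", root + ".html", parts.query, parts.fragment))
        return f"{prefix}{quote}{new_href}{quote}"

    return HREF_RE.sub(fix, html)
```

Only links whose target is in the converted set are touched, so links to files that were never rendered, and external links that happen to end in .md, stay as the author wrote them.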