metalsmith / excerpts

A Metalsmith plugin to extract an excerpt from HTML files.
MIT License

Reference-style links aren't transformed into anchor tags #5

Closed mattwidmann closed 6 years ago

mattwidmann commented 10 years ago

Given a Markdown file like the following:


---
title: Reference-style link demo

---

This includes some [links][] and other [things that should turn into anchor tags][a].

[links]: http://example.com
[a]: http://example.com

metalsmith-excerpts just grabs the first portion of text, up to the first blank line (two consecutive newline characters), and passes that into marked. This is why marked doesn't turn the reference-style links into actual anchor tags: the reference definitions aren't passed in with the first paragraph.
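To make the failure concrete, here is a minimal, dependency-free sketch of the behavior described above. It is not the plugin's actual code: `naiveExcerpt` and `referenceDefinitions` are hypothetical helpers written for illustration, and the front-matter handling is an assumption about what happens before the excerpt is taken.

```javascript
// Hypothetical helper: mimics the described behavior by cutting the
// Markdown body at the first blank line.
function naiveExcerpt(markdown) {
  // Assumption: YAML front matter (--- ... ---) is stripped first.
  const body = markdown.replace(/^---\n[\s\S]*?\n---\n/, '').trim();
  // Keep only the text before the first blank line.
  return body.split(/\n\s*\n/)[0];
}

// Hypothetical helper: lists reference definitions such as
// "[a]: http://example.com" found at the start of a line.
function referenceDefinitions(markdown) {
  return markdown.match(/^\[[^\]]+\]:\s*\S+.*$/gm) || [];
}

const source = [
  '---',
  'title: Reference-style link demo',
  '---',
  '',
  'This includes some [links][] and other [things that should turn into anchor tags][a].',
  '',
  '[links]: http://example.com',
  '[a]: http://example.com',
].join('\n');

const excerpt = naiveExcerpt(source);

console.log(referenceDefinitions(source).length);  // definitions in the full document
console.log(referenceDefinitions(excerpt).length); // definitions that reach the excerpt
```

The excerpt still contains `[links][]` and `[...][a]`, but none of the definitions they point to, so a Markdown renderer that only sees the excerpt has nothing to resolve them against.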

Ideally, the way to fix this would be to excerpt from the HTML after converting the entire document. That way, documents that put all of the references at the bottom of the document would still work. After that, you would need to find the first tag (be it <p>, <pre>, <blockquote>, etc.) in the HTML. Using cheerio, you can accomplish that with:

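var cheerio = require('cheerio');
// Parse the fully rendered HTML of the file.
var $ = cheerio.load(file.contents.toString());
// Take a copy of the first element in the document, whatever its tag is.
var firstTag = $('*').first().clone();
// Swap the document's elements for that copy, with the aim of leaving only the first tag.
$('*').replaceWith(firstTag);
// Serialize what is left and store it as the excerpt.
file.excerpt = $.html();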

I think you have to use clone() and replaceWith() to get the outerHTML of the tag you're selecting. I might be wrong about this.

ianstormtaylor commented 10 years ago

Mmm, yeah. I could be down to have a pattern option that defaults to **.md, and then have a way to pull from .html files instead. If we always grabbed the first <p> tag, that would solve https://github.com/segmentio/metalsmith-excerpts/issues/6 too, I think.

mattwidmann commented 10 years ago

The problem with pulling the first <p> tag is that Markdown parsers tend to turn the code block referenced in #6 into a straight <pre><code> block, with no paragraph tags around it. That's why I had to get a bit creative with cheerio to select an arbitrary first tag.
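The difference between the two selection strategies can be sketched without any dependencies. This is a regex-based stand-in, not cheerio and not the plugin's code: `firstElement` and `firstParagraph` are hypothetical helpers, the sample HTML is made up, and the matching is deliberately naive (it does not handle void elements such as <hr>, comments, or angle brackets inside attribute values).

```javascript
// Hypothetical helper: outer HTML of the first element in the string,
// whatever its tag name is.
function firstElement(html) {
  const open = /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/.exec(html);
  if (!open) return '';
  const name = open[1];
  // Match opening and closing tags with the same name to track nesting.
  const sameTag = new RegExp('<(/?)' + name + '\\b[^>]*>', 'gi');
  sameTag.lastIndex = open.index;
  let depth = 0;
  let match;
  while ((match = sameTag.exec(html)) !== null) {
    depth += match[1] === '/' ? -1 : 1;
    if (depth === 0) return html.slice(open.index, sameTag.lastIndex);
  }
  return ''; // the element was never closed
}

// Hypothetical helper: outer HTML of the first <p> element only.
function firstParagraph(html) {
  const match = /<p\b[^>]*>[\s\S]*?<\/p>/i.exec(html);
  return match ? match[0] : '';
}

// Made-up output of a Markdown renderer for a post that opens with a code block.
const rendered =
  '<pre><code>var x = 1;\n</code></pre>\n' +
  '<p>Some introductory text.</p>\n';

console.log(firstElement(rendered));   // the <pre><code> block
console.log(firstParagraph(rendered)); // the <p> element that follows it
```

For a post that opens with a code block, the first-<p> strategy skips the code and returns the paragraph after it, while the first-tag strategy returns the code block itself. Which result counts as the right excerpt is the design choice under discussion here.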

ianstormtaylor commented 10 years ago

For my purposes I've never wanted a <code> block to be selected as the first item, so I'd prefer <p>. But I think this might just be better off as a separate plugin.

mattwidmann commented 10 years ago

Ah, that makes sense. I'm working on a fix for this, by the way.

woodyrew commented 6 years ago

Fixed with #7