rehypejs / rehype-remark

plugin to transform from HTML (rehype) to Markdown (remark)
https://unifiedjs.com
MIT License

search in hast node #3

Closed thisconnect closed 5 years ago

thisconnect commented 5 years ago

Subject of the feature

A utility to search for a hast element, ideally with a CSS-like selector

Problem

I would like to convert only the content of a page, starting at <main>, <article>, or [role="main"], and ignore the rest

Expected behaviour

I have implemented a very basic search function (kind of like getElementsByTagName), but I wonder A) whether this is the right approach and B) whether there is already something like it for hast (in the rehype/unified world)

Alternatives

const unified = require('unified')
const createStream = require('unified-stream')
const parse = require('rehype-parse')
const rehype2remark = require('rehype-remark')
const stringify = require('remark-stringify')
// note: the next two requires reach into hast-util-to-mdast internals (lib/), not its public API
const all = require('hast-util-to-mdast/lib/all')
const wrap = require('hast-util-to-mdast/lib/util/wrap')

const processor = unified()
  .use(parse)
  .use(rehype2remark, { handlers: { body }}) // changes the body
  .use(stringify)

// try to use main or article, else fall back to body
function body(h, node) {
  return wrap(all(h, search(node, 'main, article') || node))
}

process.stdin.pipe(createStream(processor)).pipe(process.stdout)

// returns the first element (breadth-first) whose tagName is in the comma-separated list
function search(node, tagName) {
  const tagNames = tagName.split(',').map(s => s.trim())
  // copy the children so that queueing descendants does not mutate the tree
  const queue = [...node.children]
  for (const current of queue) {
    if (current.type === 'element' && tagNames.includes(current.tagName)) {
      return current
    }
    if (current.children && current.children.length) {
      queue.push(...current.children)
    }
  }
  return null
}

thisconnect commented 5 years ago

Of course I found 'hast-util-select' just after writing this issue :)

thisconnect commented 5 years ago

Anyway, do you consider replacing the body with a node deeper in the tree a good approach, or a hack that might break in the future?

wooorm commented 5 years ago

Hi @thisconnect!

Yes, that utility already exists. I believe that what you want to do is possible with the current ecosystem.

Anyway, do you consider replacing the body with a node deeper in the tree a good approach, or a hack that might break in the future?

That would not be something to add here. Instead, I think it makes sense to have it as its own plugin. That way others can use it in different cases as well, not just when going rehype -> remark.

Something like this (pseudocode):

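var select = require('hast-util-select').select

module.exports = thePlugin

// options.selector: CSS selector for the node that should become the new tree
function thePlugin(options) {
  var selector = (options || {}).selector

  if (!selector) {
    throw new Error('Missing `selector` in options')
  }

  return transform

  function transform(tree) {
    var node = select(selector, tree)

    if (!node) {
      throw new Error('Could not find matching node')
    }

    // returning a node replaces the tree for the plugins that follow
    return node
  }
}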

You can read more about unified and plugins in unifiedjs/unified.

thisconnect commented 5 years ago

thank you for the quick reply

That would not be something to add here. Instead, I think it makes sense to have it as its own plugin. That way others can use it in different cases as well, not just when going rehype -> remark.

I agree, but I am not 100% sure how a plugin that replaces the entry point (in this case body) would work independently of rehype -> remark (hast/mdast)

function body(h, node) {
  return wrap(all(h, search(node, 'main, article') || node))
}

which requires

const all = require('hast-util-to-mdast/lib/all')
const wrap = require('hast-util-to-mdast/lib/util/wrap')

wooorm commented 5 years ago

🤔 The above rehype plugin does that.
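
A dependency-free sketch of why that could work, assuming unified's documented rule that a node returned from a transformer becomes the tree for the plugins that follow. The names selectByTagName, narrowTo, runTransformers, and recordTree are hypothetical stand-ins (for select from hast-util-select, for the plugin above, for unified's run phase, and for a later plugin such as rehype-remark); none of them is a real API, and the tree is hand-built for illustration.

```javascript
// Hypothetical stand-in for hast-util-select's `select`: matches tag names only.
function selectByTagName(tagNames, node) {
  if (node.type === 'element' && tagNames.includes(node.tagName)) return node
  for (const child of node.children || []) {
    const match = selectByTagName(tagNames, child)
    if (match) return match
  }
  return null
}

// Same shape as the plugin above: the transformer returns the matched node.
function narrowTo(options) {
  const tagNames = options.tagNames
  return function transform(tree) {
    const node = selectByTagName(tagNames, tree)
    if (!node) throw new Error('Could not find matching node')
    return node
  }
}

// Hypothetical stand-in for unified's run phase (an assumption, heavily simplified):
// when a transformer returns a node, that node is passed to the next transformer.
function runTransformers(transformers, tree) {
  return transformers.reduce(
    (current, transform) => transform(current) || current,
    tree
  )
}

// Stand-in for a later plugin such as rehype-remark: records the tree it receives.
const seen = []
function recordTree(tree) {
  seen.push(tree.tagName || tree.type)
}

// Hand-built hast-like tree: root > html > body > (nav, main)
const tree = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'html', children: [
      {type: 'element', tagName: 'body', children: [
        {type: 'element', tagName: 'nav', children: [{type: 'text', value: 'Menu'}]},
        {type: 'element', tagName: 'main', children: [{type: 'text', value: 'Content'}]}
      ]}
    ]}
  ]
}

const result = runTransformers(
  [narrowTo({tagNames: ['main', 'article']}), recordTree],
  tree
)
console.log(result.tagName, seen)
```

In a real pipeline the plugin would presumably be registered between rehype-parse and rehype-remark, so that rehype-remark only ever receives the selected node. Under that assumption, the custom body handler and the requires of hast-util-to-mdast internals would no longer be needed.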