time-less-ness / trust-assembly

For SomeGuy's Trust Assembly Project

Add article extractor library #11

Open adhurjaty opened 2 weeks ago

adhurjaty commented 2 weeks ago

Adds the @extractus/article-extractor library to facilitate article parsing. Functionality remains the same, but we now generate an object that is easier to work with. Example output:

{
  "url": "https://stackoverflow.com/questions/8644428/how-to-highlight-text-using-javascript",
  "title": "How to highlight text using javascript",
  "description": "Can someone help me with a javascript function that can highlight text on a web page.\nAnd the requirement is to - highlight only once, not like highlight all occurrences of the text as we do in cas...",
  "links": [
    "https://stackoverflow.com/questions/8644428/how-to-highlight-text-using-javascript"
  ],
  "image": "https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon@2.png?v=73d79a89bded",
  "content": "<div>\n<p>The solutions offered here are quite bad.</p>\n<ol>\n<li>...",
  "author": "",
  "favicon": "https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196",
  "source": "stackoverflow.com",
  "published": "",
  "ttr": 199,
  "type": "website"
}
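
For reviewers, here is a minimal sketch of how the object might be consumed downstream. The `ArticleData` interface is inferred from the example output above, not copied from the library's own type definitions, and `describeArticle` is a hypothetical helper that does not exist in this PR. It also assumes `ttr` is a time-to-read estimate in seconds.

```typescript
// Shape inferred from the example output above (an assumption; the library's
// published type definitions are the authoritative source).
interface ArticleData {
  url: string;
  title: string;
  description: string;
  links: string[];
  image: string;
  content: string; // HTML string
  author: string; // may be empty, as in the example
  favicon: string;
  source: string;
  published: string; // may be empty, as in the example
  ttr: number; // assumed: estimated time to read, in seconds
  type: string;
}

// Hypothetical helper: builds a one-line summary from the extracted fields.
function describeArticle(article: ArticleData): string {
  const minutes = Math.max(1, Math.ceil(article.ttr / 60));
  const byline = article.author !== "" ? article.author : "unknown author";
  return `${article.title} by ${byline} (${article.source}, ~${minutes} min read)`;
}

// Values taken from the example output; truncated strings are left truncated.
const example: ArticleData = {
  url: "https://stackoverflow.com/questions/8644428/how-to-highlight-text-using-javascript",
  title: "How to highlight text using javascript",
  description: "Can someone help me with a javascript function that can highlight text on a web page...",
  links: ["https://stackoverflow.com/questions/8644428/how-to-highlight-text-using-javascript"],
  image: "https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon@2.png?v=73d79a89bded",
  content: "<div>\n<p>The solutions offered here are quite bad.</p>\n<ol>\n<li>...",
  author: "",
  favicon: "https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196",
  source: "stackoverflow.com",
  published: "",
  ttr: 199,
  type: "website",
};

console.log(describeArticle(example));
```

Because `author` and `published` can come back as empty strings, callers probably want a fallback like the one above rather than rendering those fields directly.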

Try it here

Webpack changes:

The build now emits a warning that we exceed webpack's recommended entrypoint asset size. This may be worth addressing later.
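
If we decide to address the warning through configuration, webpack's `performance` option controls these hints. The fragment below is only a sketch: the variable name `performanceOptions` and the 512000-byte limits are illustrative assumptions, not values from this PR. Raising the limits silences the warning without shrinking the bundle, so code splitting may be the better long-term fix.

```typescript
// Hypothetical fragment; it would be assigned to the `performance` key of the
// existing webpack config object. Limits are illustrative, not from this PR.
const performanceOptions = {
  hints: "warning" as const, // keep reporting as a warning, do not fail the build
  maxEntrypointSize: 512_000, // bytes; assumed value above webpack's 250000 default
  maxAssetSize: 512_000, // bytes; assumed value
};
```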