microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
https://metascraper.js.org
MIT License
2.35k stars 168 forks source link

metascraper-logo-favicon should prioritize png over ico and allow URL input #724

Closed aldenquimby closed 1 month ago

aldenquimby commented 1 month ago

Prerequisites

Subject of the issue

Two related issues:

  1. metascraper-logo-favicon should prioritize png over ico. basically this code should be running: https://github.com/microlinkhq/metascraper/blob/cfe41bb4ef30ce7d4e6dee418270057706db937c/packages/metascraper-logo-favicon/src/index.js#L54-L61

  2. metascraper-logo-favicon should not fail with "url.match is not a function" when URL is supplied instead of string

    • metascraper, metascraper-title, metascraper-image, and metascraper-description all work with URL or string as input

Steps to reproduce

const url = new URL('https://github.com'); const html = await fetch(url).then(r => r.text());

const metascraper = createMetascraper([metascraperLogoFavicon()]); const result = await metascraper({ html, url });

> **Note**: You can reproduce the code using [interactive Node.js shell by Runkit](https://npm.runkit.com/metascraper).

### Expected behaviour
- metascraper returns the favicon

### Actual behaviour
- metascraper throws this error:

TypeError: url.match is not a function at getSize (webpack-internal:///(rsc)/../../node_modules/metascraper-logo-favicon/src/index.js:66:20) at eval (webpack-internal:///(rsc)/../../node_modules/metascraper-logo-favicon/src/index.js:82:17) at arrayReduce (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:698:21) at Function.reduce (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:9750:14) at eval (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:4431:28) at arrayReduce (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:698:21) at baseWrapperValue (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:4430:14) at LodashWrapper.wrapperValue (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:9115:14) at getDomNodeSizes (webpack-internal:///(rsc)/../../node_modules/metascraper-logo-favicon/src/index.js:86:6) at eval (webpack-internal:///(rsc)/../../node_modules/metascraper-logo-favicon/src/index.js:92:26) at arrayReduce (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:698:21) at Function.reduce (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:9750:14) at eval (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:4431:28) at arrayReduce (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:698:21) at baseWrapperValue (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:4430:14) at LodashWrapper.wrapperValue (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:9115:14) at getSizes (webpack-internal:///(rsc)/../../node_modules/metascraper-logo-favicon/src/index.js:94:6) at eval (webpack-internal:///(rsc)/../../node_modules/metascraper-logo-favicon/src/index.js:244:23) at eval (webpack-internal:///(rsc)/../../node_modules/@metascraper/helpers/index.js:444:29) at findRule (webpack-internal:///(rsc)/../../node_modules/@metascraper/helpers/index.js:434:35) at eval (webpack-internal:///(rsc)/../../node_modules/metascraper/src/get-data.js:11:27) at arrayMap (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:654:23) at map (webpack-internal:///(rsc)/../../node_modules/lodash/lodash.js:9623:14) at getData (webpack-internal:///(rsc)/../../node_modules/metascraper/src/get-data.js:10:5) at eval (webpack-internal:///(rsc)/../../node_modules/metascraper/src/index.js:29:12)



## Tech Notes
- the problem appears to be here: https://github.com/microlinkhq/metascraper/blob/7a504f74b6b2aa308ecf91c849cd8b82f852f3c2/packages/metascraper-logo-favicon/src/index.js#L81
- `getSize` is expecting `url` to be a string, but it's a `URL` sometimes
- it looks to me like `normalizedUrl` should be passed in to `getSize`, and not `url`?