mozilla / readability

A standalone version of the readability lib

Conflict with package versions #873

Open vankov1 opened 1 month ago

vankov1 commented 1 month ago

Hey everyone, I tried to fork the repo and use it as a dependency, and I noticed strange behavior. Unless nwsapi is pinned to a version between 2.2.2 and 2.2.9, jsdom/Readability breaks. If nwsapi is left out of the dependencies, it gets updated to 2.2.10, which causes a lot of paragraphs to disappear.

Here is a basic package.json for a working example. If you change the version of nwsapi or remove it, parsing starts to break.

{
  "name": "extr-debug-1",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "author": "",
  "license": "ISC",
  "description": "",
  "dependencies": {
    "@mozilla/readability": "github:mozilla/readability#main",
    "jsdom": "20.0.2",
    "nwsapi": "2.2.9"
  },
  "engines": {
    "node": ">=14.0.0"
  }
}

Here is a page that is parsed properly with the 0.5.0 version: https://www.crownpeak.com/policies/privacy-policy/

With nwsapi 2.2.10, the "Contacting Us" section and other parts of the document disappear from the output.
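One way to pin down exactly which paragraphs go missing is to save the extracted text from a working install and from a broken one, then list the lines present only in the first. The sketch below is a minimal illustration under that assumption; `splitParagraphs` and `missingParagraphs` are hypothetical helper names, not part of Readability or jsdom.

```javascript
// Hypothetical helpers for comparing two extraction results.
// Neither function is part of Readability or jsdom.

// Split extracted text into non-empty lines with whitespace normalized.
function splitParagraphs(text) {
  return text
    .split(/\n+/)
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter((line) => line.length > 0);
}

// Return the paragraphs found in goodText that are absent from badText.
function missingParagraphs(goodText, badText) {
  const present = new Set(splitParagraphs(badText));
  return splitParagraphs(goodText).filter((p) => !present.has(p));
}

module.exports = { splitParagraphs, missingParagraphs };
```

Feeding it the `textContent` from each run should show whether "Contacting Us" is among the dropped lines. The comparison is exact-match per line, so a paragraph that only changed slightly would also be reported as missing.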

cmkm commented 2 weeks ago

Not completely sure, but this looks like it's probably an issue with jsdom rather than Readability here. Is there any chance that the input you're providing to Readability is significantly different based on the package versions?