philschmid / clipper.js

HTML to Markdown converter and crawler.
Apache License 2.0
488 stars, 33 forks

Clipper failing with parse error - Error("Failed to parse article"); #8

Open ArlindNocaj opened 6 months ago

ArlindNocaj commented 6 months ago

I tried the following command:

clipper clip -u https://aws-experience.com/emea/dach-cee/e/566ad/aws-deep-dive-days---generative-ai

It fails with:

 clipper clip -u https://aws-experience.com/emea/dach-cee/e/566ad/aws-deep-dive-days---generative-ai
/Users/arlnocaj/.nvm/versions/node/v18.16.1/lib/node_modules/@philschmid/clipper/dist/clipper.js:59
        throw new Error("Failed to parse article");
              ^

Error: Failed to parse article
    at extract_from_dom (/Users/arlnocaj/.nvm/versions/node/v18.16.1/lib/node_modules/@philschmid/clipper/dist/clipper.js:59:15)
    at extract_from_url (/Users/arlnocaj/.nvm/versions/node/v18.16.1/lib/node_modules/@philschmid/clipper/dist/clipper.js:86:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Command.<anonymous> (/Users/arlnocaj/.nvm/versions/node/v18.16.1/lib/node_modules/@philschmid/clipper/dist/index.js:44:15)

Node.js v18.16.1

It seems that Mozilla's Readability library is failing to parse that page. Ideally, clipper should still produce output in this case, with a warning that the result may not be optimal because Readability could not extract the article.
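A minimal sketch of what that could look like, not the actual clipper code: `extractWithFallback` and `parseArticle` are hypothetical names, and `parseArticle` stands in for the Readability-based extraction step, assumed here to return `null` (or throw) when it cannot find an article. Instead of throwing, the wrapper warns and returns the unfiltered HTML so the Markdown conversion can still run.

```javascript
// Hypothetical wrapper around the extraction step. `parseArticle` is an
// assumption: any function that returns { content } on success and
// null (or throws) when the page cannot be parsed.
function extractWithFallback(html, parseArticle, warn = console.warn) {
  let article = null;
  try {
    article = parseArticle(html);
  } catch (err) {
    warn(`Article extraction threw: ${err.message}`);
  }

  if (article && article.content) {
    return { content: article.content, usedFallback: false };
  }

  // Degrade gracefully: keep the raw HTML rather than aborting the run.
  warn("Could not extract the article; using the full page, output may not be optimal.");
  return { content: html, usedFallback: true };
}

// Usage with stub parsers (the second one simulates the failure above).
const parsed = extractWithFallback("<p>hi</p>", (h) => ({ content: h }));
const degraded = extractWithFallback("<div>raw</div>", () => null, () => {});
```

The `usedFallback` flag lets the caller decide whether to surface the warning to the user or exit with a non-zero status.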

TechupBusiness commented 2 months ago

For such cases, a fallback mechanism that hands the page to an LLM (e.g. a local Ollama instance) for the Markdown conversion could be helpful.
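A rough sketch of such a fallback, under several assumptions: Ollama is running locally on its default port 11434, `llama3` is only a placeholder model name, and the function names and prompt wording are made up for illustration. It uses Ollama's `/api/generate` endpoint with streaming disabled, so the whole answer arrives in the `response` field of a single JSON object.

```javascript
// Build the HTTP request for Ollama's generate endpoint.
// Assumptions: default base URL, and "llama3" as a placeholder model.
function buildOllamaRequest(html, model = "llama3", baseUrl = "http://localhost:11434") {
  return {
    url: `${baseUrl}/api/generate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        prompt: `Convert the following HTML to Markdown. Return only the Markdown.\n\n${html}`,
        stream: false, // return one JSON object instead of a stream of chunks
      }),
    },
  };
}

// Send the page to the model and return the generated Markdown.
// `fetchImpl` is injectable so the function can be tested without a server;
// it defaults to the global fetch available in Node 18+.
async function htmlToMarkdownWithLlm(html, fetchImpl = fetch) {
  const { url, options } = buildOllamaRequest(html);
  const res = await fetchImpl(url, options);
  if (!res.ok) {
    throw new Error(`Ollama request failed with status ${res.status}`);
  }
  const data = await res.json();
  return data.response;
}
```

Large pages may exceed the model's context window, so in practice the HTML would likely need to be trimmed or chunked before being sent.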