node-fetch / fetch-charset-detection

Charset detection and conversion, originally from node-fetch.
MIT License

wrong metadata charset tags #301

Open yuval-herman opened 1 year ago

yuval-herman commented 1 year ago

Firstly, thank you for working on this library! It's a big help!

I was using your library to scrape some old sites with node-fetch and came across a strange issue. While scraping this site specifically, I got this error out of iconv:

Error: Encoding not recognized: 'visual' (searched as: 'visual')

This was caused by this tag in one of the subframes (which I also scrape) in the page:

<meta http-equiv="Content-Type" content="text/html; charset=visual">

A solution would be to check whether the charset named in the meta tag is a recognized encoding before committing to it.
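A rough sketch of what such a check could look like is below. It is not the library's actual code: `extractMetaCharset`, `chooseCharset`, `KNOWN_ENCODINGS`, and `isKnownEncoding` are hypothetical names made up for illustration, the small encoding list is a stand-in, and falling back to UTF-8 is an assumption about the desired behavior.

```javascript
// Hypothetical helper, not part of fetch-charset-detection.
// Pulls the charset value out of the first <meta> tag that declares one.
function extractMetaCharset(html) {
    const match = /<meta[^>]*?charset\s*=\s*["']?([\w-]+)/i.exec(html);
    return match ? match[1].toLowerCase() : null;
}

// Returns the declared charset only when `isSupported` accepts it;
// otherwise returns the fallback (UTF-8 here, which is an assumption).
function chooseCharset(html, isSupported, fallback = "utf-8") {
    const declared = extractMetaCharset(html);
    return declared !== null && isSupported(declared) ? declared : fallback;
}

// Stand-in predicate for illustration only: a tiny allow list.
const KNOWN_ENCODINGS = new Set(["utf-8", "iso-8859-8", "windows-1255"]);
const isKnownEncoding = (name) => KNOWN_ENCODINGS.has(name);

const tag = '<meta http-equiv="Content-Type" content="text/html; charset=visual">';
console.log(chooseCharset(tag, isKnownEncoding));
```

In practice the predicate would probably be backed by the decoder itself, for example iconv-lite's `encodingExists`, so the check and the conversion agree on which names are valid.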

Sample code to reproduce:

import fetch from "node-fetch"
import convertBody from "fetch-charset-detection"

fetch(
    "https://www.gov.il/apps/elections/Elections-knesset-15/heb/banner.html"
).then((res) =>
    res
        .arrayBuffer()
        .then((buf) => convertBody(buf))
        .then(console.log)
)