kickscondor / fraidycat

Follow blogs, wikis, YouTube channels, as well as accounts on Twitter, Instagram, etc. from a single page.

RSS reader does not respect feed encoding, assumes everything is UTF-8 #105

Open · treeshateorcs opened this issue 4 years ago

treeshateorcs commented 4 years ago


Describe the bug
This is kind of a feature request and a bug report at the same time.

kickscondor commented 4 years ago

I've recently rewritten the RSS parser, so this was bound to happen. I'll sort this out. Thank you!

kickscondor commented 4 years ago

Just as a note to myself, here is how to load non-UTF-8 content into DOMParser:

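// Fetch the feed as raw bytes so nothing gets decoded as UTF-8 by default
const res = await fetch("http://www.opennet.ru/opennews/opennews_all.rss",
  {credentials: 'omit'})
const raw = await res.arrayBuffer()
// Decode the bytes with the feed's real charset (KOI8-R for this feed)
const dec = new TextDecoder('koi8-r')
const body = dec.decode(raw)
// Parse the already-decoded string as XML
const dom = new DOMParser()
const doc = dom.parseFromString(body, 'text/xml')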

Unfortunately, doc.inputEncoding, doc.charset and doc.characterSet all report "UTF-8". I will probably have to parse the XML declaration at the beginning of the document myself to get the encoding in this case (or fall back to the charset in the Content-Type header if that fails).
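A minimal sketch of that fallback chain, in the same JavaScript as the note above. The helper names (`sniffBom`, `sniffXmlEncoding`, `charsetFromContentType`, `decodeFeed`) are hypothetical, not Fraidycat's actual code. It assumes the feed arrives as a `Uint8Array` and that the runtime's `TextDecoder` supports legacy charsets such as KOI8-R (browsers do; Node.js needs a full-ICU build). The byte-order-mark check is an extra step not mentioned in the comment; after it, the order follows the comment: XML declaration first, then the Content-Type header, then UTF-8.

```javascript
// Hypothetical helpers (not Fraidycat's real code): choose a charset for raw
// feed bytes, then decode them to a string that DOMParser can parse.

// A byte-order mark, if present, identifies UTF-8/UTF-16 unambiguously.
function sniffBom(bytes) {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return 'utf-8'
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return 'utf-16le'
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return 'utf-16be'
  return null
}

// Read encoding="..." from the XML declaration at the start of the document.
// The declaration is plain ASCII, so mapping each byte to one character is
// enough for ASCII-compatible charsets such as KOI8-R or windows-1251.
function sniffXmlEncoding(bytes) {
  const head = String.fromCharCode(...bytes.subarray(0, 256))
  const m = /^\s*<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z0-9._-]+)["']/.exec(head)
  return m ? m[1].toLowerCase() : null
}

// Pull charset=... out of a Content-Type header value, if present.
function charsetFromContentType(contentType) {
  const m = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '')
  return m ? m[1].toLowerCase() : null
}

// Decode feed bytes with the first charset label TextDecoder accepts:
// BOM, then XML declaration, then Content-Type header, then UTF-8.
function decodeFeed(bytes, contentType) {
  const candidates = [
    sniffBom(bytes),
    sniffXmlEncoding(bytes),
    charsetFromContentType(contentType),
    'utf-8',
  ]
  for (const label of candidates) {
    if (!label) continue
    try {
      // TextDecoder throws a RangeError for labels it does not recognise,
      // in which case we move on to the next candidate.
      return new TextDecoder(label).decode(bytes)
    } catch (err) {
      if (!(err instanceof RangeError)) throw err
    }
  }
}
```

In the browser this would replace the hard-coded `TextDecoder('koi8-r')` used above, e.g. `decodeFeed(new Uint8Array(await res.arrayBuffer()), res.headers.get('Content-Type'))`, with the resulting string passed to `parseFromString(..., 'text/xml')`. Trying each candidate in turn means a misspelled or unsupported label in the XML declaration still lets the header charset win instead of dropping straight to UTF-8.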