"Confident" means "metadata of the document explicitly indicates that the encoding is UTF-8".
Background of the patch
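When a UTF-8 feed contains a few invalid byte sequences but is otherwise fine, feedparser currently rejects UTF-8 entirely and parses the feed as iso-8859-2 (or whatever encoding chardet detects, if it is installed), even when both the HTTP and XML headers explicitly declare the encoding as utf-8.
To handle this case better, we should decode the feed as UTF-8 with errors='replace' whenever we are confident about the encoding, so that only the invalid bytes are replaced and the rest of the text is preserved.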
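
The idea can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `is_utf8_label` and `decode_feed` are hypothetical helper names, and the rule that a single declared label is enough to be "confident" is an assumption made for the example.

```python
import codecs
from typing import Optional


def is_utf8_label(label: Optional[str]) -> bool:
    """Return True if an encoding label (e.g. from a header) names UTF-8."""
    if not label:
        return False
    try:
        # codecs.lookup normalizes aliases such as "UTF8" or "utf_8".
        return codecs.lookup(label).name == "utf-8"
    except LookupError:
        return False


def decode_feed(data: bytes, declared_encoding: Optional[str]) -> str:
    """Hypothetical helper: decode feed bytes, tolerating a few bad bytes
    when the metadata explicitly says the feed is UTF-8."""
    if is_utf8_label(declared_encoding):
        # Confident: replace each invalid sequence with U+FFFD
        # instead of rejecting UTF-8 for the whole document.
        return data.decode("utf-8", errors="replace")
    # Not confident: stay strict so the caller can try other encodings.
    return data.decode("utf-8")


raw = b"caf\xc3\xa9 \xff ok"  # valid UTF-8 except for the stray 0xFF byte
print(decode_feed(raw, "UTF-8"))
```

With a declared encoding of UTF-8, the stray byte becomes U+FFFD and the surrounding text decodes normally. Without a declaration, the strict decode raises `UnicodeDecodeError`, leaving any fallback detection to the caller.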
"Confident" means "metadata of the document explicitly indicates that the encoding is UTF-8".