Open hodak opened 7 years ago
I have the same issue, how did you solve it?
I fixed it by encoding the unicode string to bytes before passing it to talon, after reading this Stack Overflow post.
quotations.extract_from(email_message.html.encode("iso-8859-1"), 'text/html')
The output went from
<html><head></head><body><div dir="ltr">Yes, I got your email. <br></div><br></body></html>
to
<html><head></head><body><div dir="ltr">Yes, I got your email. <br></div><br></body></html>
The culprit, Â, is now gone.
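For anyone wondering where a stray Â usually comes from, here is a minimal sketch. It assumes the email body contained a non-breaking space (U+00A0) encoded as UTF-8 that was decoded somewhere as ISO-8859-1. That is a common cause, but I have not confirmed it is what happens inside talon. The sample string is made up for illustration.

```python
# Assumption: the original text ends with a non-breaking space (U+00A0).
# In UTF-8 that character is the two bytes 0xC2 0xA0.
original = "Yes, I got your email.\u00a0"
utf8_bytes = original.encode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 maps 0xC2 to "Â", producing mojibake.
mojibake = utf8_bytes.decode("iso-8859-1")

# Encoding the mojibake back to ISO-8859-1 restores the original UTF-8 bytes,
# which can then be decoded correctly.
recovered_bytes = mojibake.encode("iso-8859-1")
recovered = recovered_bytes.decode("utf-8")

print(repr(mojibake))
print(repr(recovered))
```

This would explain why passing ISO-8859-1 encoded bytes works around the problem: the bytes are back in their original UTF-8 form by the time the HTML is parsed.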
Hi, I have a problem: talon returns strange HTML entities in the text when using extract_from_html.
File I used to reproduce it
Here I use the Polish ł character:
quotations.extract_from_html('Napisał(a):\n<blockquote><span>x</span></blockquote>')
and I get response:
these entities map to:
What's even stranger, when I replace x with ł inside the blockquote, it responds with:
where ł is, indeed, the entity for the ł character I would expect, so the text would show correctly on a website.
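As a quick reference for what a correct entity for ł looks like, here is a small sketch using only the Python standard library. It does not call talon and does not reproduce the responses above. It only shows the numeric character reference a serializer would be expected to emit for ł and that it decodes back to the same character.

```python
import html

char = "ł"

# ł is U+0142, which is 322 in decimal.
code_point = ord(char)

# The "xmlcharrefreplace" error handler replaces non-ASCII characters
# with their numeric character references.
numeric_ref = char.encode("ascii", "xmlcharrefreplace").decode("ascii")

# html.unescape turns the reference back into the character.
decoded = html.unescape(numeric_ref)

print(code_point, numeric_ref, decoded)
```

If the entities in a response decode to something other than the original character, the input was most likely decoded with the wrong charset before serialization.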