silverbucket / dogfeed

an unhosted feed reader
https://dogfeed.5apps.com
GNU Affero General Public License v3.0

send feed UTF8 encoded to the client #12

Closed: kradan closed this issue 10 years ago

kradan commented 10 years ago

I tested dogfeed with https://freie-radios.net/portal/podcast.php, and the umlauts look strange because the feed is encoded as ISO-8859-1. It would look nicer as UTF-8 :)

silverbucket commented 10 years ago

Hi @kradan, thanks for pointing this out. I did some digging, but haven't been able to figure out how to encode the articles I fetch as UTF-8. I tried using iconv (from ISO-8859-1 to UTF-8), but the result was a bunch of numbers instead of text.
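For comparison, the conversion itself is two steps: decode the raw bytes using the source charset, then encode the resulting text as UTF-8. This is a minimal Python sketch (not the iconv/JavaScript code sockethub uses), assuming the raw response bytes are available; `latin1_to_utf8` is a hypothetical helper name.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8.

    Every byte 0x00-0xFF is a valid ISO-8859-1 character, so the decode
    step never fails; the text is then serialized as UTF-8.
    """
    text = raw.decode("iso-8859-1")
    return text.encode("utf-8")


# "Grüße" in ISO-8859-1: ü is the single byte 0xFC, ß is 0xDF.
latin1_bytes = "Grüße".encode("iso-8859-1")
print(latin1_bytes)                  # b'Gr\xfc\xdfe'
print(latin1_to_utf8(latin1_bytes))  # b'Gr\xc3\xbc\xc3\x9fe'
```

The output is still bytes; whatever receives it has to treat it as UTF-8 for the umlauts to display correctly.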

Any pointers as to where to look to do this?

silverbucket commented 10 years ago

I got this resolved in sockethub. It will be in the next release. Closing here.

kradan commented 10 years ago

cool, thanks!

hugoroy commented 10 years ago

Hey, I don’t know if it’s the same thing, but it seems that the encoding is still wrong here:

[Screenshot from 2013-12-02 08:30:30]

feed: https://hroy.eu/index.atom

silverbucket commented 10 years ago

So, the problem is: If I fix issues like @kradan reports (feeds encoded as iso-8859-1 looking funny) then feeds encoded as utf-8 look funny.

If I remove the conversion, then the UTF-8 feeds look fine, and the ISO-8859-1 feed looks screwy again.

I don't know how to detect the encoding of a stream and insert the iconv conversion dynamically when it's needed. If either of you has any pointers or suggestions, that would help a lot. Otherwise I'll have to do more digging, and I may revert the conversion for now, since I think feeds encoded as UTF-8 should look right by default as a priority.
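One common heuristic is to attempt a strict UTF-8 decode first, and fall back only if that fails. The fallback is the charset the server or XML prolog declares, or ISO-8859-1 if none is given. This is an assumption, not what sockethub actually does. It relies on the fact that ISO-8859-1 text containing accented letters is usually not valid UTF-8. The sketch below is Python, and `decode_feed` and its `declared` parameter are hypothetical names.

```python
from typing import Optional


def decode_feed(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode feed bytes, preferring UTF-8 whenever the bytes are valid UTF-8.

    `declared` is the charset from the HTTP Content-Type header or the XML
    prolog, if one is known (hypothetical parameter).
    """
    try:
        # Strict decode: raises UnicodeDecodeError on any invalid UTF-8 sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Not valid UTF-8: use the declared charset, or ISO-8859-1 if none.
        return raw.decode(declared or "iso-8859-1")
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or the declaration was wrong: ISO-8859-1 can
        # decode any byte sequence, so this last step cannot fail.
        return raw.decode("iso-8859-1")


print(decode_feed("Grün".encode("utf-8")))                           # Grün
print(decode_feed("Grün".encode("iso-8859-1"), declared="latin-1"))  # Grün
```

Trying UTF-8 first matches the stated priority of making UTF-8 feeds look right by default. Pure-ASCII feeds decode identically either way, so they are unaffected.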

silverbucket commented 10 years ago

meant to ping @hugoroy as well ^^

silverbucket commented 10 years ago

It looks like the original feed in this issue is incorrectly encoded, and that's the crux of the issue. It's reported as ISO-8859-1 but contains non-ASCII characters.
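A mismatch like this can be flagged by comparing the encoding named in the XML declaration against the bytes themselves. This rough Python sketch assumes the mismatch takes the form of UTF-8 bytes under an ISO-8859-1 label. The regex only handles the common `<?xml ... encoding="..."?>` form, and `declared_xml_encoding` and `looks_like_utf8` are hypothetical helper names.

```python
import re
from typing import Optional

# Captures the encoding pseudo-attribute of an XML declaration at the start of the data.
_XML_DECL = re.compile(rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_xml_encoding(raw: bytes) -> Optional[str]:
    """Return the lowercased encoding from the XML declaration, or None."""
    m = _XML_DECL.match(raw)
    return m.group(1).decode("ascii").lower() if m else None


def looks_like_utf8(raw: bytes) -> bool:
    """True if the bytes are valid UTF-8 and contain at least one non-ASCII byte."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid in both encodings, so it is not evidence either way.
    return any(b >= 0x80 for b in raw)


doc = '<?xml version="1.0" encoding="ISO-8859-1"?><title>Grün</title>'.encode("utf-8")
declared = declared_xml_encoding(doc)
if declared in ("iso-8859-1", "latin1", "latin-1") and looks_like_utf8(doc):
    print(f"declared {declared}, but the bytes are valid UTF-8")
```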

So, for now I'm going to remove the conversion so that the majority of feed sources work correctly. Later, I may consider options for specifying per-stream encoding conversions, but that seems like enough of an edge case to ignore for now. This means some feeds whose encoding isn't declared correctly won't look perfect.
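If per-stream conversions were added later, one simple design would be a lookup table keyed by feed URL. The table is consulted before any default decoding. This is a hypothetical Python sketch. `ENCODING_OVERRIDES` and `decode_with_override` are illustrative names, and the example entry is made up, not a known problem feed.

```python
from typing import Dict

# Hypothetical per-feed overrides: feed URL -> charset to force for that feed.
ENCODING_OVERRIDES: Dict[str, str] = {
    "https://feeds.example.org/mislabelled.xml": "utf-8",
}


def decode_with_override(url: str, raw: bytes, default: str = "utf-8") -> str:
    """Decode with the forced charset for listed feeds, else with `default`."""
    charset = ENCODING_OVERRIDES.get(url, default)
    # errors="replace" keeps a single bad byte from making the whole feed unreadable.
    return raw.decode(charset, errors="replace")


print(decode_with_override("https://feeds.example.org/mislabelled.xml",
                           "Grün".encode("utf-8"), default="iso-8859-1"))  # Grün
```

This keeps the default path unchanged for most feeds and confines the edge cases to an explicit list.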

silverbucket commented 10 years ago

commit https://github.com/sockethub/sockethub/commit/3544d34b9721122c00ba3761f232e020fb72b96a