Gonzih closed 5 years ago
I have the same issue with another feed, http://www.hirek.sk/rss/hirek.xml. What do you suggest for getting the source into the proper encoding? Does feedparser-clj have support for that, or should another lib be used for preprocessing?
No, feedparser-clj does not do anything about encoding. I can try to fetch the data on my own, detect the encoding, and then convert it to Unicode. I'm still surprised that there are websites not using Unicode. I will take a look at that once I have some spare time. Thanks for providing another example.
On 01/31/2016 08:57 PM, György Frivolt wrote:
I have the same issue with another feed, http://www.hirek.sk/rss/hirek.xml. What do you suggest for getting the source into the proper encoding? Does feedparser-clj have support for that, or should another lib be used for preprocessing?
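A rough sketch of the "detect the encoding, then convert to Unicode" idea, in plain Java so it can be called from Clojure through interop. The class and method names (`XmlEncodingSniffer`, `declaredCharset`, `decode`) are made up for illustration and are not part of feedparser-clj. It only reads the `encoding` attribute of the XML declaration, assumes an ASCII-compatible encoding (so not UTF-16), ignores any HTTP `Content-Type` header, and falls back to UTF-8 when nothing usable is declared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of feedparser-clj.
final class XmlEncodingSniffer {
    // Matches encoding="..." inside the <?xml ... ?> declaration.
    private static final Pattern ENCODING = Pattern.compile(
        "<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    // Returns the charset named in the XML declaration, or UTF-8 if none is
    // declared or the JVM does not know the declared name.
    static Charset declaredCharset(byte[] raw) {
        // The declaration is ASCII in ASCII-compatible encodings, so reading
        // the first bytes as ISO-8859-1 is enough to find it.
        int n = Math.min(raw.length, 256);
        String head = new String(raw, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the raw feed bytes into a Java (Unicode) String.
    static String decode(byte[] raw) {
        return new String(raw, declaredCharset(raw));
    }

    public static void main(String[] args) {
        // Assumes this JVM supports windows-1251; Charset.forName throws otherwise.
        Charset cp1251 = Charset.forName("windows-1251");
        String xml = "<?xml version=\"1.0\" encoding=\"windows-1251\"?>"
            + "<title>\u041f\u0440\u0438\u0432\u0435\u0442</title>";
        byte[] raw = xml.getBytes(cp1251);
        System.out.println(declaredCharset(raw));
        System.out.println(decode(raw).equals(xml));
    }
}
```

The decoded String could then be handed to the parser through a reader or re-encoded as UTF-8 bytes, whichever the parsing entry point accepts.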
I'm also surprised, but that's reality. I checked the referenced Java libraries, and it seems only a few encodings (ASCII, UTF-8, UTF-16, ...) are supported.
Maybe it's not feedparser-clj's job to do the conversion. Maybe a recommendation on how to pre-process the feeds is sufficient. Unicode is probably enough for most feedparser-clj users.
What would you recommend using to do the conversion?
I would say Java interop can do the trick.
Which Java to interop with? :) Which library? Do you know of a resource or document where the encoding conversion is documented?
http://stackoverflow.com/questions/5729806/encode-string-to-utf-8
On 02/02/2016 04:24 PM, György Frivolt wrote:
Which Java to interop with? :) Which library? Do you know of a resource or document where the encoding conversion is documented?
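For what it's worth, the conversion itself needs no extra library: the JDK's `java.nio.charset.Charset` and `String` are enough, and both are reachable from Clojure via interop. A minimal sketch, assuming the source encoding is already known (for example from the feed's XML declaration); the class name `Transcode` and the method `toUtf8` are hypothetical names used only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Transcode {
    // Decodes bytes using the named source charset, then re-encodes as UTF-8.
    // Charset.forName throws if the JVM does not support the given name.
    static byte[] toUtf8(byte[] raw, String sourceCharset) {
        String text = new String(raw, Charset.forName(sourceCharset));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xE9 is "e with acute accent" in ISO-8859-1.
        byte[] latin1 = {(byte) 0xE9};
        byte[] utf8 = toUtf8(latin1, "ISO-8859-1");
        System.out.println(utf8.length);
    }
}
```

Which charsets are available depends on the JVM; `Charset.isSupported("windows-1251")` can be used to check before converting.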
Hi, I'm using your amazing lib in my feeds2imap.clj project. Recently one person reported an issue with this feed: http://ibash.org.ru/rss.xml. It looks like the feed uses windows-1251 encoding (which is horrible), but still. Is there any way to make the parser respect the encoding specified in the XML and convert everything to Unicode?
Thanks!