rust-syndication / rss

Library for serializing the RSS web content syndication format
https://crates.io/crates/rss
Apache License 2.0

Wrong encoding #87

Closed ayrat555 closed 4 years ago

ayrat555 commented 4 years ago

rss uses the wrong encoding for non-UTF-8 text. For example: https://pikabu.ru/xmlfeeds.php?cmd=popular

andy128k commented 4 years ago

@ayrat555 Can you provide more details? I tried to reproduce it, but it works for me. Please have a look at #90.

ayrat555 commented 4 years ago

@andy128k Thank you for looking into the issue.
I'm not sure, but I think that when you saved the feed to a file, you fixed its encoding. Can you try using the from_url feature?

andy128k commented 4 years ago

@ayrat555 Actually, the file is in cp1251. It is GitHub that converts it. If you open the file and click "Raw", you will see it is still cp1251.

I am going to remove the from_url feature soon (see #88).

ayrat555 commented 4 years ago

OK. I'll look into it again on the weekend. Currently, I use rss for https://github.com/ayrat555/el_monitorro, and it doesn't handle the encoding well. Initially, I used the from_url feature, but I got rid of it and the result is the same. I get symbols like Заначка

The values come directly from rss; I don't do any pre-processing.

andy128k commented 4 years ago

@ayrat555 It looks like double decoding is happening here.

  1. The first decoding is here, and it is based on the response header.
  2. The second one happens in quick-xml (which rss uses).
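
To illustrate the two steps above, here is a minimal sketch using only the standard library. The helper names `cp1251_char` and `decode_cp1251` are hypothetical, and the lookup table is deliberately partial (ASCII, the contiguous Cyrillic block, and the handful of bytes this example needs). It is not how quick-xml or any HTTP client decodes text; it only shows what a second cp1251 pass does to a string that was already decoded once.

```rust
/// Decode one byte as windows-1251 (cp1251).
/// Hypothetical, partial table for illustration only: a real decoder
/// covers all 256 byte values.
fn cp1251_char(b: u8) -> char {
    match b {
        // ASCII is identical in cp1251.
        0x00..=0x7F => b as char,
        // 0xC0..=0xFF map contiguously onto U+0410..=U+044F (Cyrillic А..я).
        0xC0..=0xFF => char::from_u32(0x0410 + u32::from(b - 0xC0)).unwrap(),
        // The few non-letter bytes this example runs into.
        0x87 => '\u{2021}', // double dagger
        0x97 => '\u{2014}', // em dash
        0xB0 => '\u{00B0}', // degree sign
        0xBA => '\u{0454}', // Cyrillic small letter Ukrainian ie
        0xBD => '\u{0405}', // Cyrillic capital letter dze
        // Everything else is outside the scope of this sketch.
        _ => '\u{FFFD}',
    }
}

/// Decode a whole byte slice as cp1251 into a Rust (UTF-8) String.
fn decode_cp1251(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| cp1251_char(b)).collect()
}

/// First pass: what an HTTP client does when it honours the charset
/// in the response header and hands back a String.
fn first_decode(wire: &[u8]) -> String {
    decode_cp1251(wire)
}

/// Second pass: what happens if the UTF-8 bytes of that String are then
/// decoded as cp1251 again because the XML declaration still says so.
fn second_decode(already_decoded: &str) -> String {
    decode_cp1251(already_decoded.as_bytes())
}
```

Feeding the cp1251 bytes `C7 E0 ED E0 F7 EA E0` through `first_decode` gives the readable word "Заначка"; passing that result through `second_decode` gives the garbled string reported earlier in this thread.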

Try changing your read_url function to return Vec<u8> instead of String.
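
A minimal sketch of that suggestion, assuming the response body is available as something that implements `std::io::Read`. The function name `read_body` is hypothetical and stands in for the `read_url` function mentioned above; the actual HTTP client and its API are not shown in this thread, so they are left out.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for `read_url`: return the body as raw bytes
/// and make no attempt to interpret its character encoding.
fn read_body<R: Read>(mut body: R) -> io::Result<Vec<u8>> {
    let mut bytes = Vec::new();
    // Copy the bytes exactly as they arrived on the wire.
    body.read_to_end(&mut bytes)?;
    Ok(bytes)
}
```

The untouched bytes can then be handed to the parser, for example with `rss::Channel::read_from(&bytes[..])`, so that the only decoding is the one done on the XML side. Returning `Vec<u8>` also avoids forcing a cp1251 body through a UTF-8 `String`, which would either fail or require a conversion first.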

ayrat555 commented 4 years ago

@andy128k Thank you, it was exactly that. https://github.com/ayrat555/el_monitorro/commit/87e1f5bbead4627aedf9434a4f74536e2401024b