nkanaev / yarr

yet another rss reader
MIT License

Potential features in a branch #68

Closed fserb closed 3 years ago

fserb commented 3 years ago

Hi, I understand that #57 means the app is feature complete from your perspective.

That said, I added some features in my branch and I thought it would be nice to at least point them here, in case there's something that interests you. I wouldn't mind making a PR for any of them:

If anything makes sense to you here, please let me know.

If there's no interest in any of them, it's all good. :)

Thanks for the great project.

nkanaev commented 3 years ago

Thanks for reaching out. I might consider making certain buttons available across all filters, but haven't decided on that yet.

Is there any reason you're relying solely on the feeds' charset?

fserb commented 3 years ago

Well, I have some feeds in Portuguese that use ISO-8859-1. For example this one. The situation there is:

What was happening: the initial HTTP-level conversion assumed UTF-8 by default, instead of leaving the data unchanged when the Content-Type carried no charset. By the time the data reached the feed parser, it had already been altered, so converting it with the feed's declared charset no longer worked and I'd get gibberish.
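To illustrate the double-conversion problem, here is a minimal Python sketch (not yarr's actual Go code). The sample feed body is made up, and the assumption that the HTTP layer substitutes U+FFFD for invalid bytes is mine; the point is only that a wrong first conversion destroys the bytes a later one would need.

```python
# Hypothetical feed body: ISO-8859-1 bytes, with the encoding declared
# only in the XML prolog (no charset in Content-Type).
raw = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<title>Notícias de São Paulo</title>"
).encode("iso-8859-1")

# Step 1 (HTTP layer): no charset given, so UTF-8 is assumed.
# Bytes that are not valid UTF-8 are replaced with U+FFFD (assumption),
# so the original 0xED and 0xE3 bytes are lost at this point.
after_http = raw.decode("utf-8", errors="replace").encode("utf-8")

# Step 2 (feed layer): honoring the declared encoding now yields gibberish,
# because the input is no longer the original ISO-8859-1 byte stream.
double_converted = after_http.decode("iso-8859-1")

# A single conversion using the declared encoding recovers the text.
single_converted = raw.decode("iso-8859-1")

print(double_converted)
print(single_converted)
```

Once the first step has run, no second conversion can restore the accented characters, which is why only one conversion should ever happen.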

The correct (spec-wise) way to do this would be to pass the charset from Content-Type forward as a default that the feed itself can override, and to always do exactly one conversion. Even if Content-Type specifies a charset, the one declared in the document should take precedence, and we should never do two conversions.
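A rough Python sketch of the precedence described above, for illustration only (yarr itself is written in Go, and this is not its code). The names `decode_feed` and `content_type_charset` are hypothetical, the UTF-8 fallback is an assumption, and the prolog sniffing only handles ASCII-compatible encodings without a byte order mark.

```python
import re
from email.message import Message
from typing import Optional

# Matches the encoding pseudo-attribute in an XML declaration at the very
# start of the body. Assumes an ASCII-compatible encoding and no BOM.
XML_ENCODING = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def content_type_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type header, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_feed(body: bytes, content_type: Optional[str]) -> str:
    """Decode the feed body exactly once.

    Precedence: encoding declared in the document, then the Content-Type
    charset, then UTF-8 as a last resort (assumed default).
    """
    match = XML_ENCODING.match(body)
    declared = match.group(1).decode("ascii") if match else None
    charset = declared or content_type_charset(content_type) or "utf-8"
    try:
        return body.decode(charset)
    except LookupError:
        # Unknown charset label: fall back rather than fail.
        return body.decode("utf-8", errors="replace")


body = '<?xml version="1.0" encoding="ISO-8859-1"?><t>São</t>'.encode("iso-8859-1")
# The document's declaration overrides the header's UTF-8.
print(decode_feed(body, "text/xml; charset=UTF-8"))
```

The header charset is only consulted when the document declares nothing, so a mislabeled Content-Type cannot corrupt a feed that states its own encoding.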

This was a bit too complicated given the current code, and since all of my feeds are of this type (I didn't have a single feed that passed a charset in Content-Type), I just moved the conversion to the feed level.

nkanaev commented 3 years ago

Ah, I recall seeing those for certain feeds. Thanks for the report!

The fix in https://github.com/nkanaev/yarr/commit/982c4ebbbc284e407d7b3e0c67ebe3780db31382 is hacky, but it should resolve this issue.

fserb commented 3 years ago

This helps in my case, but it's still technically incorrect, right?

If the Content-Type specifies charset UTF-8 and the feed declares some other encoding, the UTF-8 should be ignored, not applied first.

But yeah, it does help to have both side by side at least.

nkanaev commented 3 years ago

> This helps in my case, but it's still technically incorrect, right?

Yes, that's true.

I've long given up hope of being technically correct because of the numerous HTTP- and feed-related issues I've encountered. Most of them have already been described in https://inessential.com/2013/03/18/brians_stupid_feed_tricks.