pipes-digital / pipes

Repository for Pipes
https://pipes.digital
GNU Affero General Public License v3.0

Pipes fails to load feeds from forums powered by Simple Machines #78

Closed: anewuser closed this issue 2 years ago

anewuser commented 2 years ago

Pipes downloads the HTML front pages of these forums instead of their feeds:

https://neo-source.com/index.php?type=rss;action=.xml
https://forum.nasaspaceflight.com/index.php?type=rss;action=.xml

These URLs work fine with other RSS services, though.

onli commented 2 years ago

Hey, thanks for the report. I can confirm the bug; maybe the parsing of the HTML fails. This needs a closer look.

onli commented 2 years ago

Okay, time to share a bit of what's going on here:

The addressable/uri gem is taking https://neo-source.com/index.php?type=rss;action=.xml and transforming it to https://neo-source.com/index.php?type=rss%3Baction=.xml (note the ; becoming %3B). https://en.wikipedia.org/wiki/Percent-encoding#Percent_character makes me think that's correct. Nonetheless, the forums you linked want the literal ; and otherwise return the default HTML page.

I'm not sure whether it will cause additional issues, but I will now make sure that ; does not get transformed (or transform it back manually) so that these sites start to work.
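
For readers following along, the "transform it back manually" option could look roughly like the sketch below. This is not the actual Pipes change; `restore_semicolons` is a hypothetical helper name, and it assumes the URL has already been normalized into a string in which ; was encoded as %3B. It uses only plain Ruby string methods.

```ruby
# Hypothetical helper (not the actual Pipes code): turn "%3B" back into ";"
# inside the query part of an already-normalized URL string.
def restore_semicolons(url)
  # Separate the fragment first so a "?" inside it is not mistaken for the query.
  rest, hash, fragment = url.partition("#")
  base, mark, query = rest.partition("?")
  return url if mark.empty? # no query string, nothing to restore

  # Only the query is touched; path and fragment are passed through unchanged.
  "#{base}?#{query.gsub(/%3B/i, ';')}#{hash}#{fragment}"
end

puts restore_semicolons("https://neo-source.com/index.php?type=rss%3Baction=.xml")
```

One trade-off of this approach: a %3B that was meant as literal data inside a parameter value would also be turned back into ;, which is the kind of side effect the comment above is wary of.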

onli commented 2 years ago

Yes, the change was applied without any problems that I could notice. Please do tell me if that was the wrong judgement, but I hope this works :)

anewuser commented 2 years ago

Thank you for looking into it. The feed I was trying to add now works.

A plain ; can (or at least could) also be used as a valid URL parameter delimiter, precisely to avoid the problem of having to encode ampersands (&). The Simple Machines system uses it everywhere in profile links and other forum pages.

From an old guideline:

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters

I was going to say that this is an old-school use, but it turns out to be a surprisingly controversial topic. Some people even say that it is now "illegal", but others argue that ampersands are the actual archaic separators and semicolons make more sense: https://stackoverflow.com/a/40768572 (see the number of comments debating it).
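
To make the delimiter convention described above concrete, here is a minimal sketch of a query-string parser that accepts both & and ; as separators. It is not how Simple Machines or any particular server implements it; `parse_query` is a hypothetical name, and the only library call used is `URI.decode_www_form_component` from Ruby's standard library.

```ruby
require "uri"

# Hypothetical helper: split a query string on "&" or ";" and decode each part.
def parse_query(query)
  query.split(/[&;]/).reject(&:empty?).map do |pair|
    key, value = pair.split("=", 2)
    # A parameter without "=" gets an empty string as its value.
    [URI.decode_www_form_component(key), URI.decode_www_form_component(value.to_s)]
  end
end

p parse_query("type=rss;action=.xml")
```

With a parser like this, `type=rss;action=.xml` yields two parameters, while the encoded form `type=rss%3Baction=.xml` yields a single `type` parameter whose value is `rss;action=.xml`. That would explain why the forums fall back to their HTML front page.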

onli commented 2 years ago

Honestly news to me, but interesting! I would also have seen it as an old-school, nonstandard usage. Thanks for the links.