pipes-digital / pipes

Repository for Pipes
https://pipes.digital
GNU Affero General Public License v3.0

Preserve Youtube video descriptions #64

Closed anewuser closed 4 years ago

anewuser commented 4 years ago

Combine blocks drop video descriptions from Youtube feeds.

Also, custom block titles cannot be edited if they are too long.

Example: https://www.pipes.digital/pipe/0OoZyGNK

onli commented 4 years ago

Thanks for the clear example!

onli commented 4 years ago

We can now parse that description, thanks to the new changes in the feedparser gem. But adding it to the output feed is not easy as long as https://github.com/ruby/rss/issues/18 is open, that is, as long as ruby/rss does not support Media RSS. But maybe that will happen?

If not, we'd have to change how we approach the feed generation, probably by doing it manually in erb files.

Workaround for youtube: If a feed item has no description so far take the media description for that. Does that work for you?
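
For readers following along, the fallback described above can be sketched roughly as follows. This is only an illustration in Python, not the code Pipes actually runs: the `description_with_fallback` helper and the sample item are hypothetical, and the only outside fact assumed is the Media RSS namespace URI.

```python
import xml.etree.ElementTree as ET

# Media RSS namespace (the prefix "media" is what YouTube feeds use).
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}


def description_with_fallback(item):
    """Return the item's own description, or media:description if it has none."""
    own = (item.findtext("description") or "").strip()
    if own:
        return own
    # ".//" also finds a media:description nested inside media:group.
    media = item.findtext(".//media:description", default="", namespaces=MEDIA_NS)
    return media.strip()


# Hypothetical item, shaped loosely like a YouTube entry.
SAMPLE = """<item xmlns:media="http://search.yahoo.com/mrss/">
  <title>Some video</title>
  <media:group>
    <media:description>Line one
Line two</media:description>
  </media:group>
</item>"""

item = ET.fromstring(SAMPLE)
print(description_with_fallback(item))
```

An item that already carries a non-empty `description` is returned unchanged, so regular feeds would not get duplicated text from this step.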

anewuser commented 4 years ago

This update seems to have caused some problems:

If a feed item has no description so far take the media description for that. Does that work for you?

That's good for me, but the generated posts need line breaks.

onli commented 4 years ago

I think I found the cause of the 500 errors. Thank you. The pipe seems to load reliably now as well, right?

I don't think that how ghacks is treated changed. The block preview being incomplete is an issue with the cdata used in that feed, we need a better javascript library there. I notice that the description is at the top of the content, that hadn't been like that before?

That's good for me, but the generated posts need line breaks.

That's good. I added a nl2br to them.
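
As a rough picture of what an nl2br step does, here is a minimal Python sketch. It is an assumption about the general technique, not the Pipes implementation; in particular, escaping the text before inserting the tags is a choice made for this example.

```python
import html


def nl2br(text):
    """Escape text for HTML, then mark every line break with a <br> tag."""
    escaped = html.escape(text)
    # Normalise Windows (\r\n) and old Mac (\r) line endings first.
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "<br>\n")


print(nl2br("First line\nSecond line"))
```

Keeping the original newline after each `<br>` leaves the markup readable when the feed source is inspected by hand.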

anewuser commented 4 years ago

Yes, I'm not getting errors anymore.

I don't think that how ghacks is treated changed.

I see. I had been using the previews for the filtering blocks then.

I notice that the description is at the top of the content, that hadn't been like that before?

~Now I'm not sure about this... It turns out that my feed reader doesn't display either the superfluous Ghacks descriptions or the wanted Youtube descriptions. Maybe the line-break tags in the Youtube feeds need to be escaped?~

onli commented 4 years ago

Sorry for the delay.

The first feed looks good to me now. Maybe that was a hiccup, maybe a specific article configuration not visible right now.

The second should now be the "fault" of the feed reader. Support for the Media RSS standard has always been spotty. But with the changes, Pipes gives you the tools to take the media description and insert it as the regular summary/content, which the feed reader would then pick up. That's the solution I suggest.

anewuser commented 4 years ago

The problem is the markup. Inoreader works with standard Youtube feeds, but the blocks are changing <media:description> into just <description>: https://www.pipes.digital/pipe/xNBLmrqX .

If they started showing the <description> tag as part of item contents, people would see a lot of duplicated text in regular feeds.

If you want to keep it like that, is there an easy way to add media: back to those tags with a regex?

onli commented 4 years ago

But description is the main content tag of a feed item. It's completely wrong to ignore it. Especially when a feed does not even have a content:encoded field in the item, as here. So that's really strange (and the feed does work in feedly).

It's sadly right now not possible to use the insert block to insert the description block as media:description, the : causes an error. I will have a look whether that's fixable.
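
For reference, the kind of regex asked about above could look like the following Python sketch, applied to the raw feed text outside of Pipes. The `to_media_description` helper is hypothetical, and whether a Pipes block accepts an equivalent pattern is not confirmed in this thread.

```python
import re

# Matches the start of <description ...>, </description> and <description/>.
DESCRIPTION_TAG = re.compile(r"<(/?)description(?=[\s/>])")


def to_media_description(xml_text):
    """Rename every description tag to media:description."""
    return DESCRIPTION_TAG.sub(r"<\1media:description", xml_text)


print(to_media_description("<description>Video text</description>"))
```

Two caveats: the result is only well-formed XML if the feed's root element declares the `media` prefix (`xmlns:media="http://search.yahoo.com/mrss/"`), and this pattern also renames the channel-level description, not just the one inside each item.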

anewuser commented 4 years ago

I just wanted a filter in another pipe to include video descriptions, and the way everything is now works for me. I was thinking more of other users who might expect to see the descriptions in their readers too.

I've contacted Inoreader about this case.

anewuser commented 4 years ago

They've sent me this reply:

It looks like the feed has too many structural issues, which are the most probable reason for that. We'll check what could be done about that and will let you know when we have a solution. https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.pipes.digital%2Ffeed%2FxNBLmrqX

onli commented 4 years ago

Thanks for that! :)

Just in case they ask for more information, or in case you want to point them here: I don't think that validation makes sense, and I don't see any structural issues.

The only serious error would be Invalid HTML: Named entity expected. Got none., which incidentally points to description, but there is no reason why description would cause that error. description is not none, and the html inside it is escaped. https://www.feedvalidator.org/check.cgi?url=https%3A%2F%2Fwww.pipes.digital%2Ffeed%2FxNBLmrqX also does not show such an error (and it is the better validator in my experience).

If their internal debugger can give more information that would be nice, but given that other readers can display that feed just fine I think that reply is just wrong.

But of course if there is something I can fix in Pipes to help with compatibility I'd make that change.

anewuser commented 4 years ago

Their reply:

The problem is the guids in the feed's items are the same as youtube's. Those guids are handled specially in Inoreader, because our parser wasn't expecting them to be found in other feeds. The special handling is now reserved only for feeds from youtube.com domain, so new articles will not have this special handling and should display the content 1:1.

Now Inoreader shows the descriptions, but won't parse the items automatically anymore to embed the videos...

This is the solution I've come up with to make Pipes insert video iframes to each item: https://www.pipes.digital/editor/m9e5YP9o . Please take a look at it and tell me if there's any less resource-intensive way to achieve this. I tried using an Insert block first, but it only added the same video to all items: https://www.pipes.digital/editor/DNJG4JOa

I've also moved most of my Youtube subscriptions to a third-party parser to stop abusing your server. https://www.pipes.digital/pipe/LOMGy59r kept giving timeouts with 25 Youtube feeds, and I always had to refresh it twice on Inoreader to force it to update.

onli commented 4 years ago

Clever! The replace approach should work. It works for me in https://www.pipes.digital/editor/l9ve4lq1 and is not all that heavy. I doubt that there is a better way to do this right now.
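
The linked pipes cannot be inspected from this page, so the following is only a guess at the general idea behind the replace approach, written as a Python sketch: build each item's iframe from that item's own link, which is what avoids the "same video on all items" result. The `embed_iframe` helper and the placeholder video id are hypothetical; the `https://www.youtube.com/embed/<id>` URL form is YouTube's standard embed address.

```python
import re

# A watch URL followed by an 11-character video id.
WATCH_URL = re.compile(r"https?://www\.youtube\.com/watch\?v=([A-Za-z0-9_-]{11})")


def embed_iframe(link):
    """Return an iframe for the video behind link, or "" if it is not a watch URL."""
    match = WATCH_URL.search(link)
    if match is None:
        return ""
    video_id = match.group(1)
    return (
        f'<iframe src="https://www.youtube.com/embed/{video_id}" '
        "allowfullscreen></iframe>"
    )


# "abcdefghijk" is a placeholder id, not a real video.
print(embed_iframe("https://www.youtube.com/watch?v=abcdefghijk"))
```

Because the id is captured per link, running this over every item yields a different iframe for each video.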

I've also moved most of my Youtube subscriptions to a third-party parser to stop abusing your server.

Thank you! It should not be necessary, as Pipes tries to avoid spamming the target site, but if their request limit is really low this can happen. Maybe Pipes has to be even more defensive when accessing YouTube URLs.

anewuser commented 4 years ago

@onli Thank you for the new Reddit option.

Last month I went down the rabbit hole with these Youtube feeds and tried many different things, but now parsed Youtube pipes are working on Inoreader again as they did before July. I don't get the video descriptions, but that's better than having to insert iframes.

One nice outcome of all this is that now I'm using http://www.rssmix.com/ to merge my longer lists of feeds before filtering them. It's been working perfectly with Pipes. Maybe you'll find all these filters and manipulations I've created over time interesting:

On an unrelated note, take a look at this too. It has a nice XPath element picker: https://morss.it/ / https://github.com/pictuga/morss

onli commented 4 years ago

Hi @anewuser, great that it works now :) And thanks for the link to rssmix. Combining services like that is exactly what Pipes is supposed to do well!

I could not find the xpath element picker on the morss page. Could you point me to it? For me it seems to work completely automatically, which is pretty cool.

anewuser commented 4 years ago

I could not find the xpath element picker on the morss page.

You just have to enter the URL of a site that has no feeds:

example