pipes-digital / pipes

Repository for Pipes
https://pipes.digital
GNU Affero General Public License v3.0

Get post content from two different HTML elements #35

Closed anewuser closed 6 years ago

anewuser commented 6 years ago

I'm trying to recreate this feed with Pipes, but the contents for each post come from two separate divs, and apparently you can't do that with Pipes, right?

https://feed43.com/learnersdictionary.xml

https://www.pipes.digital/feed/YPOdgaND

The source URL is: http://www.learnersdictionary.com/qa/post/latest (this always redirects to their most recent article).

Each post body should be generated from .question and .answer .

I've tried extracting .question, .answer with a single block, and extracting them separately and then using a combine block, but both resulted in two separate posts.

onli commented 6 years ago

Right, that was not possible so far. I have now modified the extract block to support concat(), an XPath function meant to do exactly that. Please see https://www.pipes.digital/pipe/boNyzPOP for an example of how it can be used to combine .question and .answer. Hope this helps :)

anewuser commented 6 years ago

It works, but there's something else I didn't mention. The HTML tags (<p>,<em>,<ul>) are stripped from the answer code, so paragraphs and lists are lost and everything is put together in a single block. Compare the output:

https://feed43.com/learnersdictionary.xml

https://www.pipes.digital/feed/YPOdgaND

onli commented 6 years ago

You're right. And I'm not able to modify that xpath expression to concatenate the html of those elements.

I did enable raw (inner_)html output for regular xpath expressions. So it was definitely useful to look into this. But the concat just does not work like this, at least so far.
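For readers following along: XPath string functions such as concat() work on the string value of an element, which is the text of its descendant text nodes, so the markup is gone before the strings are joined. The sketch below only illustrates that difference with Python's standard library. The markup is a made-up stand-in for the page, and `string_value` and `inner_html` are hypothetical helper names, not anything from Pipes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page markup (an assumption).
doc = ET.fromstring(
    '<div>'
    '<div class="question"><p>What does <em>quiet</em> mean?</p></div>'
    '<div class="answer"><p>It means silent.</p><ul><li>a quiet room</li></ul></div>'
    '</div>'
)

def string_value(el):
    """Text of all descendant text nodes, which is what XPath string functions see."""
    return "".join(el.itertext())

def inner_html(el):
    """Leading text plus the serialized child elements, so the markup survives."""
    return (el.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in el
    )

question = doc.find(".//div[@class='question']")
answer = doc.find(".//div[@class='answer']")

flat = string_value(question) + string_value(answer)  # tags are lost
raw = inner_html(question) + inner_html(answer)       # tags are kept

print(flat)
print(raw)
```

The first print shows one run of text with no paragraph or list structure, matching the symptom reported above. The second keeps the `<p>`, `<em>` and `<ul>` tags.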

I will have to think about the best solution here. Offer a custom concat for the extract block that preserves the raw content of an element? Have a separate block that can concatenate strings? Is there an alternative to xpath that could be offered as its own block?

anewuser commented 6 years ago

Wouldn't it be possible to use a simple multiple-element CSS selector like .question, .answer in an extract block for this and put them together with their raw code?

Right now this creates two separate feed posts, but is there any case where that's the intended behavior?

onli commented 6 years ago

I think that can quite often be the intended behaviour. Just imagine that there are multiple types of news on the site, .sport_news and .top_news, and you want to collect all of them. Then you might write such a combined selector and send them to the feed builder.

And I think I could not implement this. If .question, .answer is finding 10 questions and 5 answers, it would not be helpful to have all of them in one single item. I think it is more http://www.learnersdictionary.com/qa/post/latest that is the edge case, because here the concatenation is simple and useful :)
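A small sketch of why a selector list behaves this way: it returns every matching element as its own result, in document order, with nothing tying a given question to a given answer. The page below is invented, and `select_any` is a hypothetical stand-in that emulates `.question, .answer` using only Python's standard library. It is not how Pipes evaluates selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two questions but only one answer (an assumption).
page = ET.fromstring(
    '<body>'
    '<div class="question">Q1</div>'
    '<div class="answer">A1</div>'
    '<div class="question">Q2</div>'
    '</body>'
)

def select_any(root, class_names):
    """Emulate a CSS selector list such as '.question, .answer':
    return every element carrying one of the classes, in document order."""
    wanted = set(class_names)
    return [
        el for el in root.iter()
        if wanted & set(el.get("class", "").split())
    ]

matches = select_any(page, ["question", "answer"])

# One feed item per match: the matches form a flat list, not pairs.
items = [el.text for el in matches]
print(items)
```

With uneven counts like these there is no obvious rule for folding the matches into a single item, which is the ambiguity described above.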

I'm leaning towards "implement a concat_raw in the extract block that takes the given selectors, gets their inner html and finally merges them together". Doesn't it sound like the right solution? But I am also still wondering whether a generic concatenation block could be useful, and what it would look like.

anewuser commented 6 years ago

I see, so that's by design. I've been using Feed43 for a long time, but I think that all feeds I've ever created followed a strict pattern.

I did imagine cases like your example with Pipes users, but thought that people would just extract the different sections into different feed blocks and then recombine them.

As for the new raw blocks, test them with unwanted tags like script to make sure there's no security problem. Maybe people will want to keep iframes and media tags too, but I think scripts should always be removed.
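To make the script concern concrete, here is a minimal sketch of dropping banned elements from a parsed fragment while keeping everything else, iframes included. It assumes well-formed markup and uses only Python's standard library. `strip_elements` is a hypothetical helper, and this is tag removal only, not a complete sanitizer (for example, it does not touch attributes such as `onclick`).

```python
import xml.etree.ElementTree as ET

def strip_elements(root, banned=("script",)):
    """Remove every descendant whose tag is in `banned`, keeping the text
    that followed it. Modifies the tree in place and returns it."""
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if isinstance(child.tag, str) and child.tag.lower() in banned:
                # Text after the removed element lives in its tail; move it
                # to the previous sibling or to the parent so it is not lost.
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return root

# Hypothetical answer markup (an assumption, not taken from the real page).
answer = ET.fromstring(
    '<div class="answer"><p>Safe text</p>'
    '<script>alert(1)</script> kept tail'
    '<iframe src="https://example.com/video">fallback</iframe></div>'
)

cleaned = ET.tostring(strip_elements(answer), encoding="unicode")
print(cleaned)
```

Real pages are rarely well-formed XML, so an actual implementation would more likely rely on a lenient HTML parser or a dedicated sanitizing library.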

Here's what that feed looks like in the Feed43 editor, in case you're curious:

screenshot

onli commented 6 years ago

Thanks for that image, it led me in an interesting direction. I added a merge items block that should solve this use case, see https://www.pipes.digital/pipe/YPOdK0ND
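The thread does not describe how the merge items block pairs its inputs, so the following is only one plausible reading, sketched in Python: take two lists of items and join them position by position into single items. The `merge_items` function, the dictionary keys and the sample data are all assumptions made for illustration.

```python
def merge_items(first, second, separator=""):
    """Pair items by position and join their content into one item each.

    Assumption: items are dicts with 'title' and 'content' keys. The title
    of the first input is kept. Unpaired trailing items are dropped, because
    zip() stops at the shorter input.
    """
    merged = []
    for a, b in zip(first, second):
        merged.append({
            "title": a["title"],
            "content": a["content"] + separator + b["content"],
        })
    return merged

# Hypothetical outputs of two extract blocks, one per selector.
questions = [{"title": "Latest question", "content": "<p>Question?</p>"}]
answers = [{"title": "Latest question", "content": "<p>Answer.</p>"}]

merged = merge_items(questions, answers)
print(merged[0]["content"])
```

For a page with exactly one question and one answer, as in this issue, pairing by position is unambiguous. For pages with uneven counts, a real block would need an explicit rule for the leftovers.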

anewuser commented 6 years ago

Thanks for looking into it. The design of the page you linked to is broken on my 1366x768 screen, though. I can't see the last block or move it. I've tried it on Firefox and Opera:

screenshot

onli commented 6 years ago

If you use a browser other than Firefox, you can click on the blue background and drag it around (there is a bug in FF that prevents this from working). But that block is just a feed builder block; merge items connects to its content input.

anewuser commented 6 years ago

I know the point was to show me the new feature, sorry. I just wanted to report the resolution bug before I had some time to test it.

You've used xpath in your example, but actually simple .class selectors work too.

Here's the final feed: https://www.pipes.digital/pipe/YPOdgaND

Thanks a lot. 😄