Can pipes take inputs? And am I doing `extract` correctly?

jean commented 7 years ago

While working on filtering feeds I noticed a few things:

A pipe can't take input?

I can use a pipe as an input for another pipe, but a pipe can't take input itself. I.e. I'd like to create:

->[filter]->[duplicate]->[extract,extract,extract]->[build]->[out] and name it "filter merges", and then create:
- [feed1]->[filter merges]->[out],
- [feed2]->[filter merges]->[out],
- [feed3]->[filter merges]->[out],

instead of

[feed1]->[filter]->[duplicate]->[extract,extract,extract]->[build]->[out],
[feed2]->[filter]->[duplicate]->[extract,extract,extract]->[build]->[out],
[feed3]->[filter]->[duplicate]->[extract,extract,extract]->[build]->[out].
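
To make the request concrete, here is a minimal sketch of a pipe with an open input, written as plain function composition in Python. It says nothing about how the Pipes backend is built. The names `chain`, `filter_block`, and `build_block`, and the sample feed items, are all hypothetical.

```python
from functools import reduce


def chain(*blocks):
    """Join blocks left to right into one pipe whose input is left open."""
    def pipe(items):
        return reduce(lambda acc, block: block(acc), blocks, items)
    return pipe


def filter_block(items):
    # Hypothetical filter: keep only merge commits.
    return [item for item in items if item["title"].startswith("Merge pull request")]


def build_block(items):
    # Stand-in for duplicate -> extract x3 -> build: the title is reused as content.
    return [
        {"title": item["title"], "link": item["link"], "content": item["title"]}
        for item in items
    ]


# The reusable "filter merges" pipe: defined once, with no feed attached.
filter_merges = chain(filter_block, build_block)

# Hypothetical sample data standing in for feed1..feed3.
feeds = {
    "feed1": [
        {"title": "Merge pull request #713 from REPO/BRANCHNAME",
         "link": "https://github.com/ORG/REPO/commit/ASDFSADFSADF"},
        {"title": "Fix typo", "link": "https://github.com/ORG/REPO/commit/OTHER"},
    ],
    "feed2": [],
    "feed3": [{"title": "Fix typo", "link": "https://github.com/ORG/REPO/commit/OTHER"}],
}

# Each feed is connected to the same pipe instead of repeating the block chain.
outputs = {name: filter_merges(items) for name, items in feeds.items()}
```

The block chain is defined once and applied to each feed, which is the difference between the first and the second list above.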

How do I extract links?

Extracting //item/title from the filter output works fine, but extracting //item/link doesn't extract any links.

The output of the [filter] looks like:

<item>
  <title>Merge pull request #713 from REPO/BRANCHNAME</title>
  <link>https://github.com/ORG/REPO/commit/ASDFSADFSADF</link>
  [...]

but the output of the [extract //item/link] block looks like:

<item>
  <title>Extracted Content</title>
  <pubDate>Wed, 05 Jul 2017 11:01:49 +0000</pubDate>
  <guid>GUID</guid>
  <content:encoded />
  <dc:date>2017-07-05T11:01:49.488103+00:00</dc:date>
</item>

The output of the [extract //item/title] block is fine:

<item>
  <title>Extracted Content</title>
  <pubDate>Wed, 05 Jul 2017 11:05:54 +0000</pubDate>
  <guid>GUID</guid>
  <content:encoded>Merge pull request #713 from REPO/BRANCHNAME</content:encoded>
  <dc:date>2017-07-05T11:05:54.088334+00:00</dc:date>
</item>

Finally, I had to duplicate the [extract //item/title] block, because the [build feed] block requires content, so I had one [extract //item/title] going to title, and another going to content. That seemed a bit unwieldy. I have [extract //item/link] going to link, but the output feed has only empty <link /> elements.

onli commented 7 years ago

Hi, I'll mark this as an enhancement after commenting, because of the pipe input question. Yes, so far pipes can't take inputs. I can see how that would be useful in the example you show, but the concept of a pipe in the backend currently has no inputs. An input connector in the UI could also be confusing for the majority of use cases, where a pipe won't have inputs from other pipes. That's why I'm not sure it is reasonable to add it, but I'll try it out.

As for the second issue, for now I just want to confirm it. You are using extract correctly; the link should be set as content. This looks like a bug in the backend, the XML library, or the generated RSS. I'll debug it.

jean commented 7 years ago

> having in the UI an input connector could be confusing for the majority of use cases where a pipe won't have inputs from other pipes

Yes, it would need to be handled in the UI. A pipe that requires an input could show up greyed-out on mypipes to show that it isn't active, and the pipe box on that page could have the input bump that input-requiring blocks have. Conceivably it could even be active: if there is a pipe that takes input, other pipes could show an output bump and allow dragging to connect to the pipe that takes input. Which would then make it active (not greyed-out).

This ends up turning the mypipes view into a higher-level editor view! Not sure if that's workable. Just an idea.

onli commented 7 years ago

For the link problem: for that XPath search, we use nokogiri in HTML mode to parse the feed. The problem is that in HTML a link element must be empty, so it cannot be written as <link>...</link>; the HTML parser treats it as empty and the URL does not end up inside it. Moving to XML mode fixes this for valid RSS feeds, but will break the extract module for regular HTML pages, which could come from the download block. I'll revisit this after #8 is decided.
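
The effect can be reproduced outside of Pipes. The sketch below is not the nokogiri code path: it uses Python's standard library as a stand-in, with a toy collector (`HtmlTextCollector`, a hypothetical name) that applies HTML's void-element rule by hand, next to a plain XML parse of the same item.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ITEM = (
    "<item>"
    "<title>Merge pull request #713 from REPO/BRANCHNAME</title>"
    "<link>https://github.com/ORG/REPO/commit/ASDFSADFSADF</link>"
    "</item>"
)

# A subset of HTML's void elements, which can never have content.
VOID_ELEMENTS = {"link", "meta", "br", "img", "hr", "input"}


class HtmlTextCollector(HTMLParser):
    """Collects the text found directly inside each element, HTML style."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of the currently open elements
        self.text = {}   # element name -> text collected inside it

    def handle_starttag(self, tag, attrs):
        self.text.setdefault(tag, "")
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)  # a void element is closed as soon as it opens

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # a stray </link> has nothing to close
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            self.text[self.stack[-1]] += data


html_mode = HtmlTextCollector()
html_mode.feed(ITEM)
html_mode.close()

html_link = html_mode.text["link"]              # empty: the URL is not inside <link>
xml_link = ET.fromstring(ITEM).findtext("link")  # XML mode keeps the URL in <link>
```

Under the HTML rule the URL is attached to the surrounding item instead of the link, which matches the empty `<content:encoded />` in the extract output above.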

onli commented 7 years ago

I made a change to address the link extraction problem. When given an XPath expression, the input is now interpreted as XML. When given a CSS path, it uses HTML mode. As a result, //item/link should now just work. I'd be happy if you could confirm that :)
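
A rough sketch of that dispatch, again in standard-library Python and not the actual Pipes code. How Pipes tells an XPath expression from a CSS path is not stated here, so the prefix check in `parse_mode` is an assumption, and `extract_xml` and `FEED` are hypothetical names.

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0"><channel>
  <item>
    <title>Merge pull request #713 from REPO/BRANCHNAME</title>
    <link>https://github.com/ORG/REPO/commit/ASDFSADFSADF</link>
  </item>
</channel></rss>"""


def parse_mode(selector):
    """Guess the parser mode from the selector syntax (assumed heuristic)."""
    looks_like_xpath = selector.lstrip().startswith(("/", "./", "("))
    return "xml" if looks_like_xpath else "html"


def extract_xml(document, xpath):
    """Run a simple XPath in XML mode and return the text of each match."""
    root = ET.fromstring(document)
    # ElementTree rejects absolute paths on an element, so "//..." is
    # rewritten to ".//..." to search below the root. Other absolute
    # paths are not handled by this sketch.
    relative = "." + xpath if xpath.startswith("//") else xpath
    return [(node.text or "").strip() for node in root.findall(relative)]


selector = "//item/link"
links = extract_xml(FEED, selector) if parse_mode(selector) == "xml" else []
```

A CSS path such as `item > link` would fall through to HTML mode, which this sketch leaves out because the Python standard library has no CSS selector engine.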

jean commented 7 years ago

Confirmed, sorry for the long delay.

pipes-digital / pipes

Can pipes take inputs? And am I doing `extract` correctly? #7
