Closed: jean closed this 7 years ago
Hi, I'll mark this as an enhancement after commenting, for the pipe input question. Yes, so far pipes can't take inputs. I can see how it would be useful in the example you show, but so far, the concept of a pipe in the backend does not have inputs. And having in the UI an input connector could be confusing for the majority of use cases where a pipe won't have inputs from other pipes. That's why I'm not sure it is reasonable to add it, but I'll try it out.
For the second issue, for now I just want to confirm it. You are using extract correctly; the link should be set as content. This seems like a bug in the backend, the XML library, or the generated RSS. I'll debug it.
> having in the UI an input connector could be confusing for the majority of use cases where a pipe won't have inputs from other pipes
Yes, it would need to be handled in the UI. A pipe that requires an input could show up greyed-out on mypipes to show that it isn't active, and the pipe box on that page could have the input bump that input-requiring blocks have. Conceivably it could even be active: if there is a pipe that takes input, other pipes could show an output bump and allow dragging to connect to the pipe that takes input. Which would then make it active (not greyed-out).
This ends up turning the mypipes view into a higher-level editor view! Not sure if that's workable. Just an idea.
For the link problem: for that xpath search, we use nokogiri in HTML mode to parse the feed. The problem is that in HTML, the link element should not be <link>...</link>, but be empty. Moving to XML mode fixes this for valid RSS feeds, but will break the extract module for regular HTML pages, which could come from the download block. I'll revisit this after #8 is decided.
I made a change to address the link extract problem. When given an xpath expression, the input is now interpreted as XML. When given a css path, it uses HTML mode. As a result, //item/link should now just work. I'd be happy if you could confirm that :)
Confirmed, sorry for the long delay.
While working on filtering feeds I noticed a few things:
A pipe can't take input?
I can use a pipe as an input for another pipe, but a pipe can't take input itself. I.e. I'd like to create:
->[filter]->[duplicate]->[extract,extract,extract]->[build]->[out]
and name it "filter merges", and then create:
[feed1]->[filter merges]->[out]
[feed2]->[filter merges]->[out]
[feed3]->[filter merges]->[out]
instead of:
[feed1]->[filter]->[duplicate]->[extract,extract,extract]->[build]->[out]
[feed2]->[filter]->[duplicate]->[extract,extract,extract]->[build]->[out]
[feed3]->[filter]->[duplicate]->[extract,extract,extract]->[build]->[out]
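As a rough analogy for the reuse being asked for, a pipe that takes input behaves like a function composed from smaller functions and then applied to each feed. Everything in this sketch is hypothetical: the lambdas are stand-ins for blocks, the rule that "filter merges" drops items whose title starts with "Merge" is an assumption, and none of it reflects how the backend models pipes.

```ruby
# Hypothetical stand-ins for pipe blocks: each takes an array of items
# (hashes) and returns a new array. None of these names come from the backend.
filter = ->(items) { items.reject { |item| item[:title].start_with?('Merge') } }
build  = ->(items) { items.map { |item| item.merge(content: item[:title]) } }

# The reusable sub-pipe: items go in, filtered and rebuilt items come out.
filter_merges = filter >> build # Proc#>> needs Ruby 2.6 or newer

# Made-up feeds for illustration only.
feeds = {
  'feed1' => [{ title: 'Merge branch x', link: 'http://example.com/1' },
              { title: 'Fix typo',       link: 'http://example.com/2' }],
  'feed2' => [{ title: 'Add tests',      link: 'http://example.com/3' }]
}

# The same sub-pipe is applied to every feed instead of being rebuilt per feed.
outputs = feeds.transform_values { |items| filter_merges.call(items) }
outputs.each { |name, items| puts "#{name}: #{items.map { |i| i[:title] }.join(', ')}" }
```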
How do I extract links?
Extracting //item/title from the filter output works fine, but extracting //item/link doesn't extract any links.
The output of the [filter] looks like:
but the output of the [extract //item/link] block looks like:
The output of the [extract //item/title] block is fine:
Finally, I had to duplicate the [extract //item/title] block, because the [build feed] block requires content, so I had one [extract //item/title] going to title, and another going to content. That seemed a bit unwieldy. I have [extract //item/link] going to link, but the output feed has only empty <link /> elements.