huginn / huginn


Compile RSS XML then Full-text RSS Out not working #932

Closed rubella closed 9 years ago

rubella commented 9 years ago

I'm a bit new to Huginn, so please excuse me if this is a 'doh' issue, but I've tried everything I can think of and can't get it to work.

Here's the issue. I'm trying to produce full-text RSS feeds from a couple of sites. I followed the wiki article "Generating a filtered full-text RSS feed from an existing RSS feed", minus the bit on creating a filter. First I created a source agent, which works well.

{
  "expected_update_period_in_days": "2",
  "url": "http://www.thenation.com/",
  "type": "html",
  "mode": "all",
  "extract": {
    "title": {
      "css": ".story h3 a",
      "value": ".//text()"
    },
    "url": {
      "css": ".story h3 a",
      "value": "@href"
    }
  }
}

Then I compiled the source into an RSS feed, which also works well.

{
  "secrets": [
    "nationrss1"
  ],
  "expected_receive_period_in_days": 2,
  "template": {
    "title": "The Nation RSS Feed",
    "description": "This is a feed of recent Nation articles, generated by Huginn",
    "item": {
      "title": "{{title}}",
      "link": "{{url}}"
    }
  }
}

The problem comes when I try to have a WebsiteAgent fetch the full text for each item in the previously compiled RSS feed.

{
  "expected_update_period_in_days": "2",
  "url": "{{url}}",
  "type": "html",
  "mode": "merge",
  "extract": {
    "fullurl": {
      "css": "div",
      "value": "@articlelistlinks"
    },
    "fulltitle": {
      "css": ".article-header-content .title",
      "value": ".//text()"
    },
    "fullbody_text": {
      "css": ".article-body p",
      "value": ".//text()"
    }
  }
}

At this point nothing happens. When I try a dry run, I get the following screen, which doesn't really help me.

[Screenshot: screen shot 2015-07-22 at 13 57 34]

Does anyone have an idea of what I'm doing wrong or how I can fix this? I should also note that I've set up the example from the wiki and hit the same problem at the same place, that is, when the agent is supposed to fetch the full text.

irfancharania commented 9 years ago

The reason you're not seeing anything in the dry run is the workflow. When you set the url to {{url}}, the agent waits for an incoming event to supply the url. A dry run has no incoming event, so there is nothing to fetch and it doesn't help here.

What you can do, temporarily, is replace {{url}} in the WebsiteAgent with the url of an actual article. Once you've played around in the dry run and extracted what you need, swap the article url back to {{url}}, and everything should work as expected.
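As a rough sketch of that temporary setup, the full-text agent's options might look like this while debugging. The article url is a placeholder and not from the original post, so substitute a real article address; the extract section is trimmed to the two text fields for brevity.

{
  "expected_update_period_in_days": "2",
  "url": "http://www.thenation.com/article/REPLACE-WITH-A-REAL-ARTICLE-SLUG/",
  "type": "html",
  "mode": "merge",
  "extract": {
    "fulltitle": {
      "css": ".article-header-content .title",
      "value": ".//text()"
    },
    "fullbody_text": {
      "css": ".article-body p",
      "value": ".//text()"
    }
  }
}

If the dry run comes back empty with a real url in place, the selectors are the likely culprit rather than the workflow.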

You can also test it by sending an event by hand from a manual agent, or by re-emitting an event from the RssAgent.
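For the manual route, the event payload presumably only needs the keys the downstream agent interpolates, which here is url. A minimal sketch, assuming a manual event agent is set as a source of the full-text agent; both values are placeholders, not real data:

{
  "title": "Example article title",
  "url": "http://www.thenation.com/article/REPLACE-WITH-A-REAL-ARTICLE-SLUG/"
}

With "mode": "merge" on the receiving agent, the title and url from this payload should be carried through into the emitted event alongside the extracted fields.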

It confused me when I first started using Huginn too... Where do you think we could add this nugget in the wiki?

Also, in your case, you can skip the step with the RssAgent, and it will still work.

rubella commented 9 years ago

Thanks for the info and advice about inserting a url directly instead of trying to debug with {{url}}. Using that method I was finally able to see that my real problem was that I needed more specific CSS selectors. It seems to be working now.