huginn / huginn

Create agents that monitor and act on your behalf. Your agents are standing by!
MIT License
43.13k stars 3.75k forks source link

Including the url in the event sent by WebsiteAgent #1710

Closed mskog closed 7 years ago

mskog commented 8 years ago

Scenario I want to watch for changes for a bunch of sites and then when they change I want to post the url to a Slack channel.

Everything works just fine except that I can't seem to figure out a way to have the event from a WebsiteAgent contain the url it is using. Can this be done?

cantino commented 8 years ago

You could set mode to merge and set url_from_event to {{ url }} in the WebsiteAgent, then make an EventFormattingAgent pass url into the WebsiteAgent. It'll come out the other side because of the merge.

a10kiloham commented 8 years ago

Alternatively, if you can find the URL on the page itself somewhere, you can just extract that as part of the process. For example:

{ "expected_update_period_in_days": "90", "url": "https://www.amazon.co.uk/HP-Original-C9732A-Laserjet-Cartridge/dp/B000077CF5/", "type": "html", "mode": "on_change", "extract": { "body_text": { "xpath": "//*[@id=\"priceblock_ourprice\"]", "value": "substring-after(.,\"£\")" }, "url": { "xpath": "//link[@rel='canonical']/@href", "value": "." } } }

But it would be nice to just pass the URL through if the agent could do that automatically!

a10kiloham commented 8 years ago

@cantino do you think it'd make sense to output the input url (checking that the url is not set otherwise in the extraction)? Seems like it would be a useful feature w/o breaking backward compatibility. It's v useful for my various checks as some pages don't have the URL anywhere in the Xpath and otherwise an EventFormattingAgent seems needlessly complex (I'd have to create one for each WebsiteAgent which would be a pain) Thanks

cantino commented 8 years ago

@bobbysteel, that seems reasonable, but what if a user has an extraction key called url already?

a10kiloham commented 8 years ago

Logically I guess just only fill if there's no existing key by that name? Sort of like a default value? On Fri, Oct 7, 2016 at 2:23 PM Andrew Cantino notifications@github.com wrote:

@bobbysteel https://github.com/bobbysteel, that seems reasonable, but what if a user has an extraction key called url already?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cantino/huginn/issues/1710#issuecomment-252238775, or mute the thread https://github.com/notifications/unsubscribe-auth/AASBwSIm5w9VofoOVMJxT6L2RjsaPNxcks5qxjnLgaJpZM4KGMKv .

cantino commented 8 years ago

You could try adding that if you like and send a PR. It seems reasonable, although maybe it should be an option like include_url or something?

a10kiloham commented 8 years ago

Happy to try although I'm warning you you're gonna get some seriously bad code :) On Fri, Oct 7, 2016 at 5:45 PM Andrew Cantino notifications@github.com wrote:

You could try adding that if you like and send a PR. It seems reasonable, although maybe it should be an option like include_url or something?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cantino/huginn/issues/1710#issuecomment-252287538, or mute the thread https://github.com/notifications/unsubscribe-auth/AASBwabGnKoFcBXcSoSIK53dKbgcopESks5qxmkZgaJpZM4KGMKv .

a10kiloham commented 7 years ago

Strangely, looking at the code this seems to be already in there, just in a kind of buggy way. In line 341 you see result[name] = output[name][index] if name.to_s == 'url' && url.present? result[name] = (url + Utils.normalize_uri(result[name])).to_s end end

So to test, if you include a valid extract key/value as 'url' it automatically returns the url. This should probably be changed to add an include_url option then just use that option here to do that.

a10kiloham commented 7 years ago

I've added a PR https://github.com/cantino/huginn/pull/1748

cantino commented 7 years ago

I believe this has been fixed.