mskog closed this issue 7 years ago
You could set mode to merge and set url_from_event to {{ url }} in the WebsiteAgent, then make an EventFormattingAgent pass url into the WebsiteAgent. It'll come out the other side because of the merge.
Alternatively, if you can find the URL on the page itself somewhere, you can just extract that as part of the process. For example:
    {
      "expected_update_period_in_days": "90",
      "url": "https://www.amazon.co.uk/HP-Original-C9732A-Laserjet-Cartridge/dp/B000077CF5/",
      "type": "html",
      "mode": "on_change",
      "extract": {
        "body_text": {
          "xpath": "//*[@id=\"priceblock_ourprice\"]",
          "value": "substring-after(.,\"£\")"
        },
        "url": {
          "xpath": "//link[@rel='canonical']/@href",
          "value": "."
        }
      }
    }
But it would be nice to just pass the URL through if the agent could do that automatically!
@cantino, do you think it would make sense to output the input URL (after checking that url is not already set in the extraction)? It seems like a useful feature that wouldn't break backward compatibility. It's very useful for my various checks, as some pages don't have the URL anywhere reachable by XPath, and otherwise an EventFormattingAgent seems needlessly complex (I'd have to create one for each WebsiteAgent, which would be a pain). Thanks
@bobbysteel, that seems reasonable, but what if a user has an extraction key called url already?
Logically, I guess just only fill it if there's no existing key by that name? Sort of like a default value?
You could try adding that if you like and send a PR. It seems reasonable, although maybe it should be an option like include_url or something?
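To make the idea discussed so far concrete, here is a minimal Ruby sketch of "fill url only as a default". It is not Huginn's code: the helper name add_source_url is made up for illustration, and include_url is just the option name floated in this thread, so treat both as assumptions.

```ruby
# Hypothetical helper for illustration only; not Huginn's actual code.
# `include_url` mirrors the option name suggested in this thread.
def add_source_url(result, source_url, include_url: false)
  # Opt-in, so existing agents keep their current output.
  return result unless include_url
  # An extraction key named "url" always wins over the default.
  return result if result.key?('url')

  result.merge('url' => source_url.to_s)
end

page = 'https://example.com/item'

# No "url" extraction: the page URL is filled in as a default.
p add_source_url({ 'body_text' => '42.50' }, page, include_url: true)

# A user-defined "url" extraction is left untouched.
p add_source_url({ 'url' => 'https://example.com/canonical' }, page, include_url: true)
```

Making the option default to off is one way to keep backward compatibility; whether that is the right default is a design choice for the PR.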
Happy to try, although I'm warning you, you're gonna get some seriously bad code :)
Strangely, looking at the code this seems to be already in there, just in a kind of buggy way. In line 341 you see
        result[name] = output[name][index]
        if name.to_s == 'url' && url.present?
          result[name] = (url + Utils.normalize_uri(result[name])).to_s
        end
      end
So, to test: if you include a valid extract key/value named 'url', the agent automatically returns the URL. This should probably be changed to add an include_url option, and then use that option here to decide whether to do that.
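For anyone reading the quoted branch, this rough sketch uses only Ruby's standard uri library to show what adding an extracted value to the page URL does. It assumes url in the agent behaves like a URI object and leaves out Utils.normalize_uri entirely; resolve_extracted_url is a hypothetical name, not something in Huginn.

```ruby
require 'uri'

# Hypothetical stand-in for the quoted branch: resolve whatever the "url"
# extraction produced against the URL of the page that was fetched.
# URI#+ is Ruby's standard relative-reference resolution (alias of #merge).
def resolve_extracted_url(page_url, extracted)
  (URI.parse(page_url) + extracted).to_s
end

page = 'https://example.com/products/list'

puts resolve_extracted_url(page, '/item/1')                # root-relative path
puts resolve_extracted_url(page, 'item/2')                 # path relative to the page
puts resolve_extracted_url(page, 'https://other.test/x')   # absolute URL is kept as-is
```

So an extraction named url ends up as an absolute URL based on the page's own address, which would explain why naming a key url appears to "return the url".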
I've added a PR https://github.com/cantino/huginn/pull/1748
I believe this has been fixed.
Scenario: I want to watch a bunch of sites for changes and, when they change, post the URL to a Slack channel.
Everything works just fine, except that I can't figure out a way to have the event from a WebsiteAgent contain the URL it is using. Can this be done?