Closed dgw closed 1 year ago
I believe the "did any redirects happen?" part of the fix for this issue involves looking at Response.history
, but I'm less sure about what to do after that after a glance at the plugin to remind myself of its structure.
A patch may want to add a note to plugin.url
to explicitly warn users that they will not receive events for redirects to the URLs they declare this way (since no request is performed)
A patch may want to add a note to
plugin.url
to explicitly warn users that they will not receive events for redirects to the URLs they declare this way (since no request is performed)
That would constitute an inability to re-implement the documented behavior, then, wouldn't it?
"If that redirected URL should be handled by another plugin, dispatch the callback for it."
The URL plugin should probably say to request not to follow redirect, and to implement the follow-redirect logic ourselves. This way, we should be able to look at a request, if it's a redirect, ask the bot if a plugin can deal with it, etc. and so on up until there is no redirect and no plugin can handle it.
Or the bot's URL dispatcher can test for redirects itself if no plugin's handler matches? I wouldn't mind decoupling this redirect-handling behavior from url
at all. It would be ideal, in fact.
Problem is the url
plugin's auto_title
handler also counts as "a plugin's handler" that matches everything. But I'd also say that it's better to set the URL dispatcher to ignore url.auto_title
than to rely on url
to handle redirected links. It's still coupling core behavior to a plugin, but much less obnoxious than the other way around.
If we're considering that, my dream is something like:
Response
to plugins that have registered for page body processing
This would let plugins like bugzilla or a hypothetical mastodon/activitypub plugin to handle the relevant application regardless of the domain/path it lives at (mastodon.social, nondeterministic.computer, etc etc etc etc etc), and for others that use page data instead of turning a URL into an API call, they could dump it directly through an xml parser or BS4.
Worth it? No idea.
We've talked about that idea before, and I like it. Not this late in 8.0, and it's big enough I think it has to wait for 9.0, but I like it. If you're up for taking your bullets there and fleshing them out a bit into a standalone feature issue, go for it. 😃
FWIW, a mastodon plugin is no longer hypothetical (soon to be adopted under our org, I think).
The question now is, how can we resolve this issue in a relatively low-effort way for 8.0, while planning ahead to replace it in 9?
(semantic warning: longposting)
That would constitute an inability to re-implement the documented behavior, then, wouldn't it?
I think I'm keying mostly off of your suggestion that the patch that implements this should live in url
, which I understood as meaning that this plugin should be capable of dealing with redirects, but I think that the answer to your question is "yes" the way I've been thinking about it. I agree with your later reply that the general problem does not make sense for 8.0, it may be enough to get url
alone dealing with redirections, and we might learn more about how we want to solve the big problem in pursuing that.
* Sopel performs GET request. If redirect, goto ^
I'm a -1 on Sopel doing eager requests, I don't think the core should be performing a request on every matched URL before dispatching to plugins. What should the rules for this one-size-fits all request look like? What if someone links to an .iso
etc? There's a lot to worry about, and I don't really think it's the core's business as a general rule. At a minimum, we seem to be talking about paying attention to every URL, which is a pretty big change.
However, reflecting on this suggestion, I began wondering if it makes sense for Sopel to provide a helper for requests that will dispatch to plugins that have a registered pattern matching an eventual redirect. Let's assume https://example.com
redirects to some other specific path, and that two plugins have registered patterns for this domain. Does it make sense to do something like this?
@plugin.url("https://example.com")
def pluginA_example(bot, trigger):
urls = web.search_urls(trigger, exclusion_char=bot.config.url.exclusion_char, clean=True)
for url in urls:
# in the servicing of get(), we could dispatch to decorators that match the redirect URL, where appropriate
response = web.get(url) # type: requests.Response
# ...
# …somewhere else, a cousin decoration occurs…
@plugin.url("https://example.com/with/a/specific/path")
def pluginB_example_specific(bot, trigger):
# ...
Other spellings of this "let the bot do the request" ideas might include a helper for explicitly delegating back to the bot (e.g. bot.url_event(redirect_target)
), or some way for plugin authors to explicitly opt in to letting the core do the request for them (e.g. @plugin.url("https://example.com/", do_request_for_me=True)
). I think these options are along the same lines as what @Exirel suggested.
I think all of that is definitely well past the bar for 8.0, but it's interesting to think about how to get redirects working cooperatively.
I've filed #2432 as longer-term planning for restoration of this feature, so that for 8.0 we can focus on 'just' fixing the incorrect claim in the docstrings. It's a good idea to dispatch to callbacks who want to know about redirects, but I think we need to think about that some more.
Once upon a time, the
url
plugin'stitle_auto
feature would check if a link redirected to a URL that should be handled by another plugin, but that appears to no longer be the case since the conversion to userequests
. Its documentation and that for theprocess_urls
helper (below) still say this is the case, but the behavior is gone.https://github.com/sopel-irc/sopel/blob/d36a19dad548068e34357d63761258cdd5e5efc0/sopel/modules/url.py#L290-L295 https://github.com/sopel-irc/sopel/blob/d36a19dad548068e34357d63761258cdd5e5efc0/sopel/modules/url.py#L322-L329
I would argue that this is a useful feature, and we should bring it back. For example, our first-party GitHub plugin could get free handling of repo/issue/PR/commit links shortened through git.io if this was still the case. Any shortened URL at present, actually, is only ever going to get the generic title-fetching behavior.
While it's certainly possible to create Yet Another Sopel Plugin that resolves short URLs, it would be fairly annoying to implement—and it wouldn't block the built-in title handling, either.
Note, I don't care if this is part of
url
, but it makes more sense to put it there than into the bot's core URL handling. We don't necessarily want the core bot making HTTP requests every time it sees a link. Some of the thoughts in this issue were previously expressed in #sopel@libera on 2021-11-30.