sopel-irc / sopel

:robot::speech_balloon: An easy-to-use and highly extensible IRC Bot framework. Formerly Willie.
https://sopel.chat
Other
948 stars 402 forks source link

url.py: add config option for getting Open Graph metadata #2380

Open ghost opened 1 year ago

ghost commented 1 year ago

Requested Feature

It would be nice to add a config option to read Open Graph metadata from links, falling back to page title when not available.

Problems Solved

Sometimes Open Graph data is better than the page title. Sometimes page titles are generic for the website, but the Open Graph data can provide a lot more. Biggest example I can think of off the top of my head is Instagram links in particular, but there's lots of sites like that.

Alternatives

Don't add this because you hate my guts.

Notes

Some considerations:

RustyBower commented 1 year ago

IMO this belongs in a plugin by itself.

I don't like the idea of overloading the built in url.py plugin with additional logic/complexity/dependencies.

It's definitely "best effort" as it is, and certainly I see issues in my own channels with "Access Denied", empty titles, and other unwanted behavior. However, I don't think it makes sense to try to code for every single edge case here.

Personally, if someone wants to implement this sort of behavior, it should be from an external plugin, and then unload the url.py from loading for their intended behavior.

dgw commented 1 year ago

Unless I'm forgetting some 8.x changes, the problem with saying "do that in another plugin" right now is that there's no way for another plugin to say "I can handle all links!", then say "… but not this one" when it doesn't find Open Graph tags and let url's auto-title handle it after all.

If any plugin claims to handle @url(r'.*') so it can look for OG tags, the auto-title will never run because all URLs will match a callback, and auto-title only runs for links that don't match another callback.

Seems like we had a talk about this some time ago, and someone proposed a whole priority system for callables and a way for them to say "nope, I bail", passing control to the next one. But I couldn't find an issue for it.

Exirel commented 1 year ago

That's the beauty of the new system: url.py is not special anymore, it acts like any other plugin would to work with URLs.

First thing is that it uses @rule and not @url: https://github.com/sopel-irc/sopel/blob/master/sopel/modules/url.py#L315 Second thing is that it checks for existing URL callbacks, and if none can be found, then it will try to do the auto-title thing.

By the way, it could be improved, because trigger has now a urls attribute with all the URLs in it: https://github.com/sopel-irc/sopel/blob/master/sopel/trigger.py#L265-L267

dgw commented 1 year ago

First thing is that it uses @rule and not @url: master/sopel/modules/url.py#L315 Second thing is that it checks for existing URL callbacks, and if none can be found, then it will try to do the auto-title thing.

This doesn't help "do that in another plugin", though, because both url.py and (say) opengraph.py would both have to use @rule, and therefore both would be called, both would check for existing URL callbacks, both would find none, and both would process the link.

If opengraph.py uses @url instead, that makes a URL callback that will always block the @rule-based auto-title, even if there is no Open Graph data at that link.

ghost commented 1 year ago

I suppose I should've mentioned that we already discussed in IRC that this is likely a change that would need to wait for url.py being ripped out of core...my thought was that a theoretical sopel-url or something would be configurable for optionally trying to fetch Open Graph data and falling back to page titles (and page titles only by default) or something. Dunno. ¯\_(ツ)_/¯