python-discord / bot

The community bot for the Python Discord community
https://pythondiscord.com
MIT License

Unfurl Link Shorteners #1933

Open HassanAbouelela opened 2 years ago

HassanAbouelela commented 2 years ago

Description

One feature we'd like to have on the mod team right now is the ability to unfurl a shortened URL to apply further filters to it. There are multiple services we'd like to do this for, such as bit.ly.

We discussed a similar idea in the past, but ultimately decided against it due to concerns about making requests to untrusted destinations, or about following a redirect chain too far and causing an error. I believe I have a solution that will help the mod team without introducing too much risk.

Solution

I believe we can safely make a request to the shortener with redirects disabled and check the response headers. This addresses the first of the two problems above: with redirects disabled, we never make a request to the unknown destination, only to the shortening service. We can then read the destination from the response and run our other filters against it.
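A minimal sketch of that idea, not the bot's actual implementation. It uses Python's standard-library `http.client`, which never follows redirects on its own; that choice is an assumption made to keep the example self-contained, and the bot may well use a different HTTP client. The names `ALLOWED_SHORTENERS`, `destination_from_response`, and `unfurl_once` are hypothetical, and the allowlist contents are only illustrative.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Hypothetical allowlist: only hosts listed here are ever contacted.
ALLOWED_SHORTENERS = {"bit.ly"}

# HTTP status codes that carry a redirect target in the Location header.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def destination_from_response(url, status, headers):
    """Return the redirect target of a response, or None if it is not a redirect."""
    if status not in REDIRECT_STATUSES:
        return None
    location = headers.get("Location")
    if not location:
        return None
    # Location may be relative, so resolve it against the requested URL.
    return urljoin(url, location)


def unfurl_once(url, timeout=5.0):
    """Unfurl a single level of shortening without contacting the destination."""
    parts = urlsplit(url)
    if parts.hostname not in ALLOWED_SHORTENERS:
        return None  # Unknown host: make no request at all.

    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # http.client does not follow redirects, so only the shortener is hit.
        conn.request("GET", path)
        response = conn.getresponse()
        return destination_from_response(url, response.status, response.headers)
    finally:
        conn.close()
```

The value returned by `unfurl_once` is only a string; nothing is fetched from it, so it can be passed to the existing filters as if the user had posted the destination directly.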

I'm going to gloss over the depth problem mentioned previously, as I think we can get away with a single level of unfurling for now. I don't believe nested shortened URLs are a problem we're currently facing, but I have ideas for handling them if we need to in the future.

Here are a few more concerns, and ways we can handle them:

onerandomusername commented 2 years ago

Exposing our own IP: Not much of a concern at the moment, since my plan is just to run unfurling against a very small set (either a list in constants, or a filter list). Exposing our IP to a service like bit.ly is not a problem.

Is this still a problem, given that we're using a Cloudflare worker (as implemented in https://github.com/python-discord/workers/pull/20)? It runs on Cloudflare's network and would expose their IP, which we don't care about, since it's Cloudflare's.

HassanAbouelela commented 2 years ago

It's a volatile IP, which doesn't expose anything for us. Their addresses aren't exactly hidden on account of us having them in the first place.