shabble / irssi-scripts

Repo to store some personal irssi scripts

Don't "resolve" URIs not listed in longify-urls.list #12

Open duckdalbe opened 12 years ago

duckdalbe commented 12 years ago

First: Thank you for this script! It really helps me a lot in coping with these stupid "short URLs" on Twitter and the like. Unfortunately I don't know Perl well enough to fix the following problem myself, so I'm posting it here.

Currently longify-urls.pl seemingly also "resolves" URLs not listed in longify-urls.list:

http://t.co/AgHfYlq is being resolved to /artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html, while the actual Location header sent by t.co says http://www.faz.net/-025ATJ, and faz.net is not listed in longify-urls.list:

  % grep -q faz.net ~/.irssi/longify-urls.list; echo $?
  1

(This also shows that longify-urls.pl doesn't handle Location headers starting with a slash correctly. It should prepend the known hostname.)
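To illustrate the "prepend the known hostname" idea, here is a minimal sketch in Python (not the script's Perl), assuming the URL that was requested is still known when the Location header is read. The function name `absolutize_location` is hypothetical; `urljoin` is the standard-library call that resolves a relative reference against a base URL.

```python
from urllib.parse import urljoin


def absolutize_location(request_url: str, location: str) -> str:
    """Resolve a Location header value against the URL that was requested.

    Absolute values are returned unchanged; values starting with a slash
    inherit the scheme and host of request_url.
    """
    return urljoin(request_url, location)


# The two hops reported in this issue.
first_hop = absolutize_location("http://t.co/AgHfYlq", "http://www.faz.net/-025ATJ")
second_hop = absolutize_location(
    "http://www.faz.net/-025ATJ",
    "/artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html",
)
print(first_hop)
print(second_hop)
```

In Perl, the URI module's `new_abs` constructor should serve the same purpose, though that has not been tried against this script.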

Could you have a look at this?

Thanks!

shabble commented 12 years ago

The problem is that I'm just taking the last Location: header in the chain, and hoping it's good enough. I guess the proper way to do things is to dig down into LWP::UserAgent and use $ua->simple_request, which does not follow redirects on its own, and manually follow the redirect chain.

I'm not sure if it's necessarily a bug that an intermediate step in that chain doesn't match the whitelist, unless you can think of a reason why it should? For example, with your t.co link, we have:

  1. http://t.co/AgHfYlq -> http://www.faz.net/-025ATJ
  2. http://www.faz.net/-025ATJ -> http://www.faz.net//artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html

Should the output here be the -025ATJ URL, or the terminal redirection?

Part of resolving this should solve the 2nd part, of making sure URLs get canonicalised as well.
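As a rough sketch of what manually following the chain could look like, here is a Python version (not the script's Perl) that records every hop and canonicalises each Location value against the URL that produced it. The names `redirect_chain` and `fetch_location` are hypothetical, and the HTTP request is abstracted into a callback so the logic can be run without network access; a real version would issue one non-following request per hop.

```python
from typing import Callable, List, Optional
from urllib.parse import urljoin

# Hypothetical callback: given a URL, return the raw Location header value,
# or None when the response is not a redirect.
LocationFetcher = Callable[[str], Optional[str]]


def redirect_chain(url: str, fetch_location: LocationFetcher,
                   max_hops: int = 10) -> List[str]:
    """Return every URL visited, starting with the original one."""
    chain = [url]
    for _ in range(max_hops):
        location = fetch_location(chain[-1])
        if location is None:
            break  # not a redirect: the chain ends here
        next_url = urljoin(chain[-1], location)  # make relative values absolute
        if next_url in chain:
            break  # redirect loop: stop rather than spin
        chain.append(next_url)
    return chain


# Canned Location values taken from the two hops described in this thread.
canned = {
    "http://t.co/AgHfYlq": "http://www.faz.net/-025ATJ",
    "http://www.faz.net/-025ATJ":
        "/artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html",
}
chain = redirect_chain("http://t.co/AgHfYlq", canned.get)
for hop in chain:
    print(hop)
```

Keeping the whole chain, rather than only the last header, leaves the choice of which hop to display as a separate decision.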

duckdalbe commented 12 years ago

Personally I'd prefer the output of longify-urls to be http://www.faz.net/-025ATJ.

But the real issue is the missing hostnames (2nd part). If you feel you can solve the canonicalization more easily without stepping through the redirect chain I'd be way happier than today, too!
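The preference stated above, showing http://www.faz.net/-025ATJ rather than the terminal redirection, amounts to stopping at the first hop whose host is not a listed shortener. A minimal Python sketch of that selection follows, assuming the redirect chain is already available as a non-empty list of absolute URLs and that the entries of longify-urls.list can be treated as plain hostnames (the actual file format is not shown in this thread). The function name `first_unlisted` is hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def first_unlisted(chain: List[str], shortener_hosts: Iterable[str]) -> str:
    """Return the first URL in chain whose host is not a listed shortener.

    Falls back to the last URL if every hop is on the list.
    chain must contain at least one URL.
    """
    listed = {host.lower() for host in shortener_hosts}
    for url in chain:
        host = urlsplit(url).hostname or ""  # hostname is already lowercased
        if host not in listed:
            return url
    return chain[-1]


chain = [
    "http://t.co/AgHfYlq",
    "http://www.faz.net/-025ATJ",
    "http://www.faz.net/artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html",
]
chosen = first_unlisted(chain, ["t.co"])
print(chosen)
```

With this rule a URL whose host is not on the list is returned untouched, since it is itself the first unlisted hop.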