Open duckdalbe opened 12 years ago
The problem is that I'm just taking the last Location: header in the chain and hoping it's good enough. I guess the proper way to do this is to dig down into LWP::UserAgent, use $ua->simple_request, and follow the redirect chain manually.
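A minimal sketch of what following the chain by hand could look like. It is not the script's actual code: the name redirect_chain and the 10-hop limit are made up for illustration, and it assumes HEAD requests are enough to get the Location: header from the shorteners involved.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI;

# Hypothetical helper: walk a redirect chain one hop at a time and
# return every URL visited, starting with the one passed in.
# $ua only needs a simple_request method; LWP::UserAgent provides one,
# and unlike request() it does not follow redirects on its own.
sub redirect_chain {
    my ($ua, $url, $max_hops) = @_;
    $max_hops = 10 unless defined $max_hops;    # arbitrary safety limit
    my @chain = ($url);
    for (1 .. $max_hops) {
        my $res = $ua->simple_request(HTTP::Request->new(HEAD => $chain[-1]));
        last unless $res->is_redirect;
        my $location = $res->header('Location');
        last unless defined $location;
        # A relative Location is resolved against the URL just requested.
        push @chain, URI->new_abs($location, $chain[-1])->as_string;
    }
    return @chain;
}
```

Called as redirect_chain(LWP::UserAgent->new, 'http://t.co/AgHfYlq'), this would return the whole chain, so the caller can decide which hop to print instead of only ever seeing the last one.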
I'm not sure if it's necessarily a bug that an intermediate step in that chain doesn't match the whitelist, unless you can think of a reason why it should? For example, with your t.co link, we have:
http://t.co/AgHfYlq -> http://www.faz.net/-025ATJ
http://www.faz.net/-025ATJ -> http://www.faz.net//artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html
Should the output here be the -025ATJ URL, or the terminal redirection?
Resolving this should also take care of the second part, making sure URLs get canonicalised.
Personally I'd prefer the output of longify-urls to be http://www.faz.net/-025ATJ.
But the real issue is the missing hostnames (2nd part). If you feel you can solve the canonicalization more easily without stepping through the redirect chain I'd be way happier than today, too!
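To make the preferred behaviour concrete, here is a rough sketch of picking the first URL in a chain whose host is not a known shortener. The function name first_unlisted and the hash of hostnames are hypothetical; how longify-urls.list is actually parsed is not shown in this thread, so the list is simply passed in as a hash reference.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: given a hash ref of shortener hostnames and a
# redirect chain of absolute http(s) URLs, return the first URL whose
# host is not a listed shortener. Falls back to the last URL if every
# hop is listed.
sub first_unlisted {
    my ($shorteners, @chain) = @_;
    for my $url (@chain) {
        my $host = lc URI->new($url)->host;
        return $url unless $shorteners->{$host};
    }
    return $chain[-1];
}
```

With { 't.co' => 1 } as the list and the chain from the example above, this would stop at http://www.faz.net/-025ATJ, because faz.net is not listed.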
First: Thank you for this script! It really helps me a lot in coping with these stupid "short URLs" on Twitter and the like. Unfortunately I don't know Perl well enough to fix the following problem myself, so I'm posting it here.
Currently longify-urls.pl seemingly also "resolves" URLs not listed in longify-urls.list:
http://t.co/AgHfYlq is being resolved to /artikel/C31315/ueberwachung-wir-leben-noch-frei-aber-nicht-mehr-lange-30685243.html, while the actual Location header sent by t.co says http://www.faz.net/-025ATJ, and faz.net is not listed in longify-urls.list:
% grep -q faz.net ~/.irssi/longify-urls.list; echo $?
1
(Also this shows that longify-urls.pl doesn't handle Location-headers starting with a slash correctly. It should prepend the known hostname.)
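For the missing hostname, one possible approach is to resolve the Location value against the URL that was requested, using URI->new_abs from the URI module (which LWP already depends on). The wrapper name resolve_location is made up for this sketch.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: turn a Location header value into an absolute
# URL. $base is the URL whose response carried the header. Absolute
# Location values come back unchanged; relative ones inherit the
# scheme and host of $base.
sub resolve_location {
    my ($location, $base) = @_;
    return URI->new_abs($location, $base)->as_string;
}
```

Given the path from this report and http://www.faz.net/-025ATJ as the base, the result would carry the www.faz.net hostname instead of starting with a bare slash.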
Could you have a look at this?
Thanks!