teeli / urltitle

Eggdrop scripts that display titles of URLs pasted on an IRC channel
MIT License

Broken URLs #10

Open teeli opened 7 years ago

teeli commented 7 years ago

Please report any URLs that aren't working properly (either causing errors on your bot's partyline or just not showing titles correctly) here.

Make sure you include the URL in question, any errors you see, and your configuration (relevant software versions, e.g. eggdrop version, Tcl version, Tcl extension versions).

OmkAR2013 commented 7 years ago

eggdrop 1.80
Tcl library: /home/eggie/tcl85/lib/tcl8.5
Tcl version: 8.5.19 (header version 8.5.19)
TLS support is enabled.
TLS library: OpenSSL 1.0.2g 1 Mar 2016

https://twitter.com/Breaking911/status/842624423358291968

[05:09:32] Tcl error [UrlTitle::handler]: can't read "meta(Content-Type)": no such element in array

teeli commented 7 years ago

@OmkAR2013 that should be fixed in the latest version

OmkAR2013 commented 7 years ago

All the URLs that previously failed are working now. It's great! Everything except Twitter HTTPS links.

https://twitter.com/Reuters https://twitter.com/i/moments/842395226299760641

There's no error being displayed in the bot log, so I'm not sure what's happening. Using newest urltitle.tcl

Any suggestions? teeli, what setup do you use for your working bot?

CONFIG ->

I am mOOpeY, running eggdrop v1.8.1+RC2: 1 user (mem: 100k).
Configured with: '--with-tcllib=/home/moopey/local/lib/libtcl8.6.so' '--with tclinc=/home/moopey/local/include/tcl.h' '--enable-tls'
OS: Linux 4.4.0-66-generic
Process ID: 37832 (parent 1)
Tcl library: /home/moopey/local/lib/tcl8.6
Tcl version: 8.6.6 (header version 8.6.6)
Tcl is threaded.
TLS support is enabled.
TLS library: OpenSSL 1.0.2g 1 Mar 2016
IPv6 support is enabled.

tDOM - a XML/DOM/XPath/XSLT implementation for Tcl (Version 0.8.4)

tcltls-1.7.11.tar.gz tcllib_1_18.tar.gz tcl8.6.6-src.tar.gz eggdrop-1.8.1rc2.tar.gz

knofte commented 7 years ago

We're seeing something similar with random URLs now.

07:57:19 <@Knofte> https://casinojakten.se
07:57:21 <@Servant> Title: Freespin och Bäst Bonus från de Bästa Casinon!! | casinojakten.se
07:57:27 <@Knofte> https://www.sunet.se
07:57:29 <@Servant> Title: SUNET | Datakommunikation & infrastruktur för forskning och utbildning
07:57:41 <@Knofte> http://www.google.com
07:57:46 <@Knofte> https://www.google.com
07:57:50 <@Knofte> https://google.com
07:57:56 <@Knofte> http://google.com
...

ii libtcl8.6:amd64 8.6.5+dfsg-2 amd64 Tcl (the Tool Command Language) v8.6 - run-time library files
ii tcl 8.6.0+9 amd64 Tool Command Language (default version) - shell
ii tcl-dev:amd64 8.6.0+9 amd64 Tool Command Language (default version) - development files
ii tcl-tls 1.6.7+dfsg-1 amd64 TLS OpenSSL extension to Tcl
ii tcl8.6 8.6.5+dfsg-2 amd64 Tcl (the Tool Command Language) v8.6 - shell
ii tcl8.6-dev:amd64 8.6.5+dfsg-2 amd64 Tcl (the Tool Command Language) v8.6 - development files
ii tcl8.6-tdbc 1.0.3-1 amd64 Tcl Database Connectivity
ii tcl8.6-tdbc-sqlite3 1.0.3-1 all Tcl Database Connectivity
ii tcllib 1.17-dfsg-1 all Standard Tcl Library

FevLoad commented 7 years ago

Is it fixed yet?

teeli commented 7 years ago

Should be better support for HTTP(S) redirects and case insensitive HTTP headers now. Google, Twitter etc. should work.
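For readers following along: the earlier `can't read "meta(Content-Type)"` error is the kind of failure that a case-insensitive header lookup avoids, since servers may send `content-type` in any capitalization. Below is a minimal Python sketch of the idea. It is only an illustration, not the script's actual Tcl code, and the helper name `get_header` is hypothetical.

```python
def get_header(headers, name, default=None):
    """Look up an HTTP header without caring about capitalization.

    `headers` is assumed to be a plain dict of header name -> value,
    as a client library might return it (an assumption for this sketch).
    """
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    # Returning a default instead of raising avoids the
    # "no such element" style of error when the header is absent.
    return default


if __name__ == "__main__":
    response_headers = {"content-type": "text/html; charset=utf-8"}
    print(get_header(response_headers, "Content-Type"))
```

Returning a default for a missing header also covers responses that send no `Content-Type` at all.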

voidzero commented 7 years ago

Hi @teeli,

I have a new issue with this URL: http://blog.dilbert.com/post/164297628606/how-to-know-youre-in-a-mass-hysteria-bubble

On the partyline, I see this:

Tcl error [UrlTitle::handler]: invalid command name ""

So I added putlog statements everywhere, and this line seems to be the culprit: https://github.com/teeli/urltitle/blob/34582be19d64149e8b8c72e7420a642af355f16e/urltitle.tcl#L183

Any idea?

knofte commented 7 years ago

I get the same on that url. Tcl error [UrlTitle::handler]: invalid command name ""

teeli commented 7 years ago

Apparently XPath fails to parse title on that page. I'm not sure why, I suspect it could be because of invalid html structure (stray doctype).

I should probably add some error checking and maybe a regex fallback (if that helps, need to test)
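The "error checking plus regex fallback" idea could look roughly like the following Python sketch. It is an illustration under assumptions, not the script's Tcl/tDOM code: the strict parser here is Python's `xml.etree.ElementTree`, standing in for any parser that rejects malformed markup, and `extract_title` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Lenient fallback pattern: first <title>...</title> pair, any case.
TITLE_RE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(markup):
    """Try a strict parse first; fall back to a regex if it fails."""
    try:
        root = ET.fromstring(markup)
        node = root.find(".//title")
        if node is not None and node.text:
            # Collapse runs of whitespace inside the title.
            return " ".join(node.text.split())
    except ET.ParseError:
        # Malformed markup (e.g. a stray doctype): ignore and fall through.
        pass
    match = TITLE_RE.search(markup)
    if match:
        return " ".join(match.group(1).split())
    return None


if __name__ == "__main__":
    broken = "<html><head><!DOCTYPE html><title> Some  page </title></head><body><p>unclosed</body></html>"
    print(extract_title(broken))
```

The point of the design is that a parser failure never reaches the caller; the worst case is simply no title.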

teeli commented 7 years ago

I've pushed a new version that should fix that issue.

voidzero commented 7 years ago

Fixed indeed. Well done. Your TCL-fu is admirable.

lollko commented 5 years ago

After IMDb's update there is a problem with urltitle:

21:37:09 <~lollko> https://www.imdb.com/title/tt1025100/
21:37:11 <&rss> Title: TryIMDbProFree

Is it possible to fix?

teeli commented 5 years ago

Looks like there's an inline SVG element on the page that has a <title> tag. Need to look if it's possible to exclude those.

For reference

...
<svg width="175px" height="30px" viewBox="0 0 172 29" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>TryIMDbProFree</title>
<g id="tryIMDbProFree" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<rect id="tryIMDbProFreeButton" stroke="#A88734" fill="#F1C241" x="1" y="1" width="170" height="28" rx="3"></rect>
<text id="tryIMDbProFreeText">
<tspan x="33" y="19">Try IMDbPro Free</tspan>
...

teeli commented 5 years ago

I've pushed a new version that fixes the issue with title tags outside <head> when using regex parsing instead of tdom.

Should fix the issue with the IMDB link above.
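As an illustration of restricting the regex match to `<head>`, here is a small Python sketch. It is not the script's Tcl code; the function name `head_title` and the sample markup are assumptions made for the example.

```python
import re

# Match <head ...> but not <header ...>: require whitespace or ">" after the name.
HEAD_RE = re.compile(r"<head(?:\s[^>]*)?>(.*?)</head>", re.IGNORECASE | re.DOTALL)
TITLE_RE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def head_title(html):
    """Return the <title> found inside <head>, ignoring titles elsewhere
    (for example the <title> child of an inline <svg>)."""
    head = HEAD_RE.search(html)
    if head is None:
        return None
    title = TITLE_RE.search(head.group(1))
    if title is None:
        return None
    return " ".join(title.group(1).split())


if __name__ == "__main__":
    page = (
        "<html><head><title>Example page</title></head>"
        "<body><svg><title>TryIMDbProFree</title></svg></body></html>"
    )
    print(head_title(page))
```

A page whose only `<title>` sits inside an SVG yields no title at all, which is arguably better than announcing the SVG label.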

lollko commented 5 years ago

Working fine :) Thanks for your work.

JesseMach commented 5 years ago

Great work, thanks. Most links work fine but BBC News articles don't work for me. :( (and yet BBC Sport links work fine)

[10:11:59] Connection to https://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303/ failed
[10:11:59] Error: Missing host part: /news/uk-england-south-yorkshire-47623303
[10:12:07] Connection to http://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303/ failed
[10:12:07] Error: Missing host part: /news/uk-england-south-yorkshire-47623303
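
One plausible reading of "Missing host part" (an assumption, since the thread does not confirm the cause) is that the server answered with a redirect whose `Location` header is a relative path, and that path was then fetched as if it were a full URL. The Python sketch below shows how such a value can be resolved against the request URL. It is illustrative only, and `resolve_redirect` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url, location):
    """Turn a possibly relative Location header into an absolute URL.

    Absolute Location values pass through unchanged; relative ones are
    resolved against the URL that was originally requested.
    """
    return urljoin(request_url, location)


if __name__ == "__main__":
    print(resolve_redirect(
        "https://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303/",
        "/news/uk-england-south-yorkshire-47623303",
    ))
    # prints "https://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303"
```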

knofte commented 5 years ago

Yo, YouTube changed earlier this year (afaik) which created a problem with urltitle, same happened to youtube-dl: https://github.com/Lamieur/youtube-dl/commit/5eabe9c3dc3fe05b26a7f3f833fb55b0287abd4a

For example: Error: HTTP/1.1 429 Too Many Requests (https://www.youtube.com/watch?v=JImcvtJzIK8)

Some say forcing IPv4 for the lookup could help, but unfortunately I had no success with curl -I -4.

It'd be great to get YT-titles fixed again :)

teeli commented 5 years ago

I'll take a look and try to figure that out, but it will probably be a more complex fix and might take more time than usual. It looks like the request is being blocked by the YouTube servers, rather than this being just a parsing error in the script.

knofte commented 5 years ago

Yeah, it seems the title is loaded only after a redirect has been made. Quite an annoying feature. :)

knofte commented 5 years ago

There is a youtube-api.tcl available for using the youtube API, perhaps that could give some hints. (could not find a reliable link for it though)

reelated commented 4 years ago

Not sure if it's me or not, but any page from reuters.com comes back with a blank title.

https://www.reuters.com/article/us-china-aviation-comac-insight/chinas-bid-to-challenge-boeing-and-airbus-falters-idUSKBN1Z905N
Title:

knofte commented 4 years ago

Same thing here, version 0.11.

Ramshie commented 4 years ago

https://www.bbc.com/news/world-us-canada-51483541 - Nothing happens, no errors in console either.

ramsesatabusimbel commented 4 years ago

Twitter broke some weeks ago; nothing happens on those links. For example: https://twitter.com/ttnyhetsbyran/status/1279837369605160960?s=20

Tcl library: /usr/share/tcltk/tcl8.6
Tcl version: 8.6.9 (header version 8.6.9)
Tcl is threaded.
TLS support is enabled.
TLS library: OpenSSL 1.1.1d 10 Sep 2019

EDIT: Other sources tell me Twitter needs its API to work, so this is perhaps not an easy fix. It may be better to use a Twitter-specific script.

lollko commented 4 years ago

Hi fellas,

I tried some YouTube links, but urltitle shows me:

22:23:40 <~lollko> https://www.youtube.com/watch?v=-tDiXMeEWzw
22:23:42 <&rss> Title: YouTube

Maybe YouTube redesigned its site?

Here is my "conf" from the eggdrop:

22:37:15 <rss> Tcl library: /usr/share/tcl8.5
22:37:15 <rss> Tcl version: 8.5.13 (header version 8.5.13)
22:37:15 <rss> Tcl is threaded.
22:37:15 <rss> TLS support is enabled.
22:37:15 <rss> TLS library: OpenSSL 1.0.2k-fips  26 Jan 2017

hjudges commented 4 years ago

Hi @teeli

Twitter links haven't been working for quite a while (no output at all). Can you have a look?

Thanks!

angelperezleon commented 5 months ago

x.com (aka Twitter) links are still not working, @teeli. Is anyone else fixing this?

Example: https://x.com/Space_Station/status/1807824547309093239
Bot response: Title: x.com