j0k3r / graby-site-config

Graby site config files

cnn.com.txt and edition.cnn.txt - Meta refresh to unsupported browser page #32

Closed (4oo4 closed this issue 6 years ago)

4oo4 commented 6 years ago

I'm using Wallabag 2.3.2 and have found that cnn.com hasn't worked for a while. It appears that the find_string and replace_string meant to prevent the redirect to the unsupported browser page aren't working:

find_string: <meta http-equiv="refresh"
replace_string: <meta norefresh
[2018-06-05 20:17:30] graby.DEBUG: Use default referer "http://www.google.co.uk/url?sa=t&source=web&cd=1" for url "https://edition.cnn.com/2018/06/05/politics/scott-pruitt-chick-fil-a-job-wife/index.html" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/2018/06/05/politics/scott-pruitt-chick-fil-a-job-wife/index.html"} []
[2018-06-05 20:17:30] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/2.86.0/static/unsupp.html [] []
[2018-06-05 20:17:30] graby.DEBUG: Trying using method "get" on url "https://edition.cnn.com/2.86.0/static/unsupp.html" {"method":"get","url":"https://edition.cnn.com/2.86.0/static/unsupp.html"} []

I tried messing with those strings but couldn't find the fix.
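
For reference, the intended effect of that find/replace pair can be sketched as a plain string substitution applied to the fetched HTML before any meta-refresh check runs. This is a minimal Python sketch, not graby's actual PHP code; `apply_replacements` and `has_meta_refresh` are hypothetical names, and the sample page is made up to mirror the redirect target in the log.

```python
import re

# Hypothetical pair mirroring the find_string / replace_string lines above.
REPLACEMENTS = [('<meta http-equiv="refresh"', '<meta norefresh')]


def apply_replacements(html, replacements=REPLACEMENTS):
    """Apply each literal find/replace pair to the HTML, in order."""
    for find, replace in replacements:
        html = html.replace(find, replace)
    return html


def has_meta_refresh(html):
    """Rough stand-in for a meta-refresh check (not graby's real detection)."""
    pattern = r'<meta[^>]+http-equiv=["\']?refresh'
    return re.search(pattern, html, re.IGNORECASE) is not None


# Made-up sample page that redirects to the unsupported browser page.
page = '<head><meta http-equiv="refresh" content="0; url=/2.86.0/static/unsupp.html"></head>'
cleaned = apply_replacements(page)
```

In this sketch the substitution only helps if it happens before the refresh check. If the check runs on the raw HTML first, the pair has no effect, which would be consistent with the log above; that ordering is an assumption here, not something confirmed in this thread.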

If I'm testing this with wallabag, would I need to do anything beyond php /wallabag/bin/console cache:clear --env=prod for it to see the updated site config?

That same link works for me on f43.me, though.

Cheers

j0k3r commented 6 years ago

This issue was fixed a few days ago: https://github.com/j0k3r/graby/issues/142 It'll be available in the next wallabag release.

4oo4 commented 6 years ago

@j0k3r Thanks!

j0k3r commented 6 years ago

Also, I forgot to mention that f43.me is up to date with graby. That's why it works on it and not on wallabag.

4oo4 commented 6 years ago

Just curious: is there any harm in updating graby separately from Wallabag? Since sites will inevitably change faster than Wallabag can be updated, it would be really nice to keep the extractor side of things up to date.

j0k3r commented 6 years ago

There's an issue about that: https://github.com/wallabag/wallabag/issues/1284