nextcloud / news

:newspaper: RSS/Atom feed reader
https://apps.nextcloud.com/apps/news
GNU Affero General Public License v3.0

Favicon downloads for YouTube feeds are hit or miss depending on the channel #1486

Closed: MartenBE closed this issue 2 years ago

MartenBE commented 3 years ago

When subscribing to YouTube feeds, some download a favicon and others don't. The behavior seems to be tied to the channel: deleting and re-adding a feed doesn't change it. See the screenshot for some of the channels (I've blurred out unrelated feeds):

[screenshot]

Nextcloud 22.1.0, PHP 7.4.22, mysql 10.6.4

Log output when trying to add the YouTube channels that get no favicon:

{"reqId":"Lt8sZQTR2cLCloM6c0gC","level":2,"time":"2021-08-19T08:11:24+00:00","remoteAddr":"123.123.123.123","user":"martijn","app":"core","method":"POST","url":"/login","message":"Controller OC\\Core\\Controller\\LoginController::tryLogin executed 130 queries.","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0","version":"22.1.0.1"}
{"reqId":"J1G6D8Pho7XkMzlS4G3k","level":3,"time":"2021-08-19T08:16:18+00:00","remoteAddr":"123.123.123.123","user":"martijn","app":"PHP","method":"POST","url":"/apps/news/feeds","message":"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n at /var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php#15","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0","version":"22.1.0.1","exception":{"Exception":"Error","Message":"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n at /var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php#15","Code":0,"Trace":[{"function":"onAll","class":"OC\\Log\\ErrorHandler","type":"::","args":[2,"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n","/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php",15,{"url":"https://cad-comic.com/","http_response_header":["HTTP/1.1 403 Forbidden","Date: Thu, 19 Aug 2021 08:16:18 GMT","Content-Type: text/plain; charset=UTF-8","Content-Length: 16","Connection: close","And 9 more entries, set log level to debug to see all 
entries"]}]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php","line":15,"function":"file_get_contents","args":["https://cad-comic.com/"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":259,"function":"retrieveUrl","class":"Favicon\\DataAccess","type":"->","args":["https://cad-comic.com/"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":198,"function":"getInPage","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":156,"function":"getFavicon","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com",false]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":338,"function":"get","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":393,"function":"getFavicon","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":[{"__class__":"FeedIo\\Feed"},"https://cad-comic.com/feed/"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":125,"function":"buildFeed","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":[{"__class__":"FeedIo\\Feed"},"https://cad-comic.com/feed/","https://cad-comic.com/feed/"]},{"file":"/var/www/html/custom_apps/news/lib/Service/FeedServiceV2.php","line":211,"function":"fetch","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":["https://cad-comic.com/feed/",false,null,null]},{"file":"/var/www/html/custom_apps/news/lib/Controller/FeedController.php","line":176,"function":"create","class":"OCA\\News\\Service\\FeedServiceV2","type":"->","args":["martijn","https://cad-comic.com/feed/",null,false,null,null,null,true]},{"file":"/var/www/html/lib/private/AppFramework/Http/Dispatcher.php","line":217,"function":"create"
,"class":"OCA\\News\\Controller\\FeedController","type":"->","args":["https://cad-comic.com/?feed=rss",null,null,null,null,true]},{"file":"/var/www/html/lib/private/AppFramework/Http/Dispatcher.php","line":126,"function":"executeController","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"create"]},{"file":"/var/www/html/lib/private/AppFramework/App.php","line":156,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"create"]},{"file":"/var/www/html/lib/private/Route/Router.php","line":301,"function":"main","class":"OC\\AppFramework\\App","type":"::","args":["OCA\\News\\Controller\\FeedController","create",{"__class__":"OC\\AppFramework\\DependencyInjection\\DIContainer"},{"_route":"news.feed.create"}]},{"file":"/var/www/html/lib/base.php","line":1000,"function":"match","class":"OC\\Route\\Router","type":"->","args":["/apps/news/feeds"]},{"file":"/var/www/html/index.php","line":36,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"/var/www/html/lib/private/Log/ErrorHandler.php","Line":99,"CustomMessage":"--"}}
{"reqId":"J1G6D8Pho7XkMzlS4G3k","level":3,"time":"2021-08-19T08:16:19+00:00","remoteAddr":"123.123.123.123","user":"martijn","app":"PHP","method":"POST","url":"/apps/news/feeds","message":"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n at /var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php#15","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0","version":"22.1.0.1","exception":{"Exception":"Error","Message":"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n at /var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php#15","Code":0,"Trace":[{"function":"onAll","class":"OC\\Log\\ErrorHandler","type":"::","args":[2,"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n","/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php",15,{"url":"https://cad-comic.com/","http_response_header":["HTTP/1.1 403 Forbidden","Date: Thu, 19 Aug 2021 08:16:18 GMT","Content-Type: text/plain; charset=UTF-8","Content-Length: 16","Connection: close","And 9 more entries, set log level to debug to see all 
entries"]}]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php","line":15,"function":"file_get_contents","args":["https://cad-comic.com/"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":259,"function":"retrieveUrl","class":"Favicon\\DataAccess","type":"->","args":["https://cad-comic.com/"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":198,"function":"getInPage","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":161,"function":"getFavicon","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":338,"function":"get","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":393,"function":"getFavicon","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":[{"__class__":"FeedIo\\Feed"},"https://cad-comic.com/feed/"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":125,"function":"buildFeed","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":[{"__class__":"FeedIo\\Feed"},"https://cad-comic.com/feed/","https://cad-comic.com/feed/"]},{"file":"/var/www/html/custom_apps/news/lib/Service/FeedServiceV2.php","line":211,"function":"fetch","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":["https://cad-comic.com/feed/",false,null,null]},{"file":"/var/www/html/custom_apps/news/lib/Controller/FeedController.php","line":176,"function":"create","class":"OCA\\News\\Service\\FeedServiceV2","type":"->","args":["martijn","https://cad-comic.com/feed/",null,false,null,null,null,true]},{"file":"/var/www/html/lib/private/AppFramework/Http/Dispatcher.php","line":217,"function":"create","clas
s":"OCA\\News\\Controller\\FeedController","type":"->","args":["https://cad-comic.com/?feed=rss",null,null,null,null,true]},{"file":"/var/www/html/lib/private/AppFramework/Http/Dispatcher.php","line":126,"function":"executeController","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"create"]},{"file":"/var/www/html/lib/private/AppFramework/App.php","line":156,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"create"]},{"file":"/var/www/html/lib/private/Route/Router.php","line":301,"function":"main","class":"OC\\AppFramework\\App","type":"::","args":["OCA\\News\\Controller\\FeedController","create",{"__class__":"OC\\AppFramework\\DependencyInjection\\DIContainer"},{"_route":"news.feed.create"}]},{"file":"/var/www/html/lib/base.php","line":1000,"function":"match","class":"OC\\Route\\Router","type":"->","args":["/apps/news/feeds"]},{"file":"/var/www/html/index.php","line":36,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"/var/www/html/lib/private/Log/ErrorHandler.php","Line":99,"CustomMessage":"--"}}
{"reqId":"oSDc5k2esduEJNNxmCFt","level":3,"time":"2021-08-19T08:20:19+00:00","remoteAddr":"","user":"--","app":"PHP","method":"","url":"--","message":"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n at /var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php#15","userAgent":"--","version":"22.1.0.1","exception":{"Exception":"Error","Message":"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n at /var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php#15","Code":0,"Trace":[{"function":"onAll","class":"OC\\Log\\ErrorHandler","type":"::","args":[2,"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n","/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php",15,{"url":"https://cad-comic.com/","http_response_header":["HTTP/1.1 403 Forbidden","Date: Thu, 19 Aug 2021 08:20:19 GMT","Content-Type: text/plain; charset=UTF-8","Content-Length: 16","Connection: close","And 9 more entries, set log level to debug to see all 
entries"]}]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php","line":15,"function":"file_get_contents","args":["https://cad-comic.com/"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":259,"function":"retrieveUrl","class":"Favicon\\DataAccess","type":"->","args":["https://cad-comic.com/"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":198,"function":"getInPage","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":156,"function":"getFavicon","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com",false]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":338,"function":"get","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":393,"function":"getFavicon","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":[{"__class__":"FeedIo\\Feed"},"https://cad-comic.com/feed/"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":125,"function":"buildFeed","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":[{"__class__":"FeedIo\\Feed"},"https://cad-comic.com/feed/","https://cad-comic.com/feed/"]},{"file":"/var/www/html/custom_apps/news/lib/Service/FeedServiceV2.php","line":265,"function":"fetch","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":["https://cad-comic.com/feed/",false,null,null]},{"file":"/var/www/html/custom_apps/news/lib/Service/FeedServiceV2.php","line":342,"function":"fetch","class":"OCA\\News\\Service\\FeedServiceV2","type":"->","args":[{"items":[],"id":35,"__class__":"OCA\\News\\Db\\Feed"}]},{"file":"/var/www/html/custom_apps/news/lib/Service/UpdaterService.php","line":55,"function":"fetchAll","class":"OCA\\Ne
ws\\Service\\FeedServiceV2","type":"->","args":[]},{"file":"/var/www/html/custom_apps/news/lib/Cron/UpdaterJob.php","line":71,"function":"update","class":"OCA\\News\\Service\\UpdaterService","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/html/lib/private/BackgroundJob/Job.php","line":51,"function":"run","class":"OCA\\News\\Cron\\UpdaterJob","type":"->","args":[null]},{"file":"/var/www/html/lib/private/BackgroundJob/TimedJob.php","line":58,"function":"execute","class":"OC\\BackgroundJob\\Job","type":"->","args":[{"__class__":"OC\\BackgroundJob\\JobList"},{"__class__":"OC\\Log"}]},{"file":"/var/www/html/cron.php","line":127,"function":"execute","class":"OC\\BackgroundJob\\TimedJob","type":"->","args":[{"__class__":"OC\\BackgroundJob\\JobList"},{"__class__":"OC\\Log"}]}],"File":"/var/www/html/lib/private/Log/ErrorHandler.php","Line":99,"CustomMessage":"--"}}
{"reqId":"oSDc5k2esduEJNNxmCFt","level":3,"time":"2021-08-19T08:20:19+00:00","remoteAddr":"","user":"--","app":"PHP","method":"","url":"--","message":"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n at /var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php#15","userAgent":"--","version":"22.1.0.1","exception":{"Exception":"Error","Message":"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n at /var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php#15","Code":0,"Trace":[{"function":"onAll","class":"OC\\Log\\ErrorHandler","type":"::","args":[2,"file_get_contents(https://cad-comic.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden\r\n","/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php",15,{"url":"https://cad-comic.com/","http_response_header":["HTTP/1.1 403 Forbidden","Date: Thu, 19 Aug 2021 08:20:19 GMT","Content-Type: text/plain; charset=UTF-8","Content-Length: 16","Connection: close","And 9 more entries, set log level to debug to see all 
entries"]}]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/DataAccess.php","line":15,"function":"file_get_contents","args":["https://cad-comic.com/"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":259,"function":"retrieveUrl","class":"Favicon\\DataAccess","type":"->","args":["https://cad-comic.com/"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":198,"function":"getInPage","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/vendor/arthurhoaro/favicon/src/Favicon/Favicon.php","line":161,"function":"getFavicon","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":338,"function":"get","class":"Favicon\\Favicon","type":"->","args":["https://cad-comic.com"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":393,"function":"getFavicon","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":[{"__class__":"FeedIo\\Feed"},"https://cad-comic.com/feed/"]},{"file":"/var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php","line":125,"function":"buildFeed","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":[{"__class__":"FeedIo\\Feed"},"https://cad-comic.com/feed/","https://cad-comic.com/feed/"]},{"file":"/var/www/html/custom_apps/news/lib/Service/FeedServiceV2.php","line":265,"function":"fetch","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":["https://cad-comic.com/feed/",false,null,null]},{"file":"/var/www/html/custom_apps/news/lib/Service/FeedServiceV2.php","line":342,"function":"fetch","class":"OCA\\News\\Service\\FeedServiceV2","type":"->","args":[{"items":[],"id":35,"__class__":"OCA\\News\\Db\\Feed"}]},{"file":"/var/www/html/custom_apps/news/lib/Service/UpdaterService.php","line":55,"function":"fetchAll","class":"OCA\\News\\Se
rvice\\FeedServiceV2","type":"->","args":[]},{"file":"/var/www/html/custom_apps/news/lib/Cron/UpdaterJob.php","line":71,"function":"update","class":"OCA\\News\\Service\\UpdaterService","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/html/lib/private/BackgroundJob/Job.php","line":51,"function":"run","class":"OCA\\News\\Cron\\UpdaterJob","type":"->","args":[null]},{"file":"/var/www/html/lib/private/BackgroundJob/TimedJob.php","line":58,"function":"execute","class":"OC\\BackgroundJob\\Job","type":"->","args":[{"__class__":"OC\\BackgroundJob\\JobList"},{"__class__":"OC\\Log"}]},{"file":"/var/www/html/cron.php","line":127,"function":"execute","class":"OC\\BackgroundJob\\TimedJob","type":"->","args":[{"__class__":"OC\\BackgroundJob\\JobList"},{"__class__":"OC\\Log"}]}],"File":"/var/www/html/lib/private/Log/ErrorHandler.php","Line":99,"CustomMessage":"--"}}
{"reqId":"p9SCUscjC399efK4e3Ac","level":2,"time":"2021-08-19T08:36:54+00:00","remoteAddr":"123.123.123.123","user":"martijn","app":"settings","method":"GET","url":"/settings/ajax/checksetup","message":"Controller OCA\\Settings\\Controller\\CheckSetupController::check executed 3993 queries.","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0","version":"22.1.0.1"}
{"reqId":"n8vhOURRBTn4ICGsBEOC","level":2,"time":"2021-08-19T08:38:17+00:00","remoteAddr":"123.123.123.123","user":"martijn","app":"settings","method":"GET","url":"/settings/ajax/checksetup","message":"Controller OCA\\Settings\\Controller\\CheckSetupController::check executed 3993 queries.","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0","version":"22.1.0.1"}
Grotax commented 3 years ago

Hey, YouTube is simply blocking the request sometimes.

To give a rough picture:

When you add a feed, News tries to fetch its favicon. If that request is blocked, the favicon stays empty.

If you have X feeds from YouTube, the fetcher loops over all of them and tries to get each favicon. We know for sure that grabbing a favicon can take multiple requests, because there is no protocol for it, just conventions (for example the one Apple created for the iPhone back then).
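To illustrate why one favicon can cost several requests, here is a minimal Python sketch, assuming a fetcher that first reads `<link rel="...">` icon declarations from the page and then falls back to the conventional `/favicon.ico` and Apple's `/apple-touch-icon.png`. The names `favicon_candidates` and `IconLinkParser` are hypothetical, and the order shown is an assumption, not how the News app or the arthurhoaro/favicon library actually does it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# rel values commonly used to advertise icons; "apple-touch-icon" is Apple's iPhone convention
ICON_RELS = {"icon", "shortcut icon", "apple-touch-icon", "apple-touch-icon-precomposed"}


class IconLinkParser(HTMLParser):
    """Collects href values of <link> tags whose rel names an icon (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)  # attribute names arrive lowercased; values may be None
        rel = (attr.get("rel") or "").strip().lower()
        href = attr.get("href")
        if href and rel in ICON_RELS:
            self.hrefs.append(href)


def favicon_candidates(page_url, html):
    """Return candidate icon URLs in the order a fetcher might try them.

    Every entry can mean one more HTTP request to the same site.
    """
    parser = IconLinkParser()
    parser.feed(html)
    parts = urlsplit(page_url)
    root = f"{parts.scheme}://{parts.netloc}/"
    # Icons declared in the page, resolved against the page URL
    candidates = [urljoin(page_url, href) for href in parser.hrefs]
    # Conventional fallbacks at the site root
    for path in ("favicon.ico", "apple-touch-icon.png"):
        url = urljoin(root, path)
        if url not in candidates:
            candidates.append(url)
    return candidates
```

Fetching the page itself is already one request, so even a site that declares nothing yields two more guesses; multiplied by X feeds on the same host, the request count adds up quickly.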

We also had an issue report that News is making far more requests than expected. It hasn't been reproduced yet, though, and we have yet to understand why it might happen.

Anyway, YouTube probably has an automatic load balancer and firewall that block automated scripts sending too many requests, whatever "too many" means for YouTube.

If the favicon request is blocked while fetching an update, the favicon will be empty.

I don't think there is an easy fix for this. The library recently received an update regarding the user agent, which might lead to less blocking by YouTube (?). Enhancing the logic would be another option, but it would also slow down fetching for each feed. Creating a new release is currently blocked by new front-end code that was already merged but doesn't work.
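As a rough illustration of the user-agent point, a minimal Python sketch, assuming that some servers treat requests without a browser-like `User-Agent` as automated. The UA string and the helper names `build_icon_request` / `try_fetch_icon` are hypothetical; whether this reduces blocking by YouTube is unconfirmed, and it is not the library's actual code. It also shows a failed request (such as the 403 Forbidden in the log above) being reported as "no icon" rather than raised.

```python
import urllib.request
from urllib.error import HTTPError

# Hypothetical UA string; the value the library actually sends may differ
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64) FaviconFetcher/0.1"


def build_icon_request(url, user_agent=DEFAULT_UA):
    """Build a GET request that identifies itself with an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def try_fetch_icon(url, timeout=10):
    """Return the icon bytes, or None if the server refuses or is unreachable."""
    try:
        with urllib.request.urlopen(build_icon_request(url), timeout=timeout) as resp:
            return resp.read()
    except HTTPError:
        # Server answered with an error status, e.g. 403 Forbidden when it blocks us
        return None
    except OSError:
        # Network failure or timeout (URLError is also an OSError subclass)
        return None
```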

MartenBE commented 3 years ago

Hi, thanks for the answer. Is there any logic to why some are blocked and others aren't?

Grotax commented 3 years ago

It probably relates to the number of requests: from YouTube's point of view, you are asking for the same file over and over again.

And in the same pattern, so I imagine they have some defense system running in front of the web servers that checks for suspicious patterns. It might also differ depending on which server is serving that instance of YouTube.

The library has some caching and trimming logic, so I guess it would always ask for youtube.com/favicon.ico, and maybe for a few more files with better resolution (the Apple standard).
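The trimming step is visible in the log above: `getFavicon` is called with `https://cad-comic.com` for the feed `https://cad-comic.com/feed/`. A minimal Python sketch, assuming a hypothetical in-memory cache keyed by the trimmed site URL, shows how trimming lets X feeds on one host share a single lookup instead of repeating it per feed. `trim_to_site` and `PerSiteIconCache` are illustrative names, not the library's API.

```python
from urllib.parse import urlsplit


def trim_to_site(feed_url):
    """Reduce a feed URL to scheme://host, the site whose icon we want."""
    parts = urlsplit(feed_url)
    return f"{parts.scheme}://{parts.netloc}"


class PerSiteIconCache:
    """Hypothetical in-memory cache: one icon lookup per site, shared by all its feeds."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(site_url) -> icon URL or None
        self._icons = {}
        self.fetch_count = 0  # how many real lookups were made

    def icon_for(self, feed_url):
        site = trim_to_site(feed_url)
        if site not in self._icons:
            self.fetch_count += 1
            # Failures (None) are cached too, so a blocked site is not retried for every feed
            self._icons[site] = self._fetch(site)
        return self._icons[site]
```

Caching a failed lookup avoids hammering a host that is already blocking, at the cost of keeping the icon empty until the cache entry is dropped; not caching failures is the opposite trade-off.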

But I guess the cache is only used after the first request; web servers and clients can communicate with each other to indicate that a file has not changed. Here, though, the first request already fails, and the library throws an error before touching the cache.

I haven't checked the code, but that is what I reconstruct from memory.
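The "file has not changed" mechanism described above is HTTP conditional requests: the client sends the `ETag` or `Last-Modified` value it stored as `If-None-Match` / `If-Modified-Since`, and the server may answer `304 Not Modified`. A minimal Python sketch, assuming a hypothetical `CachedIcon` record, shows the validator headers and one possible policy that keeps the last good icon when a request fails (e.g. 403) instead of ending up empty. Whether the library's cache works like this is unconfirmed, since the description above is from memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedIcon:
    """Hypothetical cache entry: icon bytes plus the validators the server sent."""
    data: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached):
    """Headers that let the server answer 304 Not Modified instead of resending the icon."""
    headers = {}
    if cached is not None:
        if cached.etag:
            headers["If-None-Match"] = cached.etag
        if cached.last_modified:
            headers["If-Modified-Since"] = cached.last_modified
    return headers


def resolve_icon(cached, status, body=b"", etag=None, last_modified=None):
    """Decide which icon to keep after a (conditional) request.

    200 -> store the new icon and its validators.
    304 Not Modified, or an error such as 403 when blocked -> keep the cached copy,
    which is None only if no request has ever succeeded.
    """
    if status == 200:
        return CachedIcon(body, etag, last_modified)
    return cached
```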

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.