I just noticed different behavior with and without the 'scrapy_splash.SplashMiddleware': 725 setting. Can someone give me some advice?
With the setting enabled, nothing gets crawled and I see this in the log:
2024-10-20 15:45:00 [scrapy.downloadermiddlewares.offsite] DEBUG: Filtered offsite request to 'localhost': <GET https://www.adamchoi.co.uk/overs/detailed via http://localhost:8050/execute>
2024-10-20 15:45:00 [scrapy.core.engine] DEBUG: Signal handler scrapy.downloadermiddlewares.offsite.OffsiteMiddleware.request_scheduled dropped request <GET https://www.adamchoi.co.uk/overs/detailed via http://localhost:8050/execute> before it reached the scheduler.
With the setting disabled, I get the HTML source, but none of the JavaScript has been rendered:
2024-10-20 15:39:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.adamchoi.co.uk/overs/detailed> (referer: None) b'<!doctype html>\n<html class="no-js" lang="en">\n\n <head>\n <meta charset="utf-8">\n <title>Football Statistics For Betting</title>\n <meta name="description" content="The best football statistics for popular betting markets | BTTS | Corners | Cards | Booking Points | Over 2.5 Goals | Both Teams To Score | BTTS and Win">\n <meta name="keywords" content="bets prediction betting site football statistics stats btts both teams to score overs corners cards tips booking points team goals">\n <meta name="twitter:card" content="summary_large_image" />\n <meta name="twitter:site" content="https://www.adamchoi.co.uk" />\n <meta name="twitter:title" content="Football Statistics For Betting" />\n <meta name="twitter:description" content="BTTS, Corners, Cards, Booking Points, Overs, Team Goals, BTTS & Win statistics for betting. Many more markets covered across over 50 leagues around the world." />\n <meta name="twitter:image" content="https://www.adamchoi.co.uk/images/og.png?v=1" />\n <meta property="og:title" content="Football Statistics For Betting"/>\n <meta property="og:url" content="https://www.adamchoi.co.uk"/>\n <meta property="og:description" content="BTTS, Corners, Cards, Booking Points, Overs, Team Goals, BTTS & Win statistics for betting. 
Many more markets covered across over 50 leagues around the world."/>\n <meta property="og:image" content="https://www.adamchoi.co.uk/images/og.png?v=1"/>\n <meta property="og:locale" content="en_GB"/>\n <meta property="og:type" content="website"/>\n <meta name="viewport" content="width=device-width">\n\n <base href=\'/\'>\n <link rel="stylesheet" href="dist/css/vendor-bundle-599428b2b3.css">\n <link rel="stylesheet" href="dist/css/app-bundle-17088dbaef.css?v=1">\n\n <script src="dist/js/vendor-bundle-bebd0fdb69.js"></script>\n <script src="dist/js/app-bundle-798a12ba74.js"></script>\n <!-- endbuild -->\n\n <script>\n (function(i,s,o,g,r,a,m){i[\'GoogleAnalyticsObject\']=r;i[r]=i[r]||function(){\n (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),\n m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)\n })(window,document,\'script\',\'//www.google-analytics.com/analytics.js\',\'ga\');\n </script>\n\n <!-- Google Analytics -->\n <script async src="https://www.googletagmanager.com/gtag/js?id=G-8MTGZ91RT2"></script>\n <script>\n window.dataLayer = window.dataLayer || [];\n function gtag(){dataLayer.push(arguments);}\n gtag(\'js\', new Date());\n\n gtag(\'config\', \'G-8MTGZ91RT2\');\n </script>\n\n <!-- Google Tag Manager -->\n <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({\'gtm.start\':\n new Date().getTime(),event:\'gtm.js\'});var f=d.getElementsByTagName(s)[0],\n j=d.createElement(s),dl=l!=\'dataLayer\'?\'&l=\'+l:\'\';j.async=true;j.src=\n \'https://www.googletagmanager.com/gtm.js?id=\'+i+dl;f.parentNode.insertBefore(j,f);\n })(window,document,\'script\',\'dataLayer\',\'GTM-5GQQMBP\');</script>\n <!-- End Google Tag Manager -->\n\n <!-- Google Ad Manager -->\n <script async src="https://securepubads.g.doubleclick.net/tag/js/gpt.js"></script>\n <script>\n window.googletag = window.googletag || {cmd: []};\n\n googletag.cmd.push(function() {\n googletag.pubads().enableLazyLoad();\n 
googletag.pubads().setCentering(true);\n googletag.pubads().collapseEmptyDivs();\n setInterval(function(){ googletag.pubads().refresh(); }, 30000);\n });\n\n </script>\n\n </head>\n \n <body>\n <!-- Google Tag Manager (noscript) -->\n <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-5GQQMBP"\n height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>\n <!-- End Google Tag Manager (noscript) -->\n\n <div data-ng-app="adamChoiStatsApp">\n\n <div data-ui-view="rootView">\n\n </div>\n </div>\n\n <script defer src="https://static.cloudflareinsights.com/beacon.min.js/vcd15cbe7772f49c399c6a5babf22c1241717689176015" integrity="sha512-ZpsOmlRQV6y907TI0dKBHq9Md29nnaEIPlkf84rnaERnq6zvWvPUqr2ft8M1aS28oN72PdrCzSjY4U6VaAw1EQ==" data-cf-beacon=\'{"rayId":"8d5759fa98982ab4","version":"2024.10.1","r":1,"serverTiming":{"name":{"cfExtPri":true,"cfL4":true,"cfSpeedBrain":true,"cfCacheStatus":true}},"token":"4a403f83ab324f8d9ddbdcd08ed7ae8d","b":1}\' crossorigin="anonymous"></script>\n</body>\n\n</html>\n'
My spider file:
`import scrapy
from scrapy_splash import SplashRequest


class AdamchoiSpider(scrapy.Spider):
    name = "adamchoi"
    allowed_domains = ["www.adamchoi.co.uk"]
    start_urls = ["https://www.adamchoi.co.uk/overs/detailed"]
`
My settings file:
`SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
`
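My reading of the "Filtered offsite request to 'localhost'" message is a guess; I have not confirmed it in the Scrapy source. With SplashMiddleware enabled, the request is rewritten to go through SPLASH_URL, so the offsite check may be seeing the host `localhost`, which is not in `allowed_domains`. Here is a minimal sketch of that kind of host check. `is_offsite` is a hypothetical helper I wrote for illustration, not Scrapy's actual code:

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Simplified model of a host-based offsite check (hypothetical, not Scrapy's code).

    A URL counts as on-site when its host equals an allowed domain
    or is a subdomain of one.
    """
    host = (urlparse(url).hostname or "").lower()
    domains = [d.lower() for d in allowed_domains]
    return not any(host == d or host.endswith("." + d) for d in domains)


allowed = ["www.adamchoi.co.uk"]
target_url = "https://www.adamchoi.co.uk/overs/detailed"
splash_url = "http://localhost:8050/execute"  # SPLASH_URL plus the endpoint from the log

print(is_offsite(target_url, allowed))                  # False: the site itself is allowed
print(is_offsite(splash_url, allowed))                  # True: 'localhost' is not allowed
print(is_offsite(splash_url, allowed + ["localhost"]))  # False once 'localhost' is listed
```

If this reading is right, adding "localhost" to `allowed_domains` might let the Splash request through, but I have not verified that.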