scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy

AttributeError: 'PipeTransport' object has no attribute '_output' #201

Status: Closed (HimrajDas closed this issue 1 year ago)

HimrajDas commented 1 year ago

This is my code:

    def start_requests(self):
        url = "https://www.udemy.com/courses/search/?src=ukw&q=machine+learning"
        # Route this request through Playwright
        yield scrapy.Request(url, meta={"playwright": True})

    def parse(self, response):
        course_titles = response.css(".course-card--course-title--vVEjC a::text").getall()
        for title in course_titles:
            self.logger.info("Course Title: %s", title)

I put this in settings.py:

    DOWNLOAD_HANDLERS = {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    }
    TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

I followed the setup guidelines, but I got this error:

    (venv) PS E:\Python_Projects\Buy It\coursescraper> scrapy crawl wolf
    2023-06-05 21:46:38 [scrapy.utils.log] INFO: Scrapy 2.8.0 started (bot: coursescraper)
    2023-06-05 21:46:38 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.12, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.1, Twisted 22.10.0, Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)], pyOpenSSL 23.1.1 (OpenSSL 3.1.0 14 Mar 2023), cryptography 40.0.2, Platform Windows-10-10.0.19045-SP0
    2023-06-05 21:46:38 [scrapy.crawler] INFO: Overridden settings:
    {'BOT_NAME': 'coursescraper',
     'FEED_EXPORT_ENCODING': 'utf-8',
     'NEWSPIDER_MODULE': 'coursescraper.spiders',
     'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
     'SPIDER_MODULES': ['coursescraper.spiders'],
     'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
    2023-06-05 21:46:38 [asyncio] DEBUG: Using selector: SelectSelector
    2023-06-05 21:46:38 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
    2023-06-05 21:46:38 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events._WindowsSelectorEventLoop
    2023-06-05 21:46:39 [scrapy.extensions.telnet] INFO: Telnet Password: 5fcb73738244788f
    2023-06-05 21:46:39 [scrapy.middleware] INFO: Enabled extensions:
    ['scrapy.extensions.corestats.CoreStats',
     'scrapy.extensions.telnet.TelnetConsole',
     'scrapy.extensions.logstats.LogStats']
    2023-06-05 21:46:43 [scrapy.middleware] INFO: Enabled downloader middlewares:
    ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
     'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
     'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
     'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
     'scrapy.downloadermiddlewares.retry.RetryMiddleware',
     'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
     'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
     'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
     'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
     'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
     'scrapy.downloadermiddlewares.stats.DownloaderStats']
    2023-06-05 21:46:43 [scrapy.middleware] INFO: Enabled spider middlewares:
    ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
     'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
     'scrapy.spidermiddlewares.referer.RefererMiddleware',
     'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
     'scrapy.spidermiddlewares.depth.DepthMiddleware']
    2023-06-05 21:46:43 [scrapy.middleware] INFO: Enabled item pipelines:
    []
    2023-06-05 21:46:43 [scrapy.core.engine] INFO: Spider opened
    2023-06-05 21:46:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2023-06-05 21:46:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
    2023-06-05 21:46:44 [scrapy-playwright] INFO: Starting download handler
    2023-06-05 21:46:44 [scrapy-playwright] INFO: Starting download handler
    2023-06-05 21:46:44 [asyncio] ERROR: Task exception was never retrieved
    future: <Task finished name='Task-3' coro=<Connection.run() done, defined at E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_connection.py:264> exception=NotImplementedError()>
    Traceback (most recent call last):
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_connection.py", line 271, in run
        await self._transport.connect()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_transport.py", line 127, in connect
        raise exc
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_transport.py", line 116, in connect
        self._proc = await asyncio.create_subprocess_exec(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\subprocess.py", line 218, in create_subprocess_exec
        transport, protocol = await loop.subprocess_exec(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 1667, in subprocess_exec
        transport = await self._make_subprocess_transport(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 498, in _make_subprocess_transport
        raise NotImplementedError
    NotImplementedError
    2023-06-05 21:46:44 [asyncio] ERROR: Task exception was never retrieved
    future: <Task finished name='Task-4' coro=<Connection.run() done, defined at E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_connection.py:264> exception=NotImplementedError()>
    Traceback (most recent call last):
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_connection.py", line 271, in run
        await self._transport.connect()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_transport.py", line 127, in connect
        raise exc
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_transport.py", line 116, in connect
        self._proc = await asyncio.create_subprocess_exec(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\subprocess.py", line 218, in create_subprocess_exec
        transport, protocol = await loop.subprocess_exec(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 1667, in subprocess_exec
        transport = await self._make_subprocess_transport(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 498, in _make_subprocess_transport
        raise NotImplementedError
    NotImplementedError
    2023-06-05 21:46:44 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ScrapyPlaywrightDownloadHandler._engine_started of <scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler object at 0x000001DE79723CA0>>
    Traceback (most recent call last):
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\internet\defer.py", line 1065, in adapt
        extracted = result.result()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\scrapy_playwright\handler.py", line 116, in _launch
        playwright_instance = await self.playwright_context_manager.start()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\async_api\_context_manager.py", line 52, in start
        return await self.__aenter__()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\async_api\_context_manager.py", line 47, in __aenter__
        playwright = AsyncPlaywright(next(iter(done)).result())
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_transport.py", line 116, in connect
        self._proc = await asyncio.create_subprocess_exec(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\subprocess.py", line 218, in create_subprocess_exec
        transport, protocol = await loop.subprocess_exec(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 1667, in subprocess_exec
        transport = await self._make_subprocess_transport(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 498, in _make_subprocess_transport
        raise NotImplementedError
    NotImplementedError
    2023-06-05 21:46:44 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ScrapyPlaywrightDownloadHandler._engine_started of <scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler object at 0x000001DE79D11D80>>
    Traceback (most recent call last):
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\internet\defer.py", line 1065, in adapt
        extracted = result.result()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\scrapy_playwright\handler.py", line 116, in _launch
        playwright_instance = await self.playwright_context_manager.start()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\async_api\_context_manager.py", line 52, in start
        return await self.__aenter__()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\async_api\_context_manager.py", line 47, in __aenter__
        playwright = AsyncPlaywright(next(iter(done)).result())
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_transport.py", line 116, in connect
        self._proc = await asyncio.create_subprocess_exec(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\subprocess.py", line 218, in create_subprocess_exec
        transport, protocol = await loop.subprocess_exec(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 1667, in subprocess_exec
        transport = await self._make_subprocess_transport(
      File "C:\Users\ACER\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 498, in _make_subprocess_transport
        raise NotImplementedError
    NotImplementedError
    2023-06-05 21:46:49 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.udemy.com/courses/search/?src=ukw&q=machine+learning>
    Traceback (most recent call last):
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\internet\defer.py", line 1693, in _inlineCallbacks
        result = context.run(
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\python\failure.py", line 518, in throwExceptionIntoGenerator
        return g.throw(self.type, self.value, self.tb)
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\scrapy\core\downloader\middleware.py", line 52, in process_request
        return (yield download_func(request=request, spider=spider))
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\internet\defer.py", line 1065, in adapt
        extracted = result.result()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\scrapy_playwright\handler.py", line 275, in _download_request
        page = await self._create_page(request=request, spider=spider)
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\scrapy_playwright\handler.py", line 182, in _create_page
        ctx_wrapper = await self._create_browser_context(
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\scrapy_playwright\handler.py", line 152, in _create_browser_context
        await self._maybe_launch_browser()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\scrapy_playwright\handler.py", line 133, in _maybe_launch_browser
        logger.info("Launching browser %s", self.browser_type.name)
    AttributeError: 'ScrapyPlaywrightDownloadHandler' object has no attribute 'browser_type'. Did you mean: 'browser_type_name'?
    2023-06-05 21:46:49 [scrapy.core.engine] INFO: Closing spider (finished)
    2023-06-05 21:46:49 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
    {'downloader/exception_count': 1,
     'downloader/exception_type_count/builtins.AttributeError': 1,
     'downloader/request_bytes': 255,
     'downloader/request_count': 1,
     'downloader/request_method_count/GET': 1,
     'elapsed_time_seconds': 5.35813,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2023, 6, 5, 16, 16, 49, 829501),
     'log_count/DEBUG': 3,
     'log_count/ERROR': 5,
     'log_count/INFO': 12,
     'scheduler/dequeued': 1,
     'scheduler/dequeued/memory': 1,
     'scheduler/enqueued': 1,
     'scheduler/enqueued/memory': 1,
     'start_time': datetime.datetime(2023, 6, 5, 16, 16, 44, 471371)}
    2023-06-05 21:46:49 [scrapy.core.engine] INFO: Spider closed (finished)
    2023-06-05 21:46:49 [scrapy-playwright] INFO: Closing download handler
    2023-06-05 21:46:49 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method DownloadHandlers._close of <scrapy.core.downloader.handlers.DownloadHandlers object at 0x000001DE79721AB0>>
    Traceback (most recent call last):
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\internet\defer.py", line 1693, in _inlineCallbacks
        result = context.run(
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\python\failure.py", line 518, in throwExceptionIntoGenerator
        return g.throw(self.type, self.value, self.tb)
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 85, in _close
        yield dh.close()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\internet\defer.py", line 1693, in _inlineCallbacks
        result = context.run(
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\python\failure.py", line 518, in throwExceptionIntoGenerator
        return g.throw(self.type, self.value, self.tb)
        yield deferred_from_coro(self._close())
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\twisted\internet\defer.py", line 1065, in adapt
        extracted = result.result()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\scrapy_playwright\handler.py", line 265, in _close
        await self.playwright_context_manager.__aexit__()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\async_api\_context_manager.py", line 58, in __aexit__
        await self._connection.stop_async()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_connection.py", line 282, in stop_async
        self._transport.request_stop()
      File "E:\Python_Projects\Buy It\venv\lib\site-packages\playwright\_impl\_transport.py", line 100, in request_stop
        assert self._output
    AttributeError: 'PipeTransport' object has no attribute '_output'

elacuesta commented 1 year ago

This exact error was already reported in #90. This package does not run natively on Windows; please see https://github.com/scrapy-plugins/scrapy-playwright#lack-of-native-support-for-windows.
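
For context, the first failure in the log is the `NotImplementedError` raised from `_make_subprocess_transport`, and the log also shows the reactor running on `asyncio.windows_events._WindowsSelectorEventLoop`. On Windows, asyncio's selector event loop does not implement subprocess support, which Playwright needs to start its driver process; the later `AttributeError` messages appear to be follow-on effects of that failed startup. The snippet below is a minimal standalone sketch of that behaviour using only the standard library. It is not part of scrapy-playwright, and the helper names `_spawn_noop` and `selector_loop_supports_subprocesses` are made up for illustration.

```python
import asyncio
import sys


async def _spawn_noop() -> int:
    # Start a trivial child process through asyncio, the same kind of call
    # that fails in the traceback (asyncio.create_subprocess_exec).
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
    return await proc.wait()


def selector_loop_supports_subprocesses() -> bool:
    # Hypothetical helper: explicitly build a selector-based event loop and
    # report whether it can launch a subprocess on this platform.
    loop = asyncio.SelectorEventLoop()
    try:
        loop.run_until_complete(_spawn_noop())
        return True
    except NotImplementedError:
        # On Windows, the selector loop inherits the base implementation of
        # _make_subprocess_transport, which raises NotImplementedError.
        return False
    finally:
        loop.close()


if __name__ == "__main__":
    print("selector loop can spawn subprocesses:",
          selector_loop_supports_subprocesses())
```

Running this on the affected machine should report that the selector loop cannot spawn subprocesses, while on Linux or macOS the same check is expected to succeed. That matches the maintainer's point that the setup in this issue does not work natively on Windows.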

naveedsid commented 1 year ago

I am facing the same problem. How can I resolve this? Please tell me.