scrapfly / python-scrapfly

Scrapfly Python SDK for headless browsers and proxy rotation
https://scrapfly.io/docs/sdk/python

AttributeError with Scrapy examples #11

Closed matthewcummings closed 8 months ago

matthewcummings commented 10 months ago

When I run the Scrapy examples like the Bea one, I get the following error:

2023-10-22 05:57:11 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: scrapybot)
2023-10-22 05:57:11 [scrapy.utils.log] INFO: Versions: lxml 4.9.3.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.11.4 (main, Jul 30 2023, 10:41:08) [GCC 11.3.0], pyOpenSSL 23.2.0 (OpenSSL 3.1.3 19 Sep 2023), cryptography 41.0.4, Platform Linux-5.15.0-86-generic-x86_64-with-glibc2.35
Unhandled error in Deferred:
2023-10-22 05:57:11 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/scrapy/crawler.py", line 265, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/scrapy/crawler.py", line 269, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/twisted/internet/defer.py", line 1947, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/twisted/internet/defer.py", line 1857, in _cancellableInlineCallbacks
    _inlineCallbacks(None, gen, status, _copy_context())
--- <exception caught here> ---
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/scrapy/crawler.py", line 155, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/scrapy/crawler.py", line 169, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/scrapfly/scrapy/spider.py", line 126, in from_crawler
    crawler.stats.set_value('scrapfly/api_call_cost', 0)
builtins.AttributeError: 'NoneType' object has no attribute 'set_value'

2023-10-22 05:57:11 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/scrapy/crawler.py", line 155, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/scrapy/crawler.py", line 169, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/home/matthew/repos/payclass/settlement-data-parser/.venv/lib/python3.11/site-packages/scrapfly/scrapy/spider.py", line 126, in from_crawler
    crawler.stats.set_value('scrapfly/api_call_cost', 0)
AttributeError: 'NoneType' object has no attribute 'set_value'

I'm using Python 3.11 with scrapfly-sdk==0.8.9 and Scrapy==2.11.0.

jjsaunier commented 10 months ago

I have updated the demo in https://github.com/scrapfly/python-scrapfly/tree/master/examples/scrapy/demo. Is this still happening?

I'm curious how you get AttributeError: 'NoneType' object has no attribute 'set_value'. Am I missing a way to disable the stats module (https://docs.scrapy.org/en/latest/topics/stats.html)?

matthewcummings commented 10 months ago

I did not explicitly disable the stats module; it appears to be missing for some reason. I will try your update and see if that makes a difference, thank you.

farovictor commented 9 months ago

I just ran into the same problem. This is a problem with the StatsCollector that was introduced in Scrapy release 2.11.0.

You can check the diff of the scrapy/crawler.py file.

They no longer initialize the stats property; it is set to None by default.

[Image: diff of scrapy/crawler.py, line 77 (removed) -> line 85 (added)]
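A minimal sketch of one way to tolerate this, assuming crawler.stats can still be None when from_crawler() runs, as the traceback above shows. The names DictStats and init_cost_stat are hypothetical stand-ins made up for illustration. They are not part of Scrapy or the Scrapfly SDK, and this is not necessarily how the SDK was fixed.

```python
from types import SimpleNamespace


class DictStats:
    """Tiny stand-in for a stats collector (hypothetical, illustration only)."""

    def __init__(self):
        self._values = {}

    def set_value(self, key, value):
        self._values[key] = value

    def get_value(self, key, default=None):
        return self._values.get(key, default)


def init_cost_stat(crawler, key="scrapfly/api_call_cost"):
    """Set the counter to 0 if a stats collector exists; return whether it did."""
    stats = getattr(crawler, "stats", None)
    if stats is None:
        # Nothing to initialise yet: skip instead of raising AttributeError.
        return False
    stats.set_value(key, 0)
    return True


# Simulate the failing case: the crawler has no stats collector yet.
crawler = SimpleNamespace(stats=None)
early = init_cost_stat(crawler)

# Simulate a later point where the collector has been created.
crawler.stats = DictStats()
late = init_cost_stat(crawler)
```

With a guard like this, the initialisation would have to be retried at a later point (for example once the spider has opened), otherwise the counter is never set.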

jjsaunier commented 9 months ago

OK, I made changes on the main branch. I tested against Scrapy 2.6 to 2.11 and it should work as expected. I will tag a release soon if there is no negative feedback.

matthewcummings commented 9 months ago

@jjsaunier I will try this out in the next day or two, thank you.

DJousto commented 9 months ago

Same error for me with scrapfly-sdk 0.8.10 and Scrapy 2.11.0: the "stats" attribute is missing from the crawler object.

jjsaunier commented 9 months ago

@DJousto Can you try with the main branch? It's not tagged yet.

DJousto commented 9 months ago

Sorry @jjsaunier, can you clarify what you mean by main branch? The main branch of "python-scrapfly"? There is only one branch: "master".

jjsaunier commented 9 months ago

Yes, I mean "master", sorry. With the GitHub rename I was confused.

pip install git+https://github.com/scrapfly/python-scrapfly.git@master

DJousto commented 9 months ago

OK, after following this I ran into another error:

2023-11-20 16:04:54 [scrapfly] ERROR: [Failure instance: Traceback: <class 'TypeError'>: ExecutionEngine.crawl() takes 2 positional arguments but 3 were given
F:\python310\lib\site-packages\twisted\internet\defer.py:661:callback
F:\python310\lib\site-packages\twisted\internet\defer.py:763:_startRunCallbacks
F:\python310\lib\site-packages\twisted\internet\defer.py:857:_runCallbacks
F:\python310\lib\site-packages\twisted\internet\defer.py:1750:gotResult
--- <exception caught here> ---
F:\python310\lib\site-packages\twisted\internet\defer.py:1656:_inlineCallbacks
F:\python310\lib\site-packages\twisted\python\failure.py:514:throwExceptionIntoGenerator
F:\python310\lib\site-packages\scrapy\core\downloader\middleware.py:86:process_exception
F:\python310\lib\site-packages\twisted\internet\defer.py:857:_runCallbacks
F:\python310\lib\site-packages\twisted\internet\task.py:869:cb
]
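The message suggests that crawl() is being called with a request and a spider, while the installed ExecutionEngine.crawl() accepts only the request. Below is a rough sketch of a signature-based compatibility call, assuming that is the cause. crawl_compat, OldEngine and NewEngine are hypothetical names made up for illustration; they are not Scrapy or SDK code.

```python
import inspect


def crawl_compat(engine, request, spider):
    """Pass the spider to engine.crawl() only when its signature requires it."""
    spider_param = inspect.signature(engine.crawl).parameters.get("spider")
    if spider_param is not None and spider_param.default is inspect.Parameter.empty:
        # Signature with a mandatory spider argument.
        return engine.crawl(request, spider)
    # Signature without a spider argument, or with an optional one.
    return engine.crawl(request)


class OldEngine:
    """Stand-in for an engine whose crawl() requires a spider (hypothetical)."""

    def crawl(self, request, spider):
        return ("old", request, spider)


class NewEngine:
    """Stand-in for an engine whose crawl() takes only the request (hypothetical)."""

    def crawl(self, request):
        return ("new", request)


old_result = crawl_compat(OldEngine(), "request", "spider")
new_result = crawl_compat(NewEngine(), "request", "spider")
```

Checking the signature once at call time keeps a single code path across versions, at the cost of relying on the parameter being named spider.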