Closed: ryonlife closed this issue 4 years ago
@ryonlife thanks for raising this issue. Can you try again with version 0.0.27?
I added a small fix for CrawlSpiders that I hope solves this issue.
Please try it out and let me know how it behaves.
I appreciate you taking a crack. However, I updated to the latest version and am still getting the same error.
Could you share more details about your spider, or some sample code that reproduces the issue?
I could reproduce it with a simple CrawlSpider and the fix actually solved it, but we might be talking about different use cases.
Copy. I'll begin with a fresh simple spider, see if I can get that working and then try to debug my existing one...
Made a new simple spider that inherits from CrawlSpider and scrapy-autounit is working just fine.
Back to the original problem: I'm getting the errors on my spiders that inherit from a class I wrote, called ProductSpider, which inherits from CrawlSpider. The pickling error from my first message refers to the line below, where a LinkExtractor object is instantiated and assigned to an instance variable. When I comment out that line and all references to self.lex in other methods, my spiders stop working as intended, but scrapy-autounit generates tests and fixtures without throwing an error.
class ProductSpider(CrawlSpider):
    def __init__(self, *a, **kw):
        super().__init__(*a, **kw)
        self.crawl_patterns += self.process_patterns  # when processing, still crawl for links
        self.lex = LinkExtractor(allow=self.crawl_patterns, unique=True, canonicalize=True)
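For anyone landing here with the same error, the failure can be reproduced without Scrapy at all. The sketch below is only an illustration of the general pickle limitation and is not scrapy-autounit or Scrapy code: `make_matcher` is a hypothetical stand-in for a constructor that builds a lambda internally, which is what the `<locals>.<lambda>` part of the error message points to.

```python
import pickle


def make_matcher():
    # The lambda is created inside a function, so its qualified name is
    # 'make_matcher.<locals>.<lambda>' and pickle cannot look it up by name.
    return lambda url: url.startswith("https://")


matcher = make_matcher()

error = None
try:
    pickle.dumps(matcher)
except (AttributeError, pickle.PicklingError) as exc:
    # Which exception type is raised depends on the Python version.
    error = exc

print(type(error).__name__, error)
```

Any object that stores such a callable as an attribute inherits the problem, which would explain why removing self.lex makes the error go away.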
Taking a quick look at https://github.com/scrapinghub/scrapy-autounit/commit/4a67de320eff6c12d27c5f46c14f42b9980b3a8b, seems there needs to be a dynamic means, e.g. set via a class variable or in settings.py, to exclude additional spider args from pickling.
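To make the suggestion concrete, here is a rough sketch of what excluding attributes by name before pickling could look like. This is not how scrapy-autounit implements it; `FakeSpider`, `recordable_attrs`, and the `exclude` parameter are all hypothetical names used only for illustration.

```python
import pickle


class FakeSpider:
    """Hypothetical stand-in for a spider holding one unpicklable attribute."""

    def __init__(self):
        self.name = "products"
        # Plays the role of self.lex: a local lambda that pickle rejects.
        self.lex = lambda url: url


def recordable_attrs(obj, exclude=()):
    # Keep every instance attribute except the ones named in `exclude`.
    return {key: value for key, value in vars(obj).items() if key not in exclude}


spider = FakeSpider()

# Excluding 'lex' leaves only plain data, which pickles without error.
payload = pickle.dumps(recordable_attrs(spider, exclude={"lex"}))
restored = pickle.loads(payload)
print(restored)
```

The exclusion list could come from a class variable or from settings.py, as proposed above; the filtering step itself stays the same either way.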
Got it. I'll review it and get back to you. Thanks for the debugging.
Fixed in v0.0.28.
The new AUTOUNIT_DONT_RECORD_SPIDER_ATTRS setting can be used to achieve this behavior.
Please don't forget to run autounit update as soon as you install v0.0.28 to update your current tests and fixtures.
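For the ProductSpider case described earlier in the thread, the configuration would presumably look something like the fragment below. It assumes the setting accepts a list of spider attribute names, and that 'lex' is the only attribute that needs to be skipped; check the project README for the exact format before relying on it.

```python
# settings.py (fragment)

# Assumption: a list of spider attribute names to leave out of recorded fixtures.
# 'lex' is the LinkExtractor attribute from the ProductSpider example.
AUTOUNIT_DONT_RECORD_SPIDER_ATTRS = ["lex"]
```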
Installed scrapy_autounit for the first time using pip, updated settings per docs, and ran my crawler for the first time. Receiving this error. Using scrapy 2.1.0 and scrapy_autounit 0.0.26. Please advise.
Traceback (most recent call last):
  File "/Users/ryonlife/peg/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 52, in process_spider_input
    result = method(response=response, spider=spider)
  File "/Users/ryonlife/peg/env/lib/python3.7/site-packages/scrapy_autounit/middleware.py", line 86, in process_spider_input
    'middlewares': get_middlewares(spider),
AttributeError: Can't pickle local object 'LxmlLinkExtractor.__init__.<locals>.<lambda>'