scrapinghub / scrapy-autounit

Automatic unit test generation for Scrapy.
BSD 3-Clause "New" or "Revised" License

Pickling error in middleware #74

Closed ryonlife closed 4 years ago

ryonlife commented 4 years ago

Installed scrapy_autounit for the first time using pip, updated settings per docs, and ran my crawler for the first time. Receiving this error. Using scrapy 2.1.0 and scrapy_autounit 0.0.26. Please advise.

Traceback (most recent call last):
  File "/Users/ryonlife/peg/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 52, in process_spider_input
    result = method(response=response, spider=spider)
  File "/Users/ryonlife/peg/env/lib/python3.7/site-packages/scrapy_autounit/middleware.py", line 86, in process_spider_input
    'middlewares': get_middlewares(spider),
AttributeError: Can't pickle local object 'LxmlLinkExtractor.__init__.<locals>.<lambda>'
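(For context, not part of the original report: Python's pickle module cannot serialize a lambda defined inside a function body, which is exactly what the "Can't pickle local object ... <locals>.<lambda>" message points at. A minimal reproduction, independent of Scrapy:)

```python
import pickle

class SpiderLike:
    def __init__(self):
        # A lambda created inside __init__ is a "local object";
        # pickle cannot serialize it by qualified name.
        self.lex = lambda url: url.strip()

try:
    pickle.dumps(SpiderLike())
    picklable = True
except (AttributeError, pickle.PicklingError):
    picklable = False

print(picklable)  # False
```

Any object that keeps such a lambda in its attributes, as LxmlLinkExtractor does internally, makes the whole containing object unpicklable.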

fcanobrash commented 4 years ago

@ryonlife thanks for raising this issue. Can you try again with version 0.0.27? I added a small fix for CrawlSpiders that I hope solves this issue. Please try it out and let me know how it behaves.

ryonlife commented 4 years ago

I appreciate you taking a crack. However, I updated to the latest version and am still getting the same error.

fcanobrash commented 4 years ago

Could you share more details of your spider or a sample code to reproduce the issue? I could reproduce it with a simple CrawlSpider and the fix actually solved it but we might be talking about different use cases.

ryonlife commented 4 years ago

Copy. I'll begin with a fresh simple spider, see if I can get that working and then try to debug my existing one...

ryonlife commented 4 years ago

Made a new simple spider that inherits from CrawlSpider and scrapy-autounit is working just fine.

Back to the original problem: I'm getting the errors on my spiders that inherit from a class I wrote, called ProductSpider, which in turn inherits from CrawlSpider. The pickling error from my first message refers to the line below, where a LinkExtractor object is instantiated and assigned to an instance variable. When I comment that line out, along with all references to self.lex in other methods, my spiders stop working as intended, but scrapy-autounit generates tests and fixtures without throwing an error.

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider

class ProductSpider(CrawlSpider):
    def __init__(self, *a, **kw):
        super().__init__(*a, **kw)
        self.crawl_patterns += self.process_patterns  # when processing, still crawl for links
        self.lex = LinkExtractor(allow=self.crawl_patterns, unique=True, canonicalize=True)

Taking a quick look at https://github.com/scrapinghub/scrapy-autounit/commit/4a67de320eff6c12d27c5f46c14f42b9980b3a8b, it seems there needs to be a dynamic way, e.g. via a class variable or in settings.py, to exclude additional spider attributes from pickling.
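(One generic way to implement that kind of exclusion, offered here only as a sketch and not as scrapy-autounit's actual code: skip attribute names on a configurable blocklist, and additionally drop any value that fails a pickle round-trip probe.)

```python
import pickle

def recordable_attrs(spider_attrs, dont_record=()):
    # `dont_record` stands in for a hypothetical exclusion list
    # (e.g. a class variable or a setting). Names in it are skipped;
    # anything that fails to pickle is skipped as well.
    kept = {}
    for name, value in spider_attrs.items():
        if name in dont_record:
            continue
        try:
            pickle.dumps(value)
        except Exception:
            continue
        kept[name] = value
    return kept

attrs = {"name": "products", "lex": lambda url: url}
print(recordable_attrs(attrs))  # {'name': 'products'}
```

With this approach, the unpicklable LinkExtractor attribute would simply never reach the fixture, whether or not the user lists it explicitly.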

fcanobrash commented 4 years ago

Got it. I'll review it and get back to you. Thanks for the debugging.

fcanobrash commented 4 years ago

Fixed in v0.0.28. The new AUTOUNIT_DONT_RECORD_SPIDER_ATTRS setting can be used to achieve this behavior. Please don't forget to run autounit update as soon as you install v0.0.28 to update your current tests and fixtures.
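(For readers landing here later: assuming the setting takes a list of spider attribute names to skip, which the project README should confirm, the fix for the ProductSpider above would look something like this in settings.py.)

```python
# settings.py -- assuming AUTOUNIT_DONT_RECORD_SPIDER_ATTRS takes a
# list of spider attribute names to exclude when recording fixtures
AUTOUNIT_DONT_RECORD_SPIDER_ATTRS = ['lex']
```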