istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

Fill a single item from many requests / "can't pickle thread.lock objects" #210

Closed: llermaly closed this issue 5 years ago

llermaly commented 5 years ago

Hello, I have a working scraper in regular Scrapy that I'm trying to migrate to scrapy-cluster, and I'm running into some problems.

Consider this object:

```python
object = {'name': 'test', 'color': 'green', 'books': [{'name': 'book1', 'pages': [{'name': 'page1', 'annexed': [{'name': 'annexed1'}, ...]}, ...]}, ...]}
```

The object has books, each book has pages, and each page can have annexed items, so I need many requests to fill in the data for a single object.

The error I'm getting is: "can't pickle thread.lock objects"

I will try to reproduce the error in a simple way and attach it here, but basically I loop requests inside a method, and the method continues after that loop; maybe that is the problem, I will check.
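For reference, a minimal sketch of this pattern in plain Scrapy, filling a single item across chained requests by passing it through meta (the spider name, selectors, and URLs are illustrative, not from the original scraper):

```python
import scrapy

class BooksSpider(scrapy.Spider):
    name = "books"  # illustrative name
    start_urls = ["http://example.com/object"]

    def parse(self, response):
        item = {"name": "test", "color": "green", "books": []}
        book_urls = response.css("a.book::attr(href)").getall()
        if not book_urls:
            yield item
            return
        # Only plain, serializable data travels through meta here;
        # passing live objects (responses, callables, locks) is what
        # breaks pickling in a distributed setup.
        yield response.follow(
            book_urls[0],
            callback=self.parse_book,
            meta={"item": item, "remaining": book_urls[1:]},
        )

    def parse_book(self, response):
        item = response.meta["item"]
        item["books"].append({"name": response.css("h1::text").get()})
        remaining = response.meta["remaining"]
        if remaining:
            # Chain the next request, carrying the partial item along.
            yield response.follow(
                remaining[0],
                callback=self.parse_book,
                meta={"item": item, "remaining": remaining[1:]},
            )
        else:
            yield item  # all books visited, item is complete
```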

Thanks

madisonb commented 5 years ago

If you pass unserializable objects inside the request meta, or try to log data containing unserializable objects, I think you will get the error above.
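A quick illustration of that failure mode (generic Python, not code from scrapy-cluster): anything unpicklable placed in request meta or log extras will fail serialization. Note that the message below is Python 3's wording; the "can't pickle thread.lock objects" phrasing in this issue is Python 2's:

```python
import pickle
import threading

# A live lock stands in for any unserializable object (open files,
# responses, bound callbacks) that might end up in meta or log extras.
meta_bad = {"item": {"name": "test"}, "lock": threading.Lock()}
try:
    pickle.dumps(meta_bad)
except TypeError as exc:
    print(exc)  # e.g. cannot pickle '_thread.lock' object

# Plain built-in types serialize without trouble.
meta_ok = {"item": {"name": "test", "books": []}}
pickle.dumps(meta_ok)
```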

Given that this is a custom spider problem rather than a problem with the core project, I am closing this per the guidelines on the Read the Docs site. I try to reserve the issue tracker for problems within the open source project itself, not custom setups.

VincentChen123 commented 5 years ago

It's probably because something like `request.errback` was passed into the `extras`. My solution is to change `extras['error_request'] = request` to `extras['error_request'] = request_to_dict(request)`. What do you think? @madisonb
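A sketch of that fix in context, assuming a Scrapy version where `request_to_dict` lives in `scrapy.utils.reqser` (Scrapy 2.6+ exposes `Request.to_dict` instead); the errback wiring is illustrative rather than taken from the scrapy-cluster codebase:

```python
import scrapy
from scrapy.utils.reqser import request_to_dict  # Scrapy < 2.6

class ExampleSpider(scrapy.Spider):
    name = "example"  # illustrative

    def start_requests(self):
        yield scrapy.Request(
            "http://example.com",
            callback=self.parse,
            errback=self.on_error,
        )

    def parse(self, response):
        yield {"url": response.url}

    def on_error(self, failure):
        extras = {}
        # Storing the raw Request would pull its callback/errback
        # references (and anything they close over) into serialization;
        # converting to a plain dict keeps the payload JSON-safe.
        extras["error_request"] = request_to_dict(failure.request, spider=self)
        self.logger.error("request failed: %s", failure.value, extra=extras)
```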

madisonb commented 5 years ago

Yes, on the surface that seems like a valid solution; that way the request is a dict and can be written out as JSON.