I have a suggestion for solving a problem I ran into. Let me explain it:
Add 2 URLs to a queue for one spider
The spider reads both entries and yields each of them as a request
Now sometimes the following happens in my case: both requests are yielded and I get a session key in both of them. The first one to finish returns a bunch of new requests (so yielding another new request is not possible), and the second yielded request waits until the first one has crawled an item (this can take several minutes). In the meantime the session key of my second request has expired, and it gets rejected when it tries to make a new request.
My idea would be to have an attribute in the spider which lets me define that only 1 item is read from the queue and yielded as a request at a time.
class MySpider(RedisSpider):
    yield_1_request = True
And next_requests in spiders.py has to be changed to something like:
if req:
    yield req
    found += 1
    if hasattr(self, 'yield_1_request') and self.yield_1_request and not use_set:
        break
else:
    self.logger.debug("Request not made from data: %r", data)
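To make the intended behaviour easier to discuss, here is a minimal standalone sketch of the idea. It does not use scrapy-redis at all: `FakeQueueSpider`, its `batch_size` attribute, and the in-memory `deque` that stands in for the Redis queue are all hypothetical names made up for this illustration. Only the `yield_1_request` flag and the `break` mirror the change proposed above.

```python
from collections import deque


class FakeQueueSpider:
    """Stand-in for a queue-backed spider; NOT the real RedisSpider."""

    batch_size = 16           # hypothetical: max items read per call
    yield_1_request = False   # proposed per-spider opt-in flag

    def __init__(self, urls, use_set=False):
        self.queue = deque(urls)  # plays the role of the Redis list
        self.use_set = use_set

    def make_request_from_data(self, data):
        # Return None for empty data to mimic "request not made".
        return {"url": data} if data else None

    def next_requests(self):
        found = 0
        while found < self.batch_size and self.queue:
            data = self.queue.popleft()
            req = self.make_request_from_data(data)
            if req:
                yield req
                found += 1
                # Proposed change: stop after the first request if opted in.
                if getattr(self, "yield_1_request", False) and not self.use_set:
                    break
            else:
                print("Request not made from data: %r" % (data,))


class OneAtATimeSpider(FakeQueueSpider):
    yield_1_request = True


urls = ["http://example.com/a", "http://example.com/b"]
default_spider = FakeQueueSpider(urls)
single_spider = OneAtATimeSpider(urls)

default_batch = list(default_spider.next_requests())
single_batch = list(single_spider.next_requests())
print(len(default_batch), len(single_batch))  # 2 1
```

With the flag set, the second URL stays in the queue until `next_requests` is called again, so in this sketch its session key would only be requested once the spider is ready for it.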
This way you could still decide for every spider whether this is necessary, and it would have only a small impact on the code. What do you think about the idea and the implementation? Is there anything I could do better? I would also prepare a PR if this is accepted as a feature.
Hello,
first of all: great project, and easy to use.
Thanks for your help in advance :)