rmax / scrapy-redis

Redis-based components for Scrapy.
http://scrapy-redis.readthedocs.io
MIT License

How to use scrapy-redis if I'm using start_requests() instead of start_urls in my spider? #264

Closed ashwinbala99 closed 1 year ago

ashwinbala99 commented 1 year ago

For instance, my start_requests() takes other parameters in the form of a dictionary (refer to sample code)

from scrapy import Request

def start_requests(self):
    # load inputs from DB
    items = load_urls()

    for item in items:
        # pass per-URL metadata along with each request
        yield Request(
            url=item["URL"],
            callback=self.parse,
            meta={
                "someid": item["someid"],
                "someinfo": item["someinfo"],
            },
        )

What should I do to incorporate scrapy-redis in my project?

rmax commented 1 year ago

See this method https://github.com/rmax/scrapy-redis/blob/master/src/scrapy_redis/spiders.py#L137

It already supports loading JSON data from redis as start urls.

I think you just need to push json-serialized data ({"url": ..., "meta": {"someid": ...}}) to the <spidername>:start_urls key in your redis instance.
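A minimal sketch of that push step, reusing the item fields from the question. The spider name `myspider`, the helper names `build_start_payload`, `push_start_requests` and `main`, the sample item, and the local Redis connection settings are all assumptions for illustration; the client only needs an `lpush` method, which redis-py's `redis.Redis` provides.

```python
import json


def build_start_payload(item):
    """Serialize one DB row into the {"url": ..., "meta": {...}} shape."""
    return json.dumps({
        "url": item["URL"],
        "meta": {
            "someid": item["someid"],
            "someinfo": item["someinfo"],
        },
    })


def push_start_requests(client, spider_name, items):
    """Push one JSON string per item to <spidername>:start_urls.

    `client` is any object with an lpush(key, value) method,
    e.g. a redis.Redis instance. Returns the number of items pushed.
    """
    key = f"{spider_name}:start_urls"
    count = 0
    for item in items:
        client.lpush(key, build_start_payload(item))
        count += 1
    return count


def main():
    # Assumption: redis-py is installed and Redis runs on localhost:6379.
    import redis

    client = redis.Redis(host="localhost", port=6379)
    # Hypothetical sample row; in the question these come from load_urls().
    items = [{"URL": "https://example.com", "someid": 1, "someinfo": "demo"}]
    push_start_requests(client, "myspider", items)
```

Calling `main()` from a separate feeder script would queue the items; on the spider side, the values should then be readable in the callback as `response.meta["someid"]` and `response.meta["someinfo"]`, provided the installed scrapy-redis version includes the JSON handling linked above.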

LuckyPigeon commented 1 year ago

@ashwinbala99 Just like @rmax said, just follow the tutorial, then you can send the requests with json-serialized data. Feel free to ask if you have any further questions :)

ashwinbala99 commented 1 year ago

Hi,

Thank you for responding to my query promptly. I'll certainly reach out if I have any more questions regarding the usage of the library.

Regards Ashwin B

On Sun, Dec 11, 2022 at 5:00 PM Jeremy Chou @.***> wrote:

@ashwinbala99 Just like @rmax said, just follow the tutorial (https://github.com/rmax/scrapy-redis/wiki/Introduction), then you can send the requests with json-serialized data. Feel free to ask if you have any further questions :)
