tubndgit / scrapyx-bright-data

How to use #1

Open herol-gl opened 3 years ago

herol-gl commented 3 years ago

Could you provide sample code showing how to use it in a Scrapy spider file?

Thanks

tubndgit commented 3 years ago

Hello, you can add these custom settings to your spider file:

import scrapy


class SampleSpider(scrapy.Spider):
    name = 'sample'

    # Custom settings that apply to this spider only
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            # Route requests through Bright Data
            'scrapyx_bright_data.BrightDataProxyMiddleware': 610,
        },
        'RETRY_TIMES': 20,
        'BRIGHTDATA_ENABLED': True,
        # Local Bright Data Proxy Manager endpoint
        'BRIGHTDATA_URL': 'http://127.0.0.1:24000',
    }

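
To enable the middleware for every spider in a project instead of a single one, the same keys can go in the project's settings.py rather than in custom_settings. A minimal sketch, assuming the middleware path and BRIGHTDATA_* setting names from the example above and a Proxy Manager listening locally on port 24000:

```python
# settings.py: project-wide equivalent of the spider's custom_settings.
# Assumes the middleware path and BRIGHTDATA_* names from the example above.

DOWNLOADER_MIDDLEWARES = {
    # A lower number runs earlier; 610 is before Scrapy's built-in
    # HttpProxyMiddleware (750).
    "scrapyx_bright_data.BrightDataProxyMiddleware": 610,
}

# Retry generously, since individual proxy IPs may fail or get blocked
RETRY_TIMES = 20

BRIGHTDATA_ENABLED = True
BRIGHTDATA_URL = "http://127.0.0.1:24000"  # local Proxy Manager endpoint
```

Values in a spider's custom_settings take precedence over project settings, so a spider can still override any of these keys individually.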
herol-gl commented 3 years ago

Thanks for the reply. Which BrightData plan should I subscribe to, and what credentials should I put in the Scrapy settings or spider file?

tubndgit commented 3 years ago

You can use the Data Center plan for scraping. You must install the Proxy Manager on your machine; see https://github.com/luminati-io/luminati-proxy
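
With this setup, requests presumably go through the Proxy Manager running at BRIGHTDATA_URL, so a crawl will fail with connection errors if the manager isn't running. The sketch below uses only the standard library to check that the host and port from that URL accept TCP connections before you start crawling. The helper name proxy_manager_is_up is hypothetical, and it only checks that the port is open, not that the listener is actually the Proxy Manager.

```python
import socket
from urllib.parse import urlsplit


def proxy_manager_is_up(proxy_url: str, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at proxy_url.

    Hypothetical helper: it only verifies that the port is open, not that
    the listener is the Bright Data Proxy Manager.
    """
    parts = urlsplit(proxy_url)
    host = parts.hostname or "127.0.0.1"
    port = parts.port or 24000  # fall back to the port used in BRIGHTDATA_URL
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    url = "http://127.0.0.1:24000"  # same value as BRIGHTDATA_URL
    if not proxy_manager_is_up(url):
        print(f"Proxy Manager not reachable at {url}; start it before crawling.")
```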

herol-gl commented 3 years ago

I found a way to do it without installing the Proxy Manager: just pass the proxy URL to the request via meta.

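yield scrapy.Request(
    url,
    headers=headers,  # must be a keyword: the 2nd positional arg is callback
    callback=self.parse,
    # Scrapy's built-in HttpProxyMiddleware reads the proxy (and its
    # credentials) from meta['proxy']
    meta={'proxy': 'http://zone_username:zone_password@zproxy.lum-superproxy.io:22225'},
)
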
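
If the zone password contains characters such as @, : or /, embedding it directly in the proxy URL breaks URL parsing. The sketch below percent-encodes the credentials before building the meta['proxy'] value; Scrapy's HttpProxyMiddleware unquotes them, strips them from the URL and sends them as a Basic Proxy-Authorization header, which the second helper reproduces for illustration. The helper names are hypothetical, and the host and port are copied from the example above, so check your Bright Data dashboard for the values that apply to your zone.

```python
import base64
from urllib.parse import quote

# Host and port copied from the example above; may differ for your zone.
SUPERPROXY = "zproxy.lum-superproxy.io:22225"


def build_proxy_url(username: str, password: str, host: str = SUPERPROXY) -> str:
    """Build an http://user:pass@host URL for request.meta['proxy'].

    Credentials are percent-encoded so '@', ':' or '/' in them don't
    break URL parsing.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}"


def basic_proxy_auth(username: str, password: str) -> str:
    """Proxy-Authorization value for the same (unencoded) credentials.

    Encodes with latin-1, Scrapy's default HTTPPROXY_AUTH_ENCODING.
    """
    token = base64.b64encode(f"{username}:{password}".encode("latin-1"))
    return "Basic " + token.decode("ascii")
```

In the spider, the request would then use meta={'proxy': build_proxy_url('zone_username', 'zone_password')}.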
tubndgit commented 3 years ago

Yes, that works for simple use cases.

philippkeller commented 1 year ago

@herol-gl, where do I find the zone_username?