scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License
1.29k stars 215 forks source link

how can I know it works when I use it with scrapy? #410

Open vidyli opened 3 years ago

vidyli commented 3 years ago

I did everything as described in the running-the-crawl section of the documentation, and then started the crawl with

scrapy crawl my-spider

I can see items being crawled in the console output, but I can't tell whether Frontera is actually doing anything.

What I did


sandwarm/frontera/settings.py


BACKEND = 'frontera.contrib.backends.sqlalchemy.Distributed'

SQLALCHEMYBACKEND_ENGINE = "mysql://acme:acme@localhost:3306/acme"
SQLALCHEMYBACKEND_MODELS = {
    'MetadataModel': 'frontera.contrib.backends.sqlalchemy.models.MetadataModel',
    'StateModel': 'frontera.contrib.backends.sqlalchemy.models.StateModel',
    'QueueModel': 'frontera.contrib.backends.sqlalchemy.models.QueueModel'
}

SPIDER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 1000,
})

DOWNLOADER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 1000,
})

SCHEDULER = 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler'

settings.py

FRONTERA_SETTINGS = 'sandwarm.frontera.settings'

Since I enabled the MySQL backend, I would expect to see a connection error, because I haven't started MySQL yet, but no error appears.
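
On the missing connection error: before concluding that Frontera is being silently ignored, it is worth ruling out that something else is listening on the MySQL port. A small stdlib sketch, with host and port taken from the engine URL above:

```python
import socket

def port_open(host="localhost", port=3306, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# If this prints False, nothing is listening on 3306, so any component that
# really tried to connect to MySQL would have raised an error:
# print(port_open())
```

If the port is closed and the crawl still runs cleanly, nothing ever attempted the MySQL connection, which again suggests the Frontera scheduler was never activated.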

Thanks for all your hard work, but please make the documentation easier for humans, for example by providing a very basic working example. Currently we need to piece together all the documents just to get the basic idea, and even worse, it still doesn't work at all. I have already spent a week trying to get a working example.

davidsu-citylitics commented 1 year ago

@vidyli did you get it to work?