Open shriphani opened 8 years ago
For what it's worth, I found 2 problems contributing to making restarts fail.

1. `Factual/durable-queue` doesn't like to restore queues with names containing periods. So, given that queue names are constructed from keywords of the host portion of URLs, this was causing a problem. (The restored queue name for http://foo.org/Some/path was "org".) I resolved the problem by replacing the default enqueue pipeline component with one that substitutes `_` for `.` in queue names.
2. `pegasus.queue/setup-queue-worker` isn't called. I rewrote `pegasus.core/start-crawl` to `take!` the first entry from the to-visit queue with 0 timeout. If it gets something, it constructs a queue worker for the queue name that will be associated with that URL. If it gets nothing, it does the normal seeding.

Solid. Would really appreciate a PR!
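For anyone following along, the two fixes described above could be sketched roughly as follows. This is not the code from the fork: `url->queue-name` and `resume-or-seed` are hypothetical names, and the `take-first`, `start-worker`, and `seed` functions are caller-supplied stand-ins for the zero-timeout take from the to-visit queue, the queue worker setup, and the normal seeding step.

```clojure
(require '[clojure.string :as str])
(import '(java.net URI))

(defn url->queue-name
  "Hypothetical helper: names a queue after the URL's host, replacing
  periods with underscores so the name contains no periods."
  [url]
  (-> (URI. url)
      .getHost
      (str/replace "." "_")
      keyword))

(defn resume-or-seed
  "Hypothetical restart step. `take-first` stands in for a zero-timeout
  take from the to-visit queue and returns a URL, or nil if the queue is
  empty. `start-worker` and `seed` are stand-ins supplied by the caller."
  [{:keys [take-first start-worker seed]}]
  (if-let [url (take-first)]
    ;; Something was left over from the previous run: start a worker
    ;; for the queue this URL belongs to.
    (start-worker (url->queue-name url))
    ;; Nothing to resume: fall back to normal seeding.
    (seed)))

;; Usage with stub functions:
(url->queue-name "http://foo.org/Some/path")
;; => :foo_org

(resume-or-seed {:take-first   (constantly "http://foo.org/Some/path")
                 :start-worker identity
                 :seed         (constantly :seeded)})
;; => :foo_org
```

Passing the three functions in as arguments keeps the sketch independent of the actual pegasus and durable-queue calls, which would need to be wired in by whoever adapts it.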
Will do. I'm working through a fork of a fork that a colleague made. I'll put some effort into folding the changes into defaults and core.
See the confusion in: #22