After looking into the code base and some trial and error, I see that the priority option is already implemented but the argument was never documented.
It is implemented in the `SqlitePriorityQueue` class in the `sqlite.py` module, and the `priority` argument defaults to `0` in the `add` method of the `SqliteSpiderQueue` class in the `spiderqueue.py` module:
```python
def add(self, name, **spider_args):
    d = spider_args.copy()
    d['name'] = name
    # "priority" is popped out of the spider arguments and used only to
    # order the queue; it is never forwarded to the spider itself.
    priority = float(d.pop('priority', 0))
    self.q.put(d, priority)
```
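For reference, here is a minimal sketch of what that means in practice (assuming scrapyd's SQLite-backed queue; the spider name and arguments are made up): a higher `priority` value should be popped first, and the `priority` key itself never reaches the spider.

```python
from scrapyd.spiderqueue import SqliteSpiderQueue

# Made-up spider name and arguments, in-memory database for illustration only.
q = SqliteSpiderQueue(':memory:')
q.add('myspider', priority=0, page='low')
q.add('myspider', priority=10, page='high')

print(q.pop())  # e.g. {'name': 'myspider', 'page': 'high'} -- higher priority first
print(q.pop())  # e.g. {'name': 'myspider', 'page': 'low'}
```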
I think this argument should be included in the documentation.
Should I close this issue and open another one for the documentation fix?
Indeed, it is missing documentation and tests. Also, I'd rather move it to the `schedule.json` webservice and have it next to the other statements that modify the dictionary by popping "reserved" arguments.
For the 1.1 branch: https://github.com/Digenis/scrapyd/commit/5d068b36924a9df7ec48935407c89b62c4d7394b. I'll commit this one, so please comment.
For the next release I plan something like https://github.com/Digenis/scrapyd/commit/448284c20bc1b4326d51a7b7489d13e8886ad952.
I'll go through a PR for it because it's backwards incompatible, and I actually want to change it in an even more backwards-incompatible way, because passing the spider arguments with `**spider_args` is an abuse of the keyword arguments feature that leads to colliding argument names (e.g. try `(lambda a, **k: 0)(1, **{'a': 2})`).
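To spell out the collision with a standalone toy reduction of the `add` signature quoted above (not scrapyd code): any spider argument that shares a name with a positional parameter can never be passed through.

```python
# Toy reduction of add(self, name, **spider_args): a caller-supplied keyword
# that collides with the named parameter raises instead of landing in **spider_args.
def add(name, **spider_args):
    return spider_args

add('myspider', page='1')      # fine: {'page': '1'}
add('myspider', name='oops')   # raises TypeError (got multiple values for 'name')

(lambda a, **k: 0)(1, **{'a': 2})  # the example from the comment above: same failure
```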
Yes... for the current branch, a mention of the priority argument in the documentation will suffice; for the next release, a PR will work as you have demonstrated.
Merged the doc update to the 1.1 branch. Prepared it in #147.
I'll see if there's anything else that should be fixed for 1.1
and make a minor version bump.
Closing in favour of #147
I have a use case where we run a new job for each task on the same domain. I am handling this with scrapyd and the `schedule.json` API, but there are some cases where a new job request needs to be served immediately (as soon as the current job finishes). [I am using a single process per domain, so only one process is running for any spider at any given time.]
Can we use priority queuing in scrapyd while scheduling a job? https://github.com/scrapy/queuelib/blob/master/queuelib/pqueue.py
I noticed that this is implemented on the Scrapinghub dashboard when we run a spider.
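For future readers: since the `priority` argument discussed above is forwarded by `schedule.json` down to the queue's `add` method, it can simply be passed when scheduling. A minimal sketch, assuming a local scrapyd on the default port; the project and spider names are placeholders:

```python
import requests

# Hypothetical example: pass the (previously undocumented) "priority" argument
# through schedule.json; jobs with a higher priority should be popped from the
# project's pending queue first. Host, project and spider are placeholders.
resp = requests.post('http://localhost:6800/schedule.json', data={
    'project': 'myproject',
    'spider': 'myspider',
    'priority': 10,
})
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```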