Closed: jpmckinney closed this issue 4 years ago.
Tagging @yolile @romifz @duncandewhurst @pindec @mrshll1001 for comment.
It could be an option we offer, but I don't think removing it from the server should be considered.
I think there's a misunderstanding. The issue description is about sending a remote request to Scrapy from a local machine, instead of logging into the server and sending a local request to Scrapy. The latter just seems like extra work. Can you respond to that proposal?
Oh, I see, sorry. Taking that in two parts: sending the request to Scrapy from the local machine is fine, but I would be wary of having people update scrapers directly from local machines, as it may be harder to know for sure which version of the scrapers is loaded at any given point. The points about a consistent environment, less technical users, and debugging issues for people are, I think, still relevant.
I added a step for a user to ensure that they are about to deploy the latest spiders. I retained abbreviated instructions for how to do this from the server. https://ocdsdeploy.readthedocs.io/en/latest/use/kingfisher-collect.html#update-spiders-in-kingfisher-scrape
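For reference, a minimal sketch of that check (assuming the spiders are deployed from a local clone of kingfisher-scrape; the authoritative steps are in the linked documentation):

```shell
# Sketch only: confirm the clone is up to date before deploying spiders.
cd kingfisher-scrape
git fetch origin
git status             # should report the branch is up to date with its remote
git log -1 --oneline   # note the commit that is about to be deployed
```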
Only Romi, Yohanna, Andres, and you are expected to deploy scrapers (since no one else writes scrapers), none of whom I consider to be less technical users.
The only part of the environment that needs to be consistent is the scrapyd-client package, which hasn't seen a release in 3 years.
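As an illustration (a sketch, not a prescribed setup), keeping that one package consistent on a local machine is a single install:

```shell
# Sketch: install scrapyd-client locally (it provides the scrapyd-deploy command).
pip install scrapyd-client
pip show scrapyd-client   # confirm the installed version matches the server's
```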
To schedule crawls, we presently require analysts to connect to the server (as the correct user) and run a `curl` command. We also require analysts to follow a multi-step process to update spiders. However, I can already do:

`curl http://scrape:PASSWORD@scrape.kingfisher.open-contracting.org/schedule.json -d project=kingfisher -d spider=test_fail`

(We can provide instructions on how to create shell aliases, so that analysts don't need to find the password every time.)
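For example, a minimal sketch of such a helper, written as a shell function rather than a plain alias so the spider name can be passed as an argument (the function name and the PASSWORD placeholder are illustrative, not prescriptive):

```shell
# Illustrative helper; add to ~/.bashrc or ~/.zshrc.
# Usage: kingfisher-schedule test_fail
kingfisher-schedule() {
  curl http://scrape:PASSWORD@scrape.kingfisher.open-contracting.org/schedule.json \
    -d project=kingfisher -d spider="$1"
}
```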
Similarly, if I configure `scrapy.cfg` in the same way (in which case we'd probably remove it from version control), then I can run `scrapyd-deploy` from my own machine (a sketch of such a deploy target is at the end of this comment). So:
If not, then we can also remove the local copy of the kingfisher-scrape repository from the server, after closing #295 and #294. I also like that this means there will be no reason for analysts to regularly log in as the `ocdskfs` user.
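As a sketch of the deploy target mentioned above (the target name, and whether the basic-auth credentials belong in `scrapy.cfg`, are assumptions; the URL and project come from the `curl` example):

```shell
# Sketch only: add a [deploy] target with the remote Scrapyd URL and credentials
# to the local (untracked) scrapy.cfg, then deploy from the repository root.
cat >> scrapy.cfg <<'EOF'
[deploy:kingfisher]
url = http://scrape.kingfisher.open-contracting.org/
project = kingfisher
username = scrape
password = PASSWORD
EOF

scrapyd-deploy kingfisher   # packages the current checkout and uploads it to Scrapyd
```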