Ok, a bit more investigation shows that PR #852 pretty much broke the scheduling (missing class methods, syntax errors, etc.). Not sure it has ever worked, though (based on https://github.com/scrapinghub/portia/issues/671).
@ruairif what are the plans for making the local schedule (running the spiders) actually work? That means integrating scrapyd into the project, of course.
Do you need contributors? This seems like a one-man show atm 🥇
I'm happy to accept pull requests fixing this issue. For integrating with scrapyd, I think the best approach may be to use Docker Compose to connect Portia and scrapyd.
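Something along these lines might work (a rough sketch only; the service layout, ports, and the scrapyd build context are assumptions, not a tested setup):

```yaml
# docker-compose.yml (sketch)
version: '2'
services:
  portia:
    image: scrapinghub/portia
    ports:
      - "9001:9001"
    volumes:
      - ~/portia_projects:/app/data/projects:rw
    links:
      - scrapyd          # Portia can then reach scrapyd at http://scrapyd:6800
  scrapyd:
    build: ./scrapyd     # e.g. built from the Dockerfile proposed in scrapy/scrapyd#283
    ports:
      - "6800:6800"      # scrapyd's default port
```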
Great!
So you mean integrating scrapyd by running it in another Docker container and linking the two (portia | scrapyd)?
I already built a solution that runs scrapyd in the "main" container, because IMHO an extra container just for scrapyd is pretty redundant when it could easily be added to the Portia container where the rest of the stack already lives.
I mean linking the two containers together.
Running it within Portia is also possible. The main issue is ensuring that scrapyd is reachable inside the container and that its settings can be configured.
An advantage of running it in a separate container is that custom scrapyd images could be created containing the appropriate Python libraries for building Portia spiders that also include some custom Python code. Here's the configuration for a scrapyd Docker image: https://github.com/scrapy/scrapyd/pull/283
One disadvantage of running scrapyd in a separate container is that slybot must be installed & configured in that container as well (if I've understood it correctly).
But I'll give some proposals =>
It's a code-refactoring problem. I also got `AttributeError: 'ScrapydDeploy' object has no attribute 'data'`. My analysis is that it's due to line 67: https://github.com/scrapinghub/portia/blob/2d9ddd57173a93dfb2a21f3e41233af6728e9f4c/portia_server/portia_api/utils/deploy/scrapyd.py#L67
The `ScrapydDeploy` class doesn't have `self.data`. It was refactored from the `SpiderRoute` class (https://github.com/scrapinghub/portia/commit/9f5f9444d75ee0558431bdc14e7e165a33fd2c8a#diff-4abd8721bdd271b4cc7d3b109bec658aL81), which did have `self.data`; that data needs to be passed in when initializing `ScrapydDeploy` and used instead of `self.data`.
I don't know whether anything else is breaking, so I'm not raising a pull request.
It looks like you're right. I've updated the code to read from the `_schedule_data` method instead of the non-existent `data` attribute.
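In outline, the change is something like the sketch below; everything except `_schedule_data`, `ScrapydDeploy`, and scrapyd's `schedule.json` endpoint is an assumption about the surrounding code, not Portia's actual API:

```python
import requests


class ScrapydDeploy(object):
    """Sketch of the fix; constructor and payload shape are assumptions."""

    def __init__(self, project_name, scrapyd_url='http://localhost:6800'):
        self.project_name = project_name
        self.scrapyd_url = scrapyd_url

    def _schedule_data(self, spider_name):
        # Assumed payload shape for scrapyd's schedule.json endpoint.
        return {'project': self.project_name, 'spider': spider_name}

    def schedule(self, spider_name):
        # The broken code read self.data, an attribute the old SpiderRoute
        # class set but ScrapydDeploy never does, hence the AttributeError.
        # Building the payload via _schedule_data avoids that.
        data = self._schedule_data(spider_name)
        return requests.post(self.scrapyd_url + '/schedule.json', data=data)
```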
When running `docker run -v ~/portia_projects:/app/data/projects:rw -p 9001:9001 scrapinghub/portia`, an exception is thrown: