scrapinghub / portia

Visual scraping for Scrapy
BSD 3-Clause "New" or "Revised" License

AttributeError: 'ScrapydDeploy' object has no attribute 'data' #856

Closed: villeristi closed this issue 1 year ago

villeristi commented 6 years ago
  1. Started Portia with Docker: docker run -v ~/portia_projects:/app/data/projects:rw -p 9001:9001 scrapinghub/portia
  2. Configured some test-spider
  3. Tried to actually run the spider (from the UI)

an exception is thrown:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/viewsets.py", line 95, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/python3.5/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "/app/portia_server/portia_api/resources/route.py", line 72, in dispatch
    return super(JsonApiRoute, self).dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 494, in dispatch
    response = self.handle_exception(exc)
  File "/app/portia_server/portia_api/resources/route.py", line 75, in handle_exception
    response = super(JsonApiRoute, self).handle_exception(exc)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 454, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 491, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/portia_server/portia_api/resources/spiders.py", line 80, in schedule
    data = Deployer(self.project).schedule(spider_id)
  File "/app/portia_server/portia_api/utils/deploy/scrapyd.py", line 67, in schedule
    schedule_data = self._schedule_data(spider, self.data)
AttributeError: 'ScrapydDeploy' object has no attribute 'data'
villeristi commented 6 years ago

Ok, a bit more investigation shows that PR #852 pretty much broke scheduling (missing class methods, syntax errors, etc.). Not sure it ever worked (based on https://github.com/scrapinghub/portia/issues/671), though.

@ruairif what are the plans for making the local schedule (actually running the spiders) work? That means integrating scrapyd into the project, of course.

Do you need contributors? This seems like a one-man show atm 🥇

ruairif commented 6 years ago

I'm happy to accept pull requests fixing this issue. For integrating with scrapyd, I think the best approach may be to use Docker Compose to connect Portia and scrapyd.
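For illustration, a minimal sketch of what such a Compose setup might look like; the service layout, image names, and ports here are assumptions, not a configuration that ships with Portia:

# Hypothetical docker-compose.yml linking Portia and scrapyd.
# Image names, ports, and volume paths are assumptions.
version: '2'
services:
  portia:
    image: scrapinghub/portia
    ports:
      - "9001:9001"
    volumes:
      - ~/portia_projects:/app/data/projects:rw
    links:
      - scrapyd
  scrapyd:
    # A custom image with slybot installed would go here; see the
    # Dockerfile sketch later in the thread.
    image: scrapyd-slybot
    ports:
      - "6800:6800"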

villeristi commented 6 years ago

Great!

So, you mean integrating scrapyd by running it in another Docker container and linking the two (Portia and scrapyd)?

I already put together a solution that runs scrapyd in the "main" container, because IMHO it's pretty redundant to have an extra container just for scrapyd when it could easily be added to the Portia container, where the rest of the stack already lives.

ruairif commented 6 years ago

I mean linking the two containers together.

Running it within Portia is also possible. The main issue is ensuring that scrapyd is reachable inside the container and that its settings can be configured.

An advantage of running it in a separate container is that custom scrapyd images could be created that bundle the Python libraries needed to build Portia spiders alongside custom Python code. Here's the configuration for a scrapyd Docker image: https://github.com/scrapy/scrapyd/pull/283

villeristi commented 6 years ago

One disadvantage of running scrapyd in a separate container is that slybot must also be installed and configured in that container (if I've understood it correctly).

But I'll put together some proposals.
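For illustration, a hypothetical Dockerfile for such a custom scrapyd image with slybot baked in; the base image and the unpinned package versions are assumptions, not project files (see also the scrapyd Docker PR linked above):

# Hypothetical scrapyd image able to run Portia (slybot) spiders.
# Base image and package versions are assumptions.
FROM python:3.5-slim

# scrapyd schedules and runs the spiders; slybot is the runtime that
# Portia-generated spiders depend on.
RUN pip install scrapyd slybot

# scrapyd's default HTTP/JSON API port.
EXPOSE 6800

CMD ["scrapyd"]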

gagandeep commented 6 years ago

This is a code-refactoring problem. I also got AttributeError: 'ScrapydDeploy' object has no attribute 'data'. My analysis is that it comes from line 67: https://github.com/scrapinghub/portia/blob/2d9ddd57173a93dfb2a21f3e41233af6728e9f4c/portia_server/portia_api/utils/deploy/scrapyd.py#L67

The ScrapydDeploy class doesn't have self.data. It was refactored out of the SpiderRoute class (https://github.com/scrapinghub/portia/commit/9f5f9444d75ee0558431bdc14e7e165a33fd2c8a#diff-4abd8721bdd271b4cc7d3b109bec658aL81), which did have self.data; that data needs to be passed in when initializing ScrapydDeploy and used instead of self.data.

I don't know if anything else is breaking, so I'm not raising a pull request.
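For illustration, a minimal self-contained sketch of the failure mode and of one possible fix along the lines of this analysis; ScrapydDeploy here is a simplified stand-in for the real class, and _schedule_data is stubbed:

# Simplified stand-in for portia_api/utils/deploy/scrapyd.py; the real
# class and its helpers are more involved.

class ScrapydDeploy:
    def __init__(self, project):
        self.project = project  # note: __init__ never assigns self.data

    def _schedule_data(self, spider, data):
        # Stub: the real method builds the payload posted to scrapyd.
        return dict(project=self.project, spider=spider, **data)

    def schedule_broken(self, spider_id):
        # Mirrors the failing line from the traceback above:
        return self._schedule_data(spider_id, self.data)  # AttributeError

    def schedule_fixed(self, spider_id, data=None):
        # One possible fix, per the analysis above: pass the request
        # data in explicitly instead of reading a missing attribute.
        return self._schedule_data(spider_id, data or {})


deploy = ScrapydDeploy('my_project')
print(deploy.schedule_fixed('my_spider', {'start_url': 'http://example.com'}))
try:
    deploy.schedule_broken('my_spider')
except AttributeError as exc:
    print(exc)  # 'ScrapydDeploy' object has no attribute 'data'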

ruairif commented 6 years ago

It looks like you're right. I've updated the code to read from the _schedule_data method instead of the non-existent data attribute.