jschnurr / scrapyscript

Run a Scrapy spider programmatically from a script or a Celery task - no project required.
MIT License

AttributeError: Can't get attribute 'PythonSpider' on <module '__main__' (built-in)> #13

Open tmancini opened 2 years ago

tmancini commented 2 years ago

Hey all, this is exactly what I was looking for, but I'm running into a few problems trying to test it out on Windows. Using the following script, I get the error above:

import scrapy
from scrapyscript import Job, Processor

processor = Processor(settings=None)

class PythonSpider(scrapy.spiders.Spider):
    name = "myspider"

    def start_requests(self):
        yield scrapy.Request(self.url)

    def parse(self, response):
        data = response.xpath("//title/text()").extract_first()
        return {'title': data}

job = Job(PythonSpider, url="http://www.python.org")
results = processor.run(job)

print(results)

When I move the spider into a separate file and import it, the script seems to run without an error, but the results print as an empty list.

import scrapy
from scrapyscript import Job, Processor

from PythonSpider import PythonSpider

settings = scrapy.settings.Settings(values={'LOG_LEVEL': 'WARNING'})
processor = Processor(settings=settings)

job = Job(PythonSpider, url="http://www.python.org")
results = processor.run(job)

print(results)
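
For context on the error in the title, here is a minimal sketch using only the standard library. It assumes the failure comes from pickle: on Windows, child processes are started fresh rather than forked, so objects sent to them are pickled, and pickle stores a class as a module-plus-name reference rather than as code. The module name fake_main and the empty PythonSpider stand-in are made up for this demo; nothing here calls Scrapy or scrapyscript.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the parent's script module (the role __main__ plays).
fake_main = types.ModuleType("fake_main")
sys.modules["fake_main"] = fake_main

# Define a placeholder class inside that module, as a script would.
exec("class PythonSpider:\n    pass\n", fake_main.__dict__)

# Pickling a class records only where to find it (module name + class name).
payload = pickle.dumps(fake_main.PythonSpider)

# Simulate a freshly started child whose copy of the module never defined the class.
del fake_main.PythonSpider

try:
    pickle.loads(payload)
    lookup_failed = False
except AttributeError as exc:
    # The lookup by name fails, much like the error reported in this issue.
    lookup_failed = True
    print(type(exc).__name__, exc)
```

If that is what happens here, it would also explain why importing the spider from its own file avoids the error: the child process can import that module and find the class by name.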

bsekiewicz commented 2 years ago

It seems that _item_scraped is never triggered, so the dispatcher call in Processor.__init__() doesn't take effect. (???)

A temporary workaround is to move dispatcher.disconnect(self._item_scraped, signals.item_scraped) from __init__ to crawl in the Processor class, and then comment out the p.terminate() line in run because of some billiard library (win32) issues.

In general, there seems to be something wrong with this library on Windows :(
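
One possible reading of the empty results, sketched below without Scrapy: a handler registered in __init__ lives in the parent process's dispatcher, and a freshly started child receives the object's state without re-running __init__, so nothing is listening when items arrive. Everything in this sketch is a hypothetical stand-in: _handlers plays the global dispatcher, simulate_spawned_child fakes the hand-off to a new process, and the two collector classes are not scrapyscript's Processor.

```python
# Module-level registry, standing in for a global signal dispatcher.
_handlers = []

def send(item):
    """Deliver an item to every handler currently registered."""
    for handler in _handlers:
        handler(item)

class InitCollector:
    """Registers its handler in __init__, i.e. only in the parent process."""
    def __init__(self):
        self.items = []
        _handlers.append(self._item_scraped)

    def _item_scraped(self, item):
        self.items.append(item)

    def crawl(self):
        send({"title": "example"})  # placeholder item, not real scraped data
        return self.items

class CrawlCollector(InitCollector):
    """Registers its handler inside crawl, i.e. in whichever process runs it."""
    def __init__(self):
        self.items = []

    def crawl(self):
        _handlers.append(self._item_scraped)
        return super().crawl()

def simulate_spawned_child(obj):
    """Mimic a fresh interpreter: empty registry, state copied, __init__ not re-run."""
    _handlers.clear()
    clone = obj.__class__.__new__(obj.__class__)
    clone.__dict__.update(obj.__dict__)
    return clone

init_results = simulate_spawned_child(InitCollector()).crawl()
crawl_results = simulate_spawned_child(CrawlCollector()).crawl()
print(init_results)
print(crawl_results)
```

In this toy model, registering inside crawl is the only variant that collects anything, which matches the direction of the workaround above. Whether scrapyscript behaves exactly this way on Windows is an assumption, not something this thread confirms.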