pgh-public-meetings / city-scrapers-pitt

Pittsburgh City Scrapers: sourcing public meetings in Pittsburgh
https://pgh-public-meetings.github.io/events/
MIT License

Scrapy Shell "Attribute Error" on Startup #209

Status: Open. maxachis opened this issue 3 years ago.

maxachis commented 3 years ago

I created a new folder for my latest web scraper containing an (as yet unfinished) spider file and a spider test file. I also updated to the latest city-scrapers-pitt/master branch and ran "pipenv sync --three --dev", then "pipenv shell". When I then ran "scrapy shell", it failed with an AttributeError. Running "scrapy shell" on another branch that did not have the most recent commits (Nov 27, 2020) worked, so I suspect the problem was introduced by those commits. Can anyone else reproduce the error on their machine with the most recent commits?

Full traceback below:

  File "c:\users\maxac\appdata\local\programs\python\python37-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\maxac\appdata\local\programs\python\python37-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\maxac\.virtualenvs\city-scrapers-pitt-ULk2ZIsW\Scripts\scrapy.exe\__main__.py", line 7, in <module>
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\cmdline.py", line 145, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
    func(*a, **kw)
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\cmdline.py", line 153, in _run_command
    cmd.run(args, opts)
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\commands\shell.py", line 68, in run
    crawler.engine = crawler._create_engine()
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\core\engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\core\scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\scrapy\utils\misc.py", line 156, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "c:\users\maxac\.virtualenvs\city-scrapers-pitt-ulk2zisw\lib\site-packages\city_scrapers_core\pipelines\diff.py", line 51, in from_crawler
    crawler.spider._previous_results = pipeline.load_previous_results()
  File "C:\Users\maxac\OneDrive\Desktop\pitt_water\city-scrapers-pitt\pipelines.py", line 43, in load_previous_results
    tz = timezone(self.spider.timezone)
AttributeError: 'NoneType' object has no attribute 'timezone'
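The last few frames show the chain: the diff pipeline's from_crawler calls load_previous_results(), which reads self.spider.timezone while self.spider is still None. A plausible explanation (not confirmed in this thread) is that "scrapy shell" creates the engine, and with it the item pipelines, before any spider is attached to the crawler. The sketch below reproduces that pattern with hypothetical stand-in classes (FakeCrawler, FakeDiffPipeline); it does not import Scrapy or city_scrapers_core.

```python
# Hypothetical stand-ins that mirror the call chain in the traceback.
# They are NOT the real Scrapy or city_scrapers_core classes.


class FakeCrawler:
    """Like a crawler before a crawl starts: no spider attached yet."""

    def __init__(self):
        self.spider = None


class FakeDiffPipeline:
    """Hypothetical pipeline that reads the spider's timezone while it is
    being built via from_crawler(), as the traceback shows."""

    def __init__(self, crawler):
        self.spider = crawler.spider

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls(crawler)
        # The right-hand side runs first, so load_previous_results() raises
        # before the (equally broken) assignment target is evaluated.
        crawler.spider._previous_results = pipeline.load_previous_results()
        return pipeline

    def load_previous_results(self):
        tz_name = self.spider.timezone  # fails when self.spider is None
        return [tz_name]


def reproduce():
    """Return the AttributeError message, or None if no error occurs."""
    try:
        FakeDiffPipeline.from_crawler(FakeCrawler())
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
```

Because the error is raised inside load_previous_results(), the message names timezone; the assignment to crawler.spider._previous_results would fail for the same reason if it were reached.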
maxachis commented 3 years ago

Temporary workaround (thank you @ben-nathanson):

  1. Open scrapy.cfg at the top level of the branch.
  2. In the [settings] section, change the line default = city_scrapers.settings.dev to default = city_scrapers.settings.base
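The workaround changes a single key in an INI-style file. A minimal sketch of the same change using Python's standard configparser module, assuming a scrapy.cfg whose [settings] section holds the default key (SAMPLE_CFG is a hypothetical minimal file, not the repository's actual scrapy.cfg). Note that configparser drops comments when it writes a file back out, so editing a real checkout by hand may be preferable. Why the base settings avoid the error is not stated in this thread; one likely reason is that they do not enable the diff pipeline seen in the traceback.

```python
import configparser
import io

# Hypothetical minimal scrapy.cfg; the real file may contain other sections.
SAMPLE_CFG = """\
[settings]
default = city_scrapers.settings.dev
"""


def set_default_settings(cfg_text: str, module: str) -> str:
    """Point [settings] default at `module` and return the new file text.

    configparser does not preserve comments when writing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section("settings"):
        parser.add_section("settings")
    parser.set("settings", "default", module)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def active_settings(cfg_text: str) -> str:
    """Return the settings module named by [settings] default."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("settings", "default")


if __name__ == "__main__":
    patched = set_default_settings(SAMPLE_CFG, "city_scrapers.settings.base")
    print(active_settings(patched))
```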
ben-nathanson commented 3 years ago

@wsnavely

ben-nathanson commented 3 years ago

Note that this error also occurs on the City Bureau City Scrapers master branch if we change the default in scrapy.cfg to city_scrapers.settings.prod and run "scrapy shell ...". This points me toward some assumption in the city-scrapers-core library that is being violated. Still exploring what's happening here...
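One reading of the violated assumption is that the pipeline expects crawler.spider to be set when from_crawler runs, which does not hold under "scrapy shell". A defensive guard along these lines is sketched below with hypothetical stand-in classes (FakeCrawler, FakeSpider, GuardedDiffPipeline); it is not necessarily the fix the maintainers adopted, and skipping previous results when no spider is attached is an assumed trade-off.

```python
# Hypothetical stand-ins, not the real city_scrapers_core classes.


class FakeCrawler:
    def __init__(self, spider=None):
        # None models the "scrapy shell" case, where no spider is attached yet.
        self.spider = spider


class FakeSpider:
    timezone = "America/New_York"  # illustrative value


class GuardedDiffPipeline:
    def __init__(self, crawler):
        self.spider = crawler.spider

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls(crawler)
        # Both reading spider.timezone and writing
        # crawler.spider._previous_results need a spider, so guard both.
        if crawler.spider is not None:
            crawler.spider._previous_results = pipeline.load_previous_results()
        return pipeline

    def load_previous_results(self):
        if self.spider is None:
            return []  # nothing to compare against without a spider
        return [self.spider.timezone]


if __name__ == "__main__":
    print(GuardedDiffPipeline.from_crawler(FakeCrawler()).spider)
    spider = FakeSpider()
    GuardedDiffPipeline.from_crawler(FakeCrawler(spider))
    print(spider._previous_results)
```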

will-snavely commented 3 years ago

Sorry, I'm just getting back into things today. I've been looking into the issue and can reproduce it; at the moment it only seems to affect "scrapy shell", which is good. I think I have a potential fix.