mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

More mypy settings, typings #337

Open philbudne opened 1 month ago

philbudne commented 1 month ago

For your consideration:

I'm trying to clean my repo of half-finished project branches.

Turned on additional mypy knobs; the most significant is disallow_any_generics, which disallows bare (unparameterized) use of List, Dict, Callable, and Generator.
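To illustrate what disallow_any_generics changes, here is a minimal sketch. The function names are hypothetical and not taken from this repo. With the option enabled, mypy should flag the bare List/Dict annotations in the first function (with an error along the lines of "Missing type parameters for generic type"), while the parameterized versions pass:

```python
from typing import Callable, Dict, List


# Hypothetical example: bare generics. This runs fine, but mypy with
# disallow_any_generics enabled is expected to reject the annotations,
# because List and Dict here implicitly mean List[Any] and Dict[Any, Any].
def count_words_bare(words: List) -> Dict:
    counts: Dict = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


# Same logic with the type parameters spelled out.
def count_words(words: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


# Callable needs its argument list and return type as well.
def apply_all(funcs: List[Callable[[str], str]], text: str) -> str:
    for f in funcs:
        text = f(text)
    return text
```

Both count functions behave identically at runtime; the difference is only in what mypy can check at call sites.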

Added stubs/rabbitmq_admin.pyi for untyped library.

Eliminated some existing mypy complaints (due to newer versions of mypy on files that haven't changed?)

Added 3 more type: ignore comments (all Scrapy-related); the new total is 9, which doesn't seem excessive.

And added 13 more lines with Any; the new total in non-test code is 42, located in the following files:

     10 indexer/story.py
      5 indexer/workers/fetcher/batch_spider.py
      5 indexer/elastic.py
      4 indexer/scripts/elastic-conf.py
      3 indexer/story_archive_writer.py
      3 indexer/app.py
      2 indexer/workers/fetcher/sched.py
      1 indexer/workers/hist-fetcher.py
      1 indexer/workers/fetcher/rss_utils.py
      1 indexer/workers/fetcher/BlacklistRedirectMiddleware.py
      1 indexer/workers/archiver.py
      1 indexer/tracker.py
      1 indexer/scripts/rabbitmq-stats.py
      1 indexer/scripts/elastic-stats.py
      1 indexer/pipeline.py
      1 indexer/blobstore/__init__.py
      1 indexer/blobstore/S3.py

My guess is that a lot of the use of Any is due to reading JSON files or dealing with untyped libraries, and several uses are due to __exit__ method definitions that don't care about their arguments!
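For reference, the two patterns I have in mind look roughly like the sketch below. The names load_json_object and Resource are hypothetical, made up for illustration, and not code from this repo:

```python
import json
from typing import Any, Dict


# Hypothetical helper: json.loads returns Any, so the values of a decoded
# JSON object are hard to type more precisely than Any without a schema.
def load_json_object(text: str) -> Dict[str, Any]:
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


# Hypothetical context manager whose __exit__ ignores the exception
# details, so its arguments are simply typed as Any.
class Resource:
    def __init__(self) -> None:
        self.closed = False

    def __enter__(self) -> "Resource":
        return self

    def __exit__(self, *args: Any) -> None:
        # Returning None means any exception raised in the block propagates.
        self.closed = True
```

The __exit__ case could be typed precisely (exception type, value, and traceback, all Optional), but when the arguments are never examined, Any seems like a reasonable shortcut.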