google / corpuscrawler

Crawler for linguistic corpora

Undefined names #87

Open cclauss opened 3 years ago

cclauss commented 3 years ago

% flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics

./corpuscrawler/Lib/corpuscrawler/crawl_mi.py:62:39: F821 undefined name 'sitemap'
        if pubdate is None: pubdate = sitemap[url]
                                      ^
./corpuscrawler/Lib/corpuscrawler/crawl_kab.py:53:48: F821 undefined name 'url'
        assert doc.status == 200, (doc.status, url)
                                               ^
./corpuscrawler/Lib/corpuscrawler/crawl_tpi.py:48:48: F821 undefined name 'url'
        assert doc.status == 200, (doc.status, url)
                                               ^
./corpuscrawler/Lib/corpuscrawler/crawl_shn.py:90:30: F821 undefined name 'striptags'
                p = ' '.join(striptags(replace_html_entities(p)).split())
                             ^
./corpuscrawler/Lib/corpuscrawler/crawl_shn.py:90:40: F821 undefined name 'replace_html_entities'
                p = ' '.join(striptags(replace_html_entities(p)).split())
                                       ^
./corpuscrawler/Lib/corpuscrawler/crawl_ga.py:147:39: F821 undefined name 'fetchresult'
        if pubdate is None: pubdate = fetchresult.headers.get('Last-Modified')
                                      ^
./corpuscrawler/Lib/corpuscrawler/crawl_th.py:25:5: F821 undefined name 'crawl_bibleis'
    crawl_bibleis(crawler, out, bible='THATSV')
    ^
./corpuscrawler/Lib/corpuscrawler/crawl_vec.py:43:48: F821 undefined name 'start_url'
        assert doc.status == 200, (doc.status, start_url)
                                               ^
8     F821 undefined name 'fetchresult'
8

https://flake8.pycqa.org/en/latest/user/error-codes.html
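For context, F821 marks a name that is used without being defined in any enclosing scope. Python resolves names only when a line actually runs, so such a module imports cleanly and raises `NameError` only on the code path that reaches the bad name. The sketch below reproduces the pattern reported for `crawl_kab.py` and `crawl_tpi.py`, where the assertion message refers to `url`. `FakeDoc`, `check_status`, and `demo` are hypothetical names made up for illustration; they are not part of corpuscrawler.

```python
class FakeDoc:
    """Hypothetical stand-in for a fetched document (not the corpuscrawler API)."""

    def __init__(self, status):
        self.status = status


def check_status(doc, start_url):
    # Same shape as the reported lines: the assertion message names `url`,
    # but only `start_url` is defined in this scope.
    assert doc.status == 200, (doc.status, url)  # noqa: F821
    return True


def demo():
    results = {}
    # Successful fetch: the message tuple is never evaluated, so the bug stays hidden.
    results["ok"] = check_status(FakeDoc(200), "https://example.org/")
    # Failed fetch: Python builds the message and trips over the undefined name,
    # so the caller sees NameError instead of the intended AssertionError.
    try:
        check_status(FakeDoc(404), "https://example.org/")
    except NameError as exc:
        results["error"] = type(exc).__name__
    return results


print(demo())  # {'ok': True, 'error': 'NameError'}
```

The failure appears only when a fetch does not return 200, which is why a static check is more likely to catch it than ordinary test runs. This sketch assumes assertions are enabled, that is, Python is not run with `-O`.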

About the flake8 test selection: this PR does not focus on "style violations" (the majority of flake8 error codes, which psf/black can autocorrect). Instead, these tests focus on runtime safety and correctness: