kootenpv / sky

:sunrise: next generation web crawling using machine intelligence
BSD 3-Clause "New" or "Revised" License

asyncio errors #12

Open redevries opened 6 years ago

redevries commented 6 years ago

I'm occasionally seeing asyncio errors. The query below completes, but aiohttp logs the following warning and error during the run:

UserWarning: Creating a client session outside of coroutine is a very dangerous idea
  self.session = aiohttp.ClientSession(headers=self.headers)
ERROR:asyncio:Creating a client session outside of coroutine

The full log to reproduce is below.

serving skyViewer at "127.0.0.1:7900" from file: /Users/redevries/.virtualenvs/myenv/lib/python3.6/site-packages/sky/view/view.py
{'url': [b'http://www.autoschadeportaal.nl/category/nieuws'], 'crawl_filter_regexps': [b''], 'crawl_required_regexps': [b'2018/'], 'index_filter_regexps': [b''], 'index_required_regexps': [b''], 'bad_xpaths': [b''], 'max_saved_responses': [b'']}
pre crawl_required_regexps []
post crawl_required_regexps ['2018/']
/Users/redevries/sky_view_collections/www.autoschadeportaal.nl
/Users/redevries/.virtualenvs/myenv/lib/python3.6/site-packages/sky/crawler/crawling.py:140: UserWarning: Creating a client session outside of coroutine is a very dangerous idea
  self.session = aiohttp.ClientSession(headers=self.headers)
ERROR:asyncio:Creating a client session outside of coroutine
client_session: <aiohttp.client.ClientSession object at 0x10562b978>
seen urls 52 done urls 32
*** Report ***
http://www.autoschadeportaal.nl/2018/04/16/ahg-geslaagd-voor-kwaliteitsaudits 200 text/html UTF-8 55010 3/3
http://www.autoschadeportaal.nl/2018/04/16/ahg-geslaagd-voor-kwaliteitsaudits/feed 200 application/rss+xml UTF-8 826
http://www.autoschadeportaal.nl/2018/04/16/ahg-geslaagd-voor-kwaliteitsaudits/print/ 200 text/html UTF-8 2828 0/0
http://www.autoschadeportaal.nl/2018/04/16/nederland-in-europese-top-3-minste-verkeersdoden 200 text/html UTF-8 54915 3/3
http://www.autoschadeportaal.nl/2018/04/17/cunningham-lindsey-naar-nieuwe-eigenaar 200 text/html UTF-8 55208 4/4
http://www.autoschadeportaal.nl/2018/04/17/cunningham-lindsey-naar-nieuwe-eigenaar/feed 200 application/rss+xml UTF-8 841
http://www.autoschadeportaal.nl/2018/04/17/cunningham-lindsey-naar-nieuwe-eigenaar/print/ 200 text/html UTF-8 3396 0/0
http://www.autoschadeportaal.nl/2018/04/17/de-haan-automotive-stoot-bergingsactiviteiten-af 200 text/html UTF-8 55627 3/3
http://www.autoschadeportaal.nl/2018/04/17/de-haan-automotive-stoot-bergingsactiviteiten-af/feed 200 application/rss+xml UTF-8 868
http://www.autoschadeportaal.nl/2018/04/17/de-haan-automotive-stoot-bergingsactiviteiten-af/print/ 200 text/html UTF-8 3456 0/0
http://www.autoschadeportaal.nl/2018/04/17/laurens-autoschade-en-autoverhuur-rijnmond-naar-van-mossel 200 text/html UTF-8 56251 4/4
http://www.autoschadeportaal.nl/2018/04/17/laurens-autoschade-en-autoverhuur-rijnmond-naar-van-mossel/feed 200 application/rss+xml UTF-8 898
http://www.autoschadeportaal.nl/2018/04/17/laurens-autoschade-en-autoverhuur-rijnmond-naar-van-mossel/print/ 200 text/html UTF-8 3955 0/0
http://www.autoschadeportaal.nl/2018/04/17/van-hool-bouwt-busfabriek-in-usa 200 text/html UTF-8 55182 3/3
http://www.autoschadeportaal.nl/2018/04/17/van-hool-bouwt-busfabriek-in-usa/feed 200 application/rss+xml UTF-8 820
http://www.autoschadeportaal.nl/2018/04/17/van-hool-bouwt-busfabriek-in-usa/print/ 200 text/html UTF-8 3633 0/0
http://www.autoschadeportaal.nl/2018/04/18/carfixer-start-in-belgie-als-digitaal-platform-schadeherstel 200 text/html UTF-8 55704 3/3
http://www.autoschadeportaal.nl/2018/04/18/carfixer-start-in-belgie-als-digitaal-platform-schadeherstel/feed 200 application/rss+xml UTF-8 905
http://www.autoschadeportaal.nl/2018/04/18/carfixer-start-in-belgie-als-digitaal-platform-schadeherstel/print/ 200 text/html UTF-8 3483 0/0
http://www.autoschadeportaal.nl/2018/04/18/kooijman-autogroep-koopt-bergingsbedrijf-van-mourik 200 text/html UTF-8 55851 3/3
http://www.autoschadeportaal.nl/2018/09/05/autos-met-rijhulpsystemen-grotere-kans-op-schade 200 text/html UTF-8 55616 3/3
http://www.autoschadeportaal.nl/2018/09/05/autos-met-rijhulpsystemen-grotere-kans-op-schade/feed 200 application/rss+xml UTF-8 876
http://www.autoschadeportaal.nl/2018/09/05/autos-met-rijhulpsystemen-grotere-kans-op-schade/print/ 200 text/html UTF-8 3491 0/0
http://www.autoschadeportaal.nl/2018/09/05/cunningham-lindsey-nederland-verder-als-sedgwick 200 text/html UTF-8 55474 2/2
http://www.autoschadeportaal.nl/2018/09/05/cunningham-lindsey-nederland-verder-als-sedgwick/feed 200 application/rss+xml UTF-8 868
http://www.autoschadeportaal.nl/2018/09/05/cunningham-lindsey-nederland-verder-als-sedgwick/print/ 200 text/html UTF-8 3472 0/0
http://www.autoschadeportaal.nl/2018/09/05/schade-analysetool-malin-zelfstandig-verder 200 text/html UTF-8 55558 3/3
http://www.autoschadeportaal.nl/2018/09/05/schade-analysetool-malin-zelfstandig-verder/feed 200 application/rss+xml UTF-8 853
http://www.autoschadeportaal.nl/2018/09/05/schade-analysetool-malin-zelfstandig-verder/print/ 200 text/html UTF-8 3752 0/0
http://www.autoschadeportaal.nl/2018/09/05/yves-kerstens-nieuwe-president-europa-bij-axalta 200 text/html UTF-8 55632 3/3
http://www.autoschadeportaal.nl/2018/09/06/meer-fraude-met-verzekeringen-opgespoord 200 text/html UTF-8 56037 3/3
http://www.autoschadeportaal.nl/category/nieuws 200 text/html UTF-8 63289 11/11
Finished 32 urls in 8.025 secs (max_workers=5) (0.798 urls/sec/task)
        23 html
    816820 html_bytes
         9 other
      7755 other_bytes
Todo: 0
Done: 32
Date: Tue Sep 11 10:27:15 2018 local time
num unique images 10
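
For context, the warning in the log points at `crawling.py:140`, where `aiohttp.ClientSession(headers=self.headers)` is constructed. Older aiohttp releases emit this message when a session is created while no coroutine is running. A common way to avoid it is to defer session creation until the crawl coroutine is executing. The sketch below only illustrates that pattern and is not sky's actual code: `Crawler`, `FakeSession`, `session_factory`, and `run_crawl` are hypothetical names, and `FakeSession` stands in for `aiohttp.ClientSession` so the example runs without aiohttp or network access. It assumes Python 3.7+ for `asyncio.run` and `asyncio.get_running_loop`, whereas the log above comes from Python 3.6.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession (assumption, not the real class)."""

    def __init__(self, headers=None):
        # Record whether an event loop was running when the session was built.
        try:
            asyncio.get_running_loop()
            self.created_in_coroutine = True
        except RuntimeError:
            self.created_in_coroutine = False
        self.headers = headers or {}
        self.closed = False

    async def close(self):
        self.closed = True


class Crawler:
    """Hypothetical crawler showing deferred session creation."""

    def __init__(self, headers=None, session_factory=FakeSession):
        self.headers = headers or {}
        self.session_factory = session_factory
        self.session = None  # no session is created in __init__

    async def crawl(self):
        # Build the session here, inside the coroutine, where a loop is running.
        self.session = self.session_factory(headers=self.headers)
        try:
            await asyncio.sleep(0)  # placeholder for the real fetch work
        finally:
            await self.session.close()


def run_crawl():
    """Run one crawl and report (created inside coroutine, session closed)."""
    crawler = Crawler(headers={"User-Agent": "sky-sketch"})
    asyncio.run(crawler.crawl())
    created = getattr(crawler.session, "created_in_coroutine", None)
    return created, crawler.session.closed
```

Calling `FakeSession()` at module level records that no loop was running, which mirrors the situation the warning describes. In real code the factory would be `aiohttp.ClientSession`, assuming an aiohttp 3.x release where `close()` is awaitable.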