howie6879 / ruia

Async Python 3.6+ web scraping micro-framework based on asyncio
https://www.howie6879.com/ruia/
Apache License 2.0

Error on the Windows 10 platform: raise RuntimeError('Event loop stopped before Future completed.') #122

Closed: zeinzbern closed this issue 3 years ago

zeinzbern commented 4 years ago

Environment: Windows 10 Professional, AMD CPU

Test code:

import aiofiles

from ruia import Spider, Item, TextField, AttrField

class HackerNewsItem(Item):
    target_item = TextField(css_select='tr.athing')
    title = TextField(css_select='a.storylink')
    url = AttrField(css_select='a.storylink', attr='href')

class HackerNewsSpider(Spider):
    start_urls = [f'https://news.ycombinator.com/news?p={index}' for index in range(3)]

    async def parse(self, response):
        async for item in HackerNewsItem.get_items(html=response.html):
            yield item

    async def process_item(self, item: HackerNewsItem):
        """Ruia built-in method"""
        # Append each title to the output file, one per line
        async with aiofiles.open('./hacker_news.txt', 'a', encoding='utf-8') as f:
            await f.write(str(item.title) + '\n')

if __name__ == '__main__':
    HackerNewsSpider.start()

Error message:

[2020:11:05 10:28:53] INFO  Ruia Stopping spider: Ruia
Traceback (most recent call last):
  File ".\demo.py", line 26, in <module>
    HackerNewsSpider.start()
  File "F:\git\ruia\ruia\spider.py", line 347, in start
    spider_ins.loop.run_until_complete(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38-32\lib\asyncio\base_events.py", line 614, in run_until_complete
    raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.

I think the cause is that Windows 10 does not support Unix signals.
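For context on the signal hypothesis: on Windows, asyncio event loops do not implement `add_signal_handler`, and calling it raises `NotImplementedError`. The sketch below shows one way to guard that call. It is independent of Ruia, the helper name `install_stop_handlers` is hypothetical, and nothing in this thread confirms that signal handling is the actual cause of the error above.

```python
import asyncio
import signal


def install_stop_handlers(loop, callback):
    """Try to register callback for SIGINT and SIGTERM.

    Hypothetical helper for illustration only. Returns True when the
    handlers were installed and False when the platform or the calling
    thread does not support them.
    """
    try:
        for sig in (signal.SIGINT, signal.SIGTERM):
            loop.add_signal_handler(sig, callback)
    except NotImplementedError:
        # Raised on Windows, where loops have no Unix signal support.
        return False
    except RuntimeError:
        # Raised on Unix when called outside the main thread.
        return False
    return True


loop = asyncio.new_event_loop()
installed = install_stop_handlers(loop, loop.stop)
print("signal handlers installed:", installed)

# Clean up so the sketch leaves no handlers behind.
if installed:
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.remove_signal_handler(sig)
loop.close()
```

On Windows this should report that no handlers were installed instead of crashing, so the program can fall back to handling `KeyboardInterrupt`.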

cao-weiwei commented 4 years ago

I got the same issue. I'm running the test code on macOS Catalina 10.15.6 (19G2021) using PyCharm 2020.1.3 (Professional Edition). Below is the error message:

[2020:11:11 16:12:23] INFO  Ruia Spider started!
[2020:11:11 16:12:23] INFO  Ruia Worker started: 140591938788528
[2020:11:11 16:12:23] INFO  Ruia Worker started: 140591938788704
[2020:11:11 16:12:23] INFO  Request <GET: https://news.ycombinator.com>
[2020:11:11 16:12:24] INFO  Request <GET: https://news.ycombinator.com/news?p=1>
[2020:11:11 16:12:24] INFO  Request <GET: https://news.ycombinator.com/news?p=2>
[2020:11:11 16:12:26] INFO  Ruia Stopping spider: Ruia
Traceback (most recent call last):
  File "/Users/bigo/PycharmProjects/crawler/hacker_news_spider/hacker_news.py", line 37, in <module>
    HackerNewsSpider.start(middleware=middleware)
  File "/Users/bigo/.conda/envs/crawler/lib/python3.7/site-packages/ruia/spider.py", line 348, in start
    spider_ins._start(after_start=after_start, before_stop=before_stop)
  File "/Users/bigo/.conda/envs/crawler/lib/python3.7/asyncio/base_events.py", line 577, in run_until_complete
    raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.

I checked my MongoDB and it seems to be working correctly: the data is there even though the test code throws the error.
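The message itself comes from asyncio, not from Ruia: `run_until_complete` raises it when the loop is stopped before the awaited future has finished. A minimal sketch that reproduces the same message with plain asyncio follows. The names `stop_then_wait` and `reproduce` are hypothetical, and the sketch does not claim to mirror what Ruia does internally.

```python
import asyncio
import contextlib


async def stop_then_wait(loop):
    # Ask the loop to stop while this coroutine still has work left.
    loop.stop()
    await asyncio.sleep(0.1)


def reproduce():
    """Return the RuntimeError message raised by run_until_complete."""
    loop = asyncio.new_event_loop()
    task = loop.create_task(stop_then_wait(loop))
    try:
        loop.run_until_complete(task)
    except RuntimeError as exc:
        return str(exc)
    finally:
        # Cancel the unfinished task and let the loop process the
        # cancellation, so closing the loop leaves no pending task.
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            loop.run_until_complete(task)
        loop.close()
    return None


message = reproduce()
print(message)
```

This would be consistent with the data being written before the error appears: the work can finish while the outer future that `run_until_complete` is waiting on has not.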

howie6879 commented 3 years ago

@cao-weiwei Which Python version are you using?

arckalsun commented 3 years ago

https://github.com/howie6879/ruia/blob/127ca222b982cdbadfd567cd189541b437ce756e/ruia/spider.py#L529 Comment out this line of code. It doesn't seem necessary, does it? @howie6879

yxlwfds commented 3 years ago

Me too.

[2020:12:04 16:04:54] INFO  Ruia Stopping spider: Ruia
Traceback (most recent call last):
  File "test.py", line 29, in <module>
    RetryDemo.start()
  File "/root/.local/share/virtualenvs/web-Ln6OeU4f/lib/python3.8/site-packages/ruia/spider.py", line 347, in start
    spider_ins.loop.run_until_complete(
  File "/usr/lib/python3.8/asyncio/base_events.py", line 614, in run_until_complete
    raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.

howie6879 commented 3 years ago

pip install git+https://github.com/howie6879/ruia