santhoshse7en / news-fetch

A Python package that helps scrape all news details from any news website
MIT License

Does not fetch Arabic news #91

Closed by ghost 3 years ago

ghost commented 3 years ago

Hello, I tried it, but it did not fetch Arabic news from sites such as https://www.alarabiya.net/. I got zero articles.

My code:

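import newspaper  # the newspaper3k package is imported under the name "newspaper"
from newsfetch.news import newspaper as newsfetch  # assumed import, aliased to avoid the name clash

news_paper = newspaper.build('https://www.alarabiya.net/', language='ar', memoize_articles=False)
for article in news_paper.articles:
    article_url = article.url
    news = newsfetch(article_url)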

Any idea?

santhoshse7en commented 3 years ago

See https://github.com/codelucas/newspaper/ for reference. If you are certain that an entire news source is in one language, go ahead and use the same API :)


>>> import newspaper
>>> sina_paper = newspaper.build('http://www.sina.com.cn/', language='zh')

>>> for category in sina_paper.category_urls():
...     print(category)
http://health.sina.com.cn
http://eladies.sina.com.cn
http://english.sina.com
...

>>> article = sina_paper.articles[0]
>>> article.download()
>>> article.parse()

>>> print(article.text)
新浪武汉汽车综合 随着汽车市场的日趋成熟,
传统的“集全家之力抱得爱车归”的全额购车模式已然过时,
另一种轻松的新兴 车模式――金融购车正逐步成为时下消费者购
买爱车最为时尚的消费理念,他们认为,这种新颖的购车
模式既能在短期内
...

>>> print(article.title)
两年双免0手续0利率 科鲁兹掀背金融轻松购_武汉车市_武汉汽
车网_新浪汽车_新浪网
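
The same pattern could be applied to the Arabic source from the original question. The following is only a minimal sketch, not a confirmed fix: `collect_articles` and `build_arabic_source` are hypothetical helper names introduced here for illustration, and the sketch assumes that newspaper accepts `language='ar'` for this site and that the site's article links are discoverable by `newspaper.build`.

```python
def collect_articles(source, limit=5):
    """Download and parse up to `limit` articles from a newspaper source.

    `source` is expected to expose an `articles` list whose items have
    `download()`, `parse()`, `url`, `title` and `text`, as in the
    sina_paper example above.
    """
    results = []
    for article in source.articles[:limit]:
        try:
            article.download()
            article.parse()
        except Exception:
            # Skip articles that fail to download or parse.
            continue
        results.append(
            {"url": article.url, "title": article.title, "text": article.text}
        )
    return results


def build_arabic_source(url):
    """Hypothetical helper: build a source with the language forced to Arabic."""
    import newspaper  # installed via: pip install newspaper3k

    # memoize_articles=False makes newspaper re-list articles it has seen before.
    return newspaper.build(url, language="ar", memoize_articles=False)
```

Calling `collect_articles(build_arabic_source('https://www.alarabiya.net/'))` would then return a list of dictionaries, one per parsed article. An empty list means that no article URLs were found for the source or that every download failed, which matches the "zero articles" symptom reported above.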