A happy and lightweight Python package that provides an API to search for articles on Google News and returns a usable JSON response!
If you like GNews or find it useful, support the project by buying me a coffee.
View Demo · Report Bug · Request Feature
GNews is a happy and lightweight Python package that searches the Google News RSS feed and returns a usable JSON response.
You can also fetch the full article, so there is no need to write scrapers for article fetching anymore.
Google News covers 141+ countries and 41+ languages. At the bottom left of the Google News page, the Language & region section lists all of the supported combinations.
This section provides instructions for two different use cases:
To install the package and start using it in your own projects, follow these steps:
pip install gnews
If you want to make modifications locally, follow these steps to set up the development environment.
Set up the .env file by placing your MongoDB credentials in it, then run:
docker-compose up --build
Clone this repository:
git clone https://github.com/ranahaani/GNews.git
Set up a virtual environment:
virtualenv venv
source venv/bin/activate # MacOS/Linux
.\venv\Scripts\activate # Windows
Install the required dependencies:
pip install -r requirements.txt
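After either installation route, it can help to confirm which version of the package your interpreter actually sees. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and not part of GNews.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# 'gnews' is the distribution name used by `pip install gnews`.
print(installed_version("gnews") or "gnews is not installed in this environment")
```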
from gnews import GNews
google_news = GNews()
pakistan_news = google_news.get_news('Pakistan')
print(pakistan_news[0])
{
    'publisher': 'Aljazeera.com',
    'description': 'Pakistan accuses India of stoking conflict in Indian Ocean '
                   'Aljazeera.com',
    'published date': 'Tue, 16 Feb 2021 11:50:43 GMT',
    'title': 'Pakistan accuses India of stoking conflict in Indian Ocean - '
             'Aljazeera.com',
    'url': 'https://www.aljazeera.com/news/2021/2/16/pakistan-accuses-india-of-nuclearizing-indian-ocean'
}
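The 'published date' value in the response above is a plain string. A minimal sketch, assuming every article uses the same RFC 2822 style format as this sample, of turning it into a `datetime` with the standard library. The helper name `parse_published` is hypothetical, and the sample dictionary is copied from the response above.

```python
from email.utils import parsedate_to_datetime

# Sample article copied from the response shown above.
sample = {
    'publisher': 'Aljazeera.com',
    'published date': 'Tue, 16 Feb 2021 11:50:43 GMT',
    'title': 'Pakistan accuses India of stoking conflict in Indian Ocean - Aljazeera.com',
}


def parse_published(article):
    """Parse the 'published date' string into a datetime (hypothetical helper)."""
    return parsedate_to_datetime(article['published date'])


published = parse_published(sample)
print(published.isoformat())
```

Once parsed, the values can be compared or sorted like any other `datetime`, for example with `sorted(articles, key=parse_published)`.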
GNews.get_top_news()
GNews.get_news(keyword)
GNews.get_news_by_topic(topic)
Available topics: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH, POLITICS, CELEBRITIES, TV, MUSIC, MOVIES, THEATER, SOCCER, CYCLING, MOTOR SPORTS, TENNIS, COMBAT SPORTS, BASKETBALL, BASEBALL, FOOTBALL, SPORTS BETTING, WATER SPORTS, HOCKEY, GOLF, CRICKET, RUGBY, ECONOMY, PERSONAL FINANCE, FINANCE, DIGITAL CURRENCIES, MOBILE, ENERGY, GAMING, INTERNET SECURITY, GADGETS, VIRTUAL REALITY, ROBOTICS, NUTRITION, PUBLIC HEALTH, MENTAL HEALTH, MEDICINE, SPACE, WILDLIFE, ENVIRONMENT, NEUROSCIENCE, PHYSICS, GEOLOGY, PALEONTOLOGY, SOCIAL SCIENCES, EDUCATION, JOBS, ONLINE EDUCATION, HIGHER EDUCATION, VEHICLES, ARTS-DESIGN, BEAUTY, FOOD, TRAVEL, SHOPPING, HOME, OUTDOORS, FASHION.
GNews.get_news_by_location(location)
GNews.get_news_by_site(site)
Example site: "cnn.com"
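All parameters are optional and can be passed during initialization. Here's a list of the available parameters:
language, country, period, start_date, end_date, max_results, exclude_websites, proxy
The proxy parameter is a dictionary where the key is the protocol (http or https) and the value is the proxy address. Example: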
# Example with only HTTP proxy
proxy = {
    'http': 'http://your_proxy_address',
}

# Example with only HTTPS proxy
proxy = {
    'https': 'http://your_proxy_address',
}
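Proxy addresses are often kept in environment variables rather than in code. A minimal sketch, assuming the conventional `HTTP_PROXY` and `HTTPS_PROXY` variable names, that builds a dictionary in the shape shown above. The helper `proxy_from_env` is hypothetical; whether GNews consults these variables on its own is not covered here.

```python
import os


def proxy_from_env(environ=None):
    """Build a {'http': ..., 'https': ...} dict from environment variables.

    Only protocols with a configured address are included (hypothetical helper).
    """
    environ = os.environ if environ is None else environ
    proxy = {}
    for protocol, variable in (("http", "HTTP_PROXY"), ("https", "HTTPS_PROXY")):
        address = environ.get(variable)
        if address:
            proxy[protocol] = address
    return proxy


# Example with an explicit mapping instead of the real environment.
print(proxy_from_env({"HTTPS_PROXY": "http://your_proxy_address"}))
```

The result can then be passed as the `proxy` argument during initialization.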
#### Example Initialization
```python
from gnews import GNews

# Initialize GNews with various parameters, including proxy
google_news = GNews(
    language='en',
    country='US',
    period='7d',
    start_date=None,
    end_date=None,
    max_results=10,
    exclude_websites=['yahoo.com', 'cnn.com'],
    proxy={
        'https': 'https://your_proxy_address'
    }
)
```
google_news.period = '7d'  # News from the last 7 days
google_news.max_results = 10  # Number of results returned per keyword
google_news.country = 'United States'  # News from a specific country
google_news.language = 'english'  # News in a specific language
google_news.exclude_websites = ['yahoo.com', 'cnn.com']  # Exclude news from specific websites, e.g. yahoo.com and cnn.com
google_news.start_date = (2020, 1, 1)  # Search from 1st Jan 2020
google_news.end_date = (2020, 3, 1)  # Search until 1st March 2020
The format of the timeframe is a string comprised of a number, followed by a letter representing the time operator. For example 1y would signify 1 year. Full list of operators below:
- h = hours (eg: 12h)
- d = days (eg: 7d)
- m = months (eg: 6m)
- y = years (eg: 1y)
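A minimal sketch of how a period string in this format could be validated and converted to a `timedelta` before it is assigned. The helper `parse_period` is hypothetical, and treating a month as 30 days and a year as 365 days is an assumption made for illustration, not a statement about how Google News interprets these operators.

```python
import re
from datetime import timedelta

# Approximate day counts per operator; 'm' and 'y' are rough assumptions.
_DAYS_PER_UNIT = {"d": 1, "m": 30, "y": 365}


def parse_period(period):
    """Convert a period such as '12h', '7d', '6m' or '1y' to a timedelta."""
    match = re.fullmatch(r"(\d+)([hdmy])", period)
    if match is None:
        raise ValueError(f"invalid period: {period!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit == "h":
        return timedelta(hours=amount)
    return timedelta(days=amount * _DAYS_PER_UNIT[unit])


print(parse_period("7d"))
```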
Setting the start and end dates can be done by passing in either a datetime or a tuple in the form (YYYY, MM, DD).
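Because two input forms are accepted, it can be convenient to normalize them before comparing dates. A small sketch, with the hypothetical helper `to_datetime`, that accepts either form and checks that the start precedes the end:

```python
from datetime import datetime


def to_datetime(value):
    """Accept a datetime or a (YYYY, MM, DD) tuple and return a datetime."""
    if isinstance(value, datetime):
        return value
    year, month, day = value  # raises if the tuple is not three items long
    return datetime(year, month, day)


start = to_datetime((2020, 1, 1))
end = to_datetime(datetime(2020, 3, 1))
if start >= end:
    raise ValueError("start_date must be earlier than end_date")
print(start.date(), end.date())
```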
print(google_news.AVAILABLE_COUNTRIES)
{'Australia': 'AU', 'Botswana': 'BW', 'Canada ': 'CA', 'Ethiopia': 'ET', 'Ghana': 'GH', 'India ': 'IN',
'Indonesia': 'ID', 'Ireland': 'IE', 'Israel ': 'IL', 'Kenya': 'KE', 'Latvia': 'LV', 'Malaysia': 'MY', 'Namibia': 'NA',
'New Zealand': 'NZ', 'Nigeria': 'NG', 'Pakistan': 'PK', 'Philippines': 'PH', 'Singapore': 'SG', 'South Africa': 'ZA',
'Tanzania': 'TZ', 'Uganda': 'UG', 'United Kingdom': 'GB', 'United States': 'US', 'Zimbabwe': 'ZW',
'Czech Republic': 'CZ', 'Germany': 'DE', 'Austria': 'AT', 'Switzerland': 'CH', 'Argentina': 'AR', 'Chile': 'CL',
'Colombia': 'CO', 'Cuba': 'CU', 'Mexico': 'MX', 'Peru': 'PE', 'Venezuela': 'VE', 'Belgium ': 'BE', 'France': 'FR',
'Morocco': 'MA', 'Senegal': 'SN', 'Italy': 'IT', 'Lithuania': 'LT', 'Hungary': 'HU', 'Netherlands': 'NL',
'Norway': 'NO', 'Poland': 'PL', 'Brazil': 'BR', 'Portugal': 'PT', 'Romania': 'RO', 'Slovakia': 'SK', 'Slovenia': 'SI',
'Sweden': 'SE', 'Vietnam': 'VN', 'Turkey': 'TR', 'Greece': 'GR', 'Bulgaria': 'BG', 'Russia': 'RU', 'Ukraine ': 'UA',
'Serbia': 'RS', 'United Arab Emirates': 'AE', 'Saudi Arabia': 'SA', 'Lebanon': 'LB', 'Egypt': 'EG',
'Bangladesh': 'BD', 'Thailand': 'TH', 'China': 'CN', 'Taiwan': 'TW', 'Hong Kong': 'HK', 'Japan': 'JP',
'Republic of Korea': 'KR'}
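Some keys in the mapping above carry a trailing space (for example 'Canada '), so an exact-name lookup may miss them. A sketch of a lookup that ignores case and surrounding whitespace, shown over a small excerpt of the mapping; the helper `country_code` is hypothetical.

```python
# Excerpt copied from the mapping above, including the trailing spaces.
COUNTRIES_EXCERPT = {'Australia': 'AU', 'Canada ': 'CA', 'India ': 'IN', 'United States': 'US'}


def country_code(name, countries=COUNTRIES_EXCERPT):
    """Return the code for `name`, ignoring case and surrounding whitespace."""
    wanted = name.strip().lower()
    for key, code in countries.items():
        if key.strip().lower() == wanted:
            return code
    raise KeyError(name)


print(country_code("canada"))
```

With the library itself, the full mapping could be passed in as `country_code('Canada', google_news.AVAILABLE_COUNTRIES)`.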
print(google_news.AVAILABLE_LANGUAGES)
{'english': 'en', 'indonesian': 'id', 'czech': 'cs', 'german': 'de', 'spanish': 'es-419', 'french': 'fr',
'italian': 'it', 'latvian': 'lv', 'lithuanian': 'lt', 'hungarian': 'hu', 'dutch': 'nl', 'norwegian': 'no',
'polish': 'pl', 'portuguese brasil': 'pt-419', 'portuguese portugal': 'pt-150', 'romanian': 'ro', 'slovak': 'sk',
'slovenian': 'sl', 'swedish': 'sv', 'vietnamese': 'vi', 'turkish': 'tr', 'greek': 'el', 'bulgarian': 'bg',
'russian': 'ru', 'serbian': 'sr', 'ukrainian': 'uk', 'hebrew': 'he', 'arabic': 'ar', 'marathi': 'mr', 'hindi': 'hi',
'bengali': 'bn', 'tamil': 'ta', 'telugu': 'te', 'malyalam': 'ml', 'thai': 'th', 'chinese simplified': 'zh-Hans',
'chinese traditional': 'zh-Hant', 'japanese': 'ja', 'korean': 'ko'}
The article properties are title, published date, description, url, and publisher.
Properties | Description | Example
---|---|---
title | Title of the article | IMF Staff and Pakistan Reach Staff-Level Agreement on the Pending Reviews Under the Extended Fund Facility
url | Google News link to the article | Article Link
published date | Published date | Wed, 07 Jun 2017 07:01:30 GMT
description | Short description of the article | IMF Staff and Pakistan Reach Staff-Level Agreement on the Pending Reviews Under the Extended Fund Facility ...
publisher | Publisher of the article | The Guardian
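Since each article is a flat dictionary with these five keys, a list of results can be written out as CSV. A minimal sketch using the standard library; `articles_to_csv` is a hypothetical helper, the key names follow the sample response earlier in this README (note the space in 'published date'), and the sample row reuses values from that response.

```python
import csv
import io

FIELDS = ['title', 'url', 'published date', 'description', 'publisher']


def articles_to_csv(articles):
    """Serialize a list of article dicts to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for article in articles:
        # Missing keys become empty cells instead of raising.
        writer.writerow({field: article.get(field, '') for field in FIELDS})
    return buffer.getvalue()


sample = [{
    'title': 'Pakistan accuses India of stoking conflict in Indian Ocean - Aljazeera.com',
    'url': 'https://www.aljazeera.com/news/2021/2/16/pakistan-accuses-india-of-nuclearizing-indian-ocean',
    'published date': 'Tue, 16 Feb 2021 11:50:43 GMT',
    'publisher': 'Aljazeera.com',
}]
print(articles_to_csv(sample))
```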
Use the newspaper3k library to scrape the full article. The article URL is available as article['url'].
Install newspaper3k:
pip3 install newspaper3k
Then use the get_full_article method from GNews, which creates a newspaper.article.Article object from the URL.
from gnews import GNews
google_news = GNews()
json_resp = google_news.get_news('Pakistan')
# Returns a newspaper3k Article instance; all newspaper3k attributes are accessible on it
article = google_news.get_full_article(json_resp[0]['url'])
This new object contains the title, text (full article), and images attributes. Examples:
article.title
'IMF Staff and Pakistan Reach Staff-Level Agreement on the Pending Reviews Under the Extended Fund Facility'
article.text
End-of-Mission press releases include statements of IMF staff teams that convey preliminary findings after a mission. The views expressed are those of the IMF staff and do not necessarily represent the views of the IMF's Executive Board.\n\nIMF staff and the Pakistani authorities have reached an agreement on a package of measures to complete second to fifth reviews of the authorities' reform program supported by the IMF Extended Fund Facility (EFF) ..... (full article)
article.images
{'https://www.imf.org/~/media/Images/IMF/Live-Page/imf-live-rgb-h.ashx?la=en', 'https://www.imf.org/-/media/Images/IMF/Data/imf-logo-eng-sep2019-update.ashx', 'https://www.imf.org/-/media/Images/IMF/Data/imf-seal-shadow-sep2019-update.ashx', 'https://www.imf.org/-/media/Images/IMF/Social/TW-Thumb/twitter-seal.ashx', 'https://www.imf.org/assets/imf/images/footer/IMF_seal.png'}
article.authors
[]
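The attributes shown above can be gathered into a plain dictionary for logging or storage. A minimal sketch: `summarize_article` is a hypothetical helper, it assumes the object exposes `title`, `text`, `images`, and `authors` as in the examples, and it treats `None` as a failed fetch, which is an assumption about failure behavior rather than documented fact.

```python
def summarize_article(article, preview_chars=200):
    """Collect a few attributes of a parsed article into a plain dict."""
    if article is None:  # assumption: a failed fetch may yield None
        return None
    text = getattr(article, 'text', '') or ''
    return {
        'title': getattr(article, 'title', ''),
        'preview': text[:preview_chars],
        'image_count': len(getattr(article, 'images', ()) or ()),
        'authors': list(getattr(article, 'authors', ()) or ()),
    }
```

With the objects from the example above, this would be called as `summarize_article(google_news.get_full_article(json_resp[0]['url']))`.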
Read the full documentation for newspaper3k.
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Distributed under the MIT License. See LICENSE for more information.
Muhammad Abdullah - @ranahaani - ranahaani@gmail.com
Project Link: https://github.com/ranahaani/GNews