This project obtains data from Google Trends asynchronously and efficiently. Inspired by pytrends, I am developing it on top of an asynchronous framework, asyncio, and a related module, aiohttp.
The logic behind this project is to first build a cookie pool, then obtain the tokenized queries (wrapped inside widgets) and store them in a second pool, and finally retrieve the data using the widgets from the widget pool.
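The three stages above can be pictured with a minimal asyncio sketch. This is not the package's implementation: `fetch_cookie`, `fetch_widget`, `fetch_data` and `pipeline` are hypothetical placeholders that simulate network calls so the flow can be run offline.

```python
import asyncio
import itertools


async def fetch_cookie(i: int) -> str:
    # Placeholder for an HTTP request that would return a session cookie.
    await asyncio.sleep(0)
    return f"cookie-{i}"


async def fetch_widget(query: dict, cookie: str) -> dict:
    # Placeholder for the request that would return a tokenized widget.
    await asyncio.sleep(0)
    keyword = query["keywords"][0]
    return {"query": query, "token": f"token-for-{keyword}", "cookie": cookie}


async def fetch_data(widget: dict) -> tuple:
    # Placeholder for the request that would exchange a widget for data.
    await asyncio.sleep(0)
    return widget["query"]["keywords"][0], widget["token"]


async def pipeline(queries: dict, n_cookies: int) -> dict:
    # Stage 1: build the cookie pool.
    cookies = await asyncio.gather(*(fetch_cookie(i) for i in range(n_cookies)))
    # Stage 2: build the widget pool, cycling through the available cookies.
    cookie_cycle = itertools.cycle(cookies)
    widgets = await asyncio.gather(
        *(fetch_widget(q, next(cookie_cycle)) for q in queries.values())
    )
    # Stage 3: retrieve data for every widget in the widget pool.
    results = await asyncio.gather(*(fetch_data(w) for w in widgets))
    return dict(results)


queries = {
    0: {"keywords": ["AAPL"], "periods": "all", "freq": "M"},
    1: {"keywords": ["AMZN"], "periods": "all", "freq": "M"},
}
result = asyncio.run(pipeline(queries, n_cookies=2))
print(result)
```

Keeping each stage's output in its own pool means a later stage can be re-run without repeating the earlier ones.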
Only interest-over-time data is tested and available for now.
Settings can be customized by editing settings.json under the settings folder.
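If you prefer to change settings from a script instead of by hand, a sketch along these lines may help. It assumes settings.json is a flat JSON object; `update_settings` is a hypothetical helper, and the keys `timeout` and `retries` in the demo are made up for illustration and are not taken from the package.

```python
import json
import tempfile
from pathlib import Path


def update_settings(path, **overrides):
    """Load a JSON settings file, override existing keys, and save it back."""
    path = Path(path)
    settings = json.loads(path.read_text(encoding="utf-8"))
    # Refuse keys that are not already present, to catch misspelled names.
    unknown = set(overrides) - set(settings)
    if unknown:
        raise KeyError(f"Unknown settings: {sorted(unknown)}")
    settings.update(overrides)
    path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return settings


# Demo against a throwaway file; "timeout" and "retries" are made-up keys.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "settings.json"
    demo_path.write_text(json.dumps({"timeout": 10, "retries": 3}), encoding="utf-8")
    updated = update_settings(demo_path, timeout=30)
    reloaded = json.loads(demo_path.read_text(encoding="utf-8"))
print(updated)
```

Open the real settings.json first to see which keys it actually contains before overriding any of them.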
An example input of queries is given under the data folder.
An example proxies file is given under the proxies folder.
The file userAgents.json is from Said-Ait-Driss.
pip install aioTrends
pip install virtualenv
where python3.11
Copy the path to Python 3.11 and use it in place of the path below:
virtualenv -p /path/to/python3.11 atenv
On Windows:
atenv\Scripts\activate
On macOS and Linux:
source atenv/bin/activate
The package must be installed in an environment running Python 3.10+.
pip install aioTrends
cd path/to/your/working/path
python
import aioTrends as at
import pickle
qrys = {
0: {'keywords': ['AAPL'], 'periods': '2007-01-01 2007-08-31', 'freq': 'D'},
1: {'keywords': ['AMZN'], 'periods': 'all', 'freq': 'M'},
2: {'keywords': ['AAPL', 'AMZN'], 'periods': 'all', 'freq': 'M'},
# ... more queries ...
10000: {'keywords': ['MSFT'], 'periods': '2004-01-01 2022-12-31', 'freq': 'M'}
}
# Save the queries; the with-block closes the file once the dump is done.
with open('./data/qrys.pkl', 'wb') as f:
    pickle.dump(qrys, f)
Alternatively, the function formQueries builds the query dataset from the list of keywords you give.
from aioTrends import formQueries
from datetime import date
import pickle
qrys = formQueries(keywords=['AMZN', 'MSFT'], start='2004-01-01', end=date.today(), freq='D')
# Save the queries; the with-block closes the file once the dump is done.
with open('./data/qrys.pkl', 'wb') as f:
    pickle.dump(qrys, f)
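Before starting a long run, it can be worth checking that the pickled queries load back and have the expected shape. The sketch below assumes every query needs the three keys shown in the example above (`keywords`, `periods`, `freq`); `check_queries` is a hypothetical helper, not part of aioTrends.

```python
import pickle
import tempfile
from pathlib import Path

# Keys assumed from the example query dictionary, not from the package itself.
REQUIRED_KEYS = {"keywords", "periods", "freq"}


def check_queries(qrys):
    """Return a list of human-readable problems found in a query dictionary."""
    problems = []
    for idx, query in qrys.items():
        missing = REQUIRED_KEYS - set(query)
        if missing:
            problems.append(f"query {idx}: missing {sorted(missing)}")
        elif not isinstance(query["keywords"], list) or not query["keywords"]:
            problems.append(f"query {idx}: 'keywords' must be a non-empty list")
    return problems


qrys = {
    0: {"keywords": ["AAPL"], "periods": "2007-01-01 2007-08-31", "freq": "D"},
    1: {"keywords": [], "periods": "all", "freq": "M"},  # deliberately invalid
    2: {"keywords": ["AMZN"], "periods": "all"},         # deliberately invalid
}

# Round-trip through a temporary pickle file, then validate what was loaded.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "qrys.pkl"
    with open(path, "wb") as f:
        pickle.dump(qrys, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

problems = check_queries(loaded)
for problem in problems:
    print(problem)
```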
import aioTrends as at
#Step 0: Set the log file. Other settings can be customized by amending the settings.json under the folder settings.
at.setLog('./data/hello.log')
#Step 1: collect 1000 cookies with 100 concurrent tasks. The number of concurrent tasks can be customized.
at.CookeisPool(100).run(1000)
#Step 2: get widgets with 100 concurrent tasks. The number of concurrent tasks can be customized.
at.WidgetsPool(100).run()
#Step 3: get data with 100 concurrent tasks. The number of concurrent tasks can be customized.
at.DataInterestOverTime(100).run()
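To illustrate what a cap on concurrent tasks means in asyncio terms, here is a minimal sketch that uses `asyncio.Semaphore`. It is an assumption about one common way to bound concurrency, not a description of how aioTrends does it internally; `run_bounded` and `job` are hypothetical names, and the sleep stands in for a network request.

```python
import asyncio


async def run_bounded(n_jobs: int, limit: int):
    """Run n_jobs simulated requests with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)
    running = 0  # jobs currently inside the semaphore
    peak = 0     # highest value `running` ever reached

    async def job(i: int) -> int:
        nonlocal running, peak
        async with sem:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for a network request
            running -= 1
            return i * 2

    results = await asyncio.gather(*(job(i) for i in range(n_jobs)))
    return results, peak


results, peak = asyncio.run(run_bounded(n_jobs=20, limit=5))
print(f"finished {len(results)} jobs, peak concurrency {peak}")
```

A higher limit finishes sooner but sends more requests at the same time, which may make rate limiting more likely.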
Alternatively, a single call can form the queries and fetch either scaled daily data or monthly data, as shown below.
import aioTrends as at
from datetime import date
qry_list = ['AMZN', 'AAPL', 'MSFT']
# run 50 concurrent tasks
ataio = at.Aio(50)
df = ataio.getScaledDailyData(
keywords=qry_list, # the query keyword list
filename='test.csv', # json and pickle are both supported
start='2004-01-01', # both datetime and str are supported
end=date.today()
)
fig = df.plot(figsize=(16,8), title='TEST_SCALED_DAILY_DATA').get_figure()
fig.savefig('test_scaled_daily_data.png')
df_m = ataio.getMonthlyData(
keywords=qry_list,
start='2004-01-01',
end='2022-12-31'
)
fig = df_m.plot(figsize=(16,8), title='TEST_MONTHLY_DATA').get_figure()
fig.savefig('test_monthly_data.png')
python example.py