Closed: Eeonum closed this issue 2 years ago.
The data are now at the following link, but you need to confirm you are not a robot to get past the check. How are you going to do that?
Also, the ticker part doesn't work anymore. What is the solution for that?
I am also getting the "BadZipFile" error for d_us_txt.zip. However, it extracts fine manually with Ubuntu's Extract tool.
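A `BadZipFile` error means Python could not find a valid zip structure in the file, for example because it is truncated or is not a zip at all. One plausible cause, given the robot check mentioned above, is that an automated request saved a web page under the `.zip` name. A minimal sketch for checking what the file actually is before extracting it (the helper `check_archive` is hypothetical):

```python
import zipfile
from pathlib import Path


def check_archive(path):
    """Return 'ok', 'missing', 'html', or 'not a zip' for a downloaded archive."""
    path = Path(path)
    if not path.exists():
        return 'missing'
    if zipfile.is_zipfile(path):
        return 'ok'
    # Not a zip: peek at the first bytes to see whether a web page was saved instead
    with path.open('rb') as f:
        head = f.read(64).lstrip().lower()
    return 'html' if head.startswith((b'<!doctype', b'<html')) else 'not a zip'
```

For example, `print(check_archive('stooq/d_us_txt.zip'))`, assuming the file was saved in `./stooq`. A result of `html` would point to the download step rather than the extraction code.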
There is a link that takes you to the English website to download the data: https://stooq.com/db/h/. You can download the data manually there and extract it as intended.
However, the code in the "create_stooq_data.ipynb" notebook under the ### Add symbols heading (found in this GitHub repository under data/create_stooq_data.ipynb) tries to download the file and add the symbols (tickers and names), but fails because it can no longer pull from stooq.
It probably has to be switched to a directory-oriented method. I'm trying to work it out, but it's going slowly. Does anyone else know how to modify the code to traverse the files in the directory so we can store the data in HDF5?
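A directory-oriented version could walk the extracted folder, read every price file, and write the combined result to HDF5. A minimal sketch, assuming each price `.txt` file starts with a header row `<TICKER>,<PER>,<DATE>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>` and stores dates as `YYYYMMDD`. The function names, the file name `assets.h5`, and the key `stooq/prices` are hypothetical, and writing HDF5 through pandas requires PyTables:

```python
from pathlib import Path

import pandas as pd

# Assumed column order of the daily price .txt files; their own header row is replaced
COLUMNS = ['ticker', 'per', 'date', 'time', 'open', 'high', 'low', 'close', 'volume', 'openint']


def load_price_files(root):
    """Read every .txt price file under root into one frame indexed by (ticker, date)."""
    frames = []
    for txt_file in sorted(Path(root).rglob('*.txt')):
        df = pd.read_csv(txt_file, names=COLUMNS, header=0)
        if df.empty:  # skip files that hold only a header row
            continue
        df['date'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
        frames.append(df[['ticker', 'date', 'open', 'high', 'low', 'close', 'volume']])
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames).set_index(['ticker', 'date']).sort_index()


def store_prices(root, store_path='assets.h5', key='stooq/prices'):
    """Hypothetical helper: write the combined prices to HDF5 (needs PyTables)."""
    prices = load_price_files(root)
    with pd.HDFStore(store_path) as store:
        store.put(key, prices)
    return prices
```

Point `root` at the extracted price folder (for example `stooq/data/daily/us`, which is an assumption about the archive layout) rather than at `./stooq` itself, since any ticker-list `.txt` files saved there have a different format and would fail to parse.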
Please see the update from December 2020 after cell 3: automatic download is no longer supported by stooq, and the fix is to download the files manually instead.
@Eeonum @drsxr
It's not perfect, but you can download the zip files manually. I rewrote some of the code and just tested it tonight.
Copy the data from the links, save it in Notepad as #.txt (where # is the number from the key below), and it will work.
Here's the number key:

```python
metadata_dict = {
    ('jp', 'tse etfs'): 34,
    ('jp', 'tse stocks'): 32,
    ('us', 'nasdaq etfs'): 69,
    ('us', 'nasdaq stocks'): 27,
    ('us', 'nyse etfs'): 70,
    ('us', 'nyse stocks'): 28,
    ('us', 'nysemkt stocks'): 26,
}
```
And the code to process it.
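```python
from pathlib import Path

import pandas as pd

stooq_path = Path('stooq')  # assumed: the ./stooq folder holding the #.txt files

for (market, asset_class), code in metadata_dict.items():
    data_file = stooq_path / f'{code}.txt'
    # Assumes ticker and name are separated by a tab or 2+ spaces;
    # a single-space separator would also split multi-word company names
    df = pd.read_csv(data_file, sep=r'\s{2,}|\t', engine='python').apply(lambda x: x.str.strip())
    df.columns = ['ticker', 'name']
    df = df.drop_duplicates('ticker').dropna()
    print(market, asset_class, f'# tickers: {df.shape[0]:,.0f}')
    path = stooq_path / 'tickers' / market
    path.mkdir(parents=True, exist_ok=True)
    df.to_csv(path / f'{asset_class}.csv', index=False)
```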
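Once the per-market CSVs exist, they can be read back into a single ticker/name table with market and asset-class columns, for example to attach names to the price data before storing it. A minimal sketch, assuming the `stooq/tickers/<market>/<asset_class>.csv` layout written by the loop above (`load_tickers` is a hypothetical helper):

```python
from pathlib import Path

import pandas as pd


def load_tickers(stooq_path=Path('stooq')):
    """Combine stooq/tickers/<market>/<asset_class>.csv files into one table."""
    frames = []
    for csv_file in sorted((Path(stooq_path) / 'tickers').glob('*/*.csv')):
        df = pd.read_csv(csv_file)
        df['market'] = csv_file.parent.name   # folder name, e.g. 'us'
        df['asset_class'] = csv_file.stem     # file name without .csv, e.g. 'nasdaq stocks'
        frames.append(df)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```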
File structure should look like this:
Here is the code to process the zip files.
```python
# Automatic download no longer works: download the zip files manually from 'https://static.stooq.com/db/d/'
import shutil
from pathlib import Path
from zipfile import ZipFile

stooq_path = Path('stooq')  # assumed: the folder where the zip files were saved


def download_price_data(market='us'):
    """Extract the .txt price files from a manually downloaded d_{market}_txt.zip."""
    data_file = stooq_path / f'd_{market}_txt.zip'
    print(data_file)
    if not data_file.exists():
        print(f'Could not find file: {data_file}')
        return
    print(f'extracting {data_file}')
    with ZipFile(data_file) as zip_file:
        for file in zip_file.namelist():
            if not file.endswith('.txt'):
                continue
            local_file = stooq_path / file
            print(f'file: {local_file}')
            local_file.parent.mkdir(parents=True, exist_ok=True)
            # Stream each member to disk instead of copying it line by line
            with zip_file.open(file) as source, local_file.open('wb') as output:
                shutil.copyfileobj(source, output)
```
The file prn.us.txt in d_us_txt.zip could not be opened; delete it manually from d_us_txt.zip.
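If this happens on Windows, a likely reason is that `PRN` is a reserved device name there, and the reservation also applies when the name is followed by extensions, so a file called prn.us.txt cannot be created. Instead of editing the archive by hand, the extraction loop can skip such members. A minimal sketch (the helper names are hypothetical) that extracts the `.txt` members and returns the names it skipped:

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath

# Device names Windows reserves, even when followed by an extension (prn.us.txt included)
WINDOWS_RESERVED = ({'con', 'prn', 'aux', 'nul'}
                    | {f'com{i}' for i in range(1, 10)}
                    | {f'lpt{i}' for i in range(1, 10)})


def is_reserved_on_windows(member_name):
    """True if the member's file name begins with a reserved Windows device name."""
    base = PurePosixPath(member_name).name.split('.')[0].lower()
    return base in WINDOWS_RESERVED


def extract_txt_members(zip_path, dest):
    """Extract .txt members into dest, skipping reserved names; return the skipped names."""
    dest = Path(dest)
    skipped = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.endswith('.txt'):
                continue
            if is_reserved_on_windows(name):
                skipped.append(name)
                continue
            target = dest / name
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(name) as src, target.open('wb') as out:
                shutil.copyfileobj(src, out)
    return skipped
```

For example, `extract_txt_members('stooq/d_us_txt.zip', 'stooq')` leaves the archive untouched and reports which files were left out.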
@JeremyWhittaker Hi there, I wonder what '34.txt' contains. I can't run the code correctly because I don't have '34.txt'. Thank you very much!
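In the number key, `34` corresponds to `('jp', 'tse etfs')`, so `34.txt` would hold the ticker list copied for that group. If a list was never saved, a sketch like the following reports which files are absent so those entries can be skipped instead of the loop raising `FileNotFoundError`. The helper `missing_ticker_files` is hypothetical, and the `./stooq` folder is assumed as in the code above:

```python
from pathlib import Path

# Number key from the thread: (market, asset_class) -> file number
metadata_dict = {
    ('jp', 'tse etfs'): 34,
    ('jp', 'tse stocks'): 32,
    ('us', 'nasdaq etfs'): 69,
    ('us', 'nasdaq stocks'): 27,
    ('us', 'nyse etfs'): 70,
    ('us', 'nyse stocks'): 28,
    ('us', 'nysemkt stocks'): 26,
}


def missing_ticker_files(metadata_dict, folder='stooq'):
    """Return (market, asset_class, file name) for every #.txt file not found in folder."""
    folder = Path(folder)
    return [(market, asset_class, f'{code}.txt')
            for (market, asset_class), code in metadata_dict.items()
            if not (folder / f'{code}.txt').exists()]
```

Entries it returns can be dropped from `metadata_dict` before running the processing loop.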
Describe the bug: stooq data is not supported anymore. zipfile.BadZipFile: File is not a zip file
To reproduce: run create_stooq_data.ipynb.
Environment: ml4t