stefan-jansen / machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.
https://ml4trading.io

stooq data not supported anymore #214

Closed: Eeonum closed this issue 2 years ago

Eeonum commented 2 years ago

Describe the bug
Stooq data is not supported anymore. The notebook fails with zipfile.BadZipFile: File is not a zip file

To Reproduce
Run create_stooq_data.ipynb.

Environment
ml4t

Eeonum commented 2 years ago

The data are now at the following link, but you have to confirm you are not a robot to get past it. How are you going to handle that?

https://stooq.com/db/h/
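One way to see why the notebook raises BadZipFile is to check what the download step actually saved. The sketch below assumes (this is a guess, not confirmed in this thread) that the automatic download receives the robot-check HTML page instead of the archive. describe_download is a hypothetical helper name, not part of the notebook.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return 'zip', 'html', 'missing', or 'unknown' for a downloaded file."""
    path = Path(path)
    if not path.is_file():
        return 'missing'
    # is_zipfile looks for the zip signature instead of trusting the extension.
    if zipfile.is_zipfile(path):
        return 'zip'
    # Peek at the first bytes: an HTML page starts with a tag, not zip data.
    head = path.read_bytes()[:512].lstrip().lower()
    if head.startswith(b'<!doctype html') or head.startswith(b'<html'):
        return 'html'
    return 'unknown'
```

If this returns 'html' for a file named d_us_txt.zip, the file is a web page saved under a zip name, which would explain the error.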

Eeonum commented 2 years ago

Also, the ticker part doesn't work anymore. What is the solution for that?

drsxr commented 2 years ago

I am also getting the "BadZipFile" error for d_us_txt.zip. However, it extracted manually just fine using Ubuntu's Extract tool.

There is a link that takes you to the English website to download the data. Here it is: https://stooq.com/db/h/. You can manually download it and extract it in the manner intended.

However, the code in the "create_stooq_data.ipynb" notebook under the ### Add symbols heading (found in this GitHub repository under data/create_stooq_data.ipynb) tries to download the file and add the symbols (tickers and names), but fails because it cannot pull the data from stooq.

It probably has to be switched to a directory-oriented method. I'm trying to work it out, but it's going slowly. Does anyone else know how to modify the code to traverse the files in the directory so we can store the data in HDF5?
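A directory-oriented approach could look roughly like the following. This is a minimal sketch using only the standard library, assuming the archive has already been extracted by hand and that each price file is a comma-separated .txt file whose header uses names like <TICKER>,<PER>,<DATE> (an assumption about the stooq format, not something stated in this thread). iter_price_files and read_price_file are hypothetical names.

```python
import csv
from pathlib import Path


def iter_price_files(root):
    """Yield every non-empty .txt file below root, in sorted order."""
    for path in sorted(Path(root).rglob('*.txt')):
        if path.is_file() and path.stat().st_size > 0:
            yield path


def read_price_file(path):
    """Read one price file into a list of dicts keyed by cleaned column names."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return []
        # Assumed header style: <TICKER>,<PER>,<DATE>,... -> ticker, per, date, ...
        names = [h.strip().strip('<>').lower() for h in header]
        return [dict(zip(names, row)) for row in reader if row]
```

The rows returned for each file could then be loaded into a pandas DataFrame and written with DataFrame.to_hdf, which requires PyTables to be installed.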

stefan-jansen commented 2 years ago

Please see the update from December 2020 after cell 3: automatic download is no longer supported by stooq, and the fix is to download the data manually instead.

JeremyWhittaker commented 2 years ago

@Eeonum @drsxr

It's not perfect but you can download the zip files manually. I re-wrote some of the code and just tested it tonight.

Copy the data from the links and save it in Notepad as a #.txt file, where # is the number from the key below, and it will work. [image]

Here's the number key:

metadata_dict = {
    ('jp', 'tse etfs'): 34,
    ('jp', 'tse stocks'): 32,
    ('us', 'nasdaq etfs'): 69,
    ('us', 'nasdaq stocks'): 27,
    ('us', 'nyse etfs'): 70,
    ('us', 'nyse stocks'): 28,
    ('us', 'nysemkt stocks'): 26
}

And the code to process it.

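import os
import pandas as pd

# stooq_path is the pathlib.Path defined earlier in the notebook.
for (market, asset_class), code in metadata_dict.items():
    # Each ticker list was saved manually as ./stooq/<code>.txt
    data_file = os.path.join("./stooq", f'{code}.txt')
    # A multi-character separator is parsed as a regex, which requires the Python engine.
    df = pd.read_csv(data_file, sep='        ', engine='python').apply(lambda x: x.str.strip())
    df.columns = ['ticker', 'name']
    df = df.drop_duplicates('ticker').dropna()
    print(market, asset_class, f'# tickers: {df.shape[0]:,.0f}')
    path = stooq_path / 'tickers' / market
    path.mkdir(parents=True, exist_ok=True)
    df.to_csv(path / f'{asset_class}.csv', index=False)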

File structure should look like this: [image]

Here is the code to process the zip files.

# Zip files now need to be downloaded manually from 'https://static.stooq.com/db/d/'
import os
from zipfile import ZipFile

# stooq_path is the pathlib.Path defined earlier in the notebook.
def download_price_data(market='us'):
    """Extract the .txt files from a manually downloaded ./stooq/d_{market}_txt.zip."""
    data_file = os.path.join("./stooq", f'd_{market}_txt.zip')
    print(data_file)
    if not os.path.exists(data_file):
        print('Could not find file: {}'.format(data_file))
        return
    print('extracting {}'.format(data_file))
    with ZipFile(data_file) as zip_file:
        for file in zip_file.namelist():
            if not file.endswith('.txt'):
                continue
            local_file = stooq_path / file
            print('file: {}'.format(local_file))
            local_file.parent.mkdir(parents=True, exist_ok=True)
            # Copy the member's bytes to the matching path under stooq_path.
            with local_file.open('wb') as output:
                output.write(zip_file.read(file))

ZhaiCong commented 2 years ago

The file prn.us.txt in d_us_txt.zip could not be opened. Manually delete it from d_us_txt.zip.
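A possible alternative to editing the archive by hand is to skip such members during extraction. The sketch below assumes the failure happens on Windows, where PRN is one of the reserved device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) and a file called prn.us.txt cannot be created. The thread does not confirm the operating system. is_reserved_name and extractable_members are hypothetical helper names.

```python
import zipfile
from pathlib import PurePosixPath

# Device names that Windows reserves; files with these base names cannot be created there.
RESERVED = {'CON', 'PRN', 'AUX', 'NUL',
            *(f'COM{i}' for i in range(1, 10)),
            *(f'LPT{i}' for i in range(1, 10))}


def is_reserved_name(member):
    """True if the file name's part before the first dot is a reserved device name."""
    name = PurePosixPath(member).name
    return name.split('.')[0].upper() in RESERVED


def extractable_members(zip_file):
    """List the .txt members of an open ZipFile that are safe to extract on Windows."""
    return [m for m in zip_file.namelist()
            if m.endswith('.txt') and not is_reserved_name(m)]
```

Looping over extractable_members(zip_file) instead of zip_file.namelist() would leave the archive untouched, at the cost of not extracting the data for the skipped ticker.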

true-delta commented 3 weeks ago

@JeremyWhittaker Hi there, I wonder what '34.txt' contains. I cannot run the code correctly because I do not have '34.txt'. Thank you very much.