stefan-jansen / machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.
https://ml4trading.io

Cannot create stooq data #283

Closed doruirimescu closed 1 year ago

doruirimescu commented 1 year ago

Describe the bug: In create_stooq_data.ipynb, I cannot add symbols. The line `df = pd.read_csv(f'https://stooq.com/db/l/?g={code}', sep=' ').apply(lambda x: x.str.strip())` raises `EmptyDataError: No columns to parse from file`. Basically, pandas cannot parse the data returned by the URL as CSV.
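For reference, a minimal sketch of how pandas produces this error. It assumes the response body is empty, which is only one possible cause and is not confirmed in this thread; `try_parse` is a hypothetical helper, not part of the notebook.

```python
import io

import pandas as pd


def try_parse(text):
    """Parse space-separated text; return the error message if there is nothing to parse."""
    try:
        return pd.read_csv(io.StringIO(text), sep=' ')
    except pd.errors.EmptyDataError as exc:
        # pandas raises this when the input contains no header or data at all.
        return str(exc)
```

Calling `try_parse('')` returns the error message instead of a DataFrame, while non-empty input such as `'a b\n1 2\n'` parses normally.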


2f2a commented 1 year ago

I have the same problem: bad source (Polish stooq).

tmontana commented 1 year ago

I just spent a few hours trying to manually download and re-create the files needed to run the notebooks (especially the create-data notebook and then the RL one), to no avail. It's sad that no effort is made to solve this issue while the book is still being sold and actively marketed.

stefan-jansen commented 1 year ago

@tmontana @2f2a @doruirimescu If you take a closer look at the create_stooq_data notebook, you should notice my warning added 12/20 that stooq has disabled automatic downloads.

@tmontana There is not much I can do about this. You have to download the files manually. I just checked, and when you download the US data from here, the data has the same directory layout and content (except for new/removed assets) after unzipping and moving into d_us_txt/data as the previous automatic download.

My assumption here is that by pointing to the manual download that retrieves the same data, readers would be able to use the files. Could you please help me understand what effort you expect on my part to support you in the process?

tmontana commented 1 year ago

Thanks for the response. I did follow your instructions of 12/20 and downloaded the data manually but was never able to put it in the right shape. I think the instructions could be a lot clearer. I'll try and redownload and describe precise errors I'm getting.

tmontana commented 1 year ago

All I wanted is to run the DRL notebook. That's when I realized I needed to first run the build data one and then run the build stooq one which doesn't work. I don't even care about the data - which is massive. I just want to understand the code by running it. Problem below...

In the stooq notebook it says to download something, but it doesn't say what, and it doesn't say where to put it. Your comment above is clearer than what's in the notebook (see screenshot), but the link you have above fails with an "unauthorized" error message. You need to spell out what we need to download rather than just giving the link.

image

In the notebook is the section under "Download Price Data" still part of the 2020 update? The instructions there are not the same as putting the file into d_us_txt/data (as per automatic download?). It's as if you're assuming we have the automatic download and thus know where the file goes (and which file it is).

Assuming you manage to figure out which file to download and put it in the right place, it still doesn't work and is missing a bunch of other files, namely: 'stooq/tickers/us/nasdaq etfs.csv'. Where is that supposed to come from?

I created that one manually from the other files. Then it started asking for another, and so on, until I gave up.

stefan-jansen commented 1 year ago

The Download Price Data section is not part of the 2020 update because, since 2020, stooq no longer permits automatic updates.

If you take a look at the following path definition and function, you may note that it contains a `data_url` that is rather similar to the name of the file we would manually download: `f'd_{market}_txt.zip'`, where `market` is, by default, `us`. It then writes the entire content to the previously defined `stooq_path`. This is, of course, also described under:

2. Store the result under stooq using the preferred folder structure outlined on the website. It has the structure: /data/freq/market/asset_class, such as /data/daily/us/nasdaq etfs.

```python
from io import BytesIO
from pathlib import Path
from zipfile import ZipFile

import requests

# STOOQ_URL (the download base URL) is assumed to be defined earlier in the notebook.
stooq_path = Path('stooq')
if not stooq_path.exists():
    stooq_path.mkdir()


def download_price_data(market='us'):
    data_url = f'd_{market}_txt.zip'
    response = requests.get(STOOQ_URL + data_url).content
    with ZipFile(BytesIO(response)) as zip_file:
        for file in zip_file.namelist():
            # Only the .txt price files are extracted.
            if not file.endswith('.txt'):
                continue
            local_file = stooq_path / file
            local_file.parent.mkdir(parents=True, exist_ok=True)
            with local_file.open('wb') as output:
                output.write(zip_file.read(file))
```
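The same extraction can be applied to a manually downloaded archive. This is a minimal sketch, assuming the zip file was saved locally (for example as `d_us_txt.zip`); `extract_price_data` and its parameters are hypothetical names, not part of the notebook.

```python
from pathlib import Path
from zipfile import ZipFile


def extract_price_data(zip_path, target=Path('stooq')):
    """Unpack the .txt members of a locally saved archive below `target`."""
    target = Path(target)
    extracted = []
    with ZipFile(zip_path) as zip_file:
        for name in zip_file.namelist():
            # Skip directories and anything that is not a .txt price file.
            if not name.endswith('.txt'):
                continue
            local_file = target / name
            local_file.parent.mkdir(parents=True, exist_ok=True)
            local_file.write_bytes(zip_file.read(name))
            extracted.append(local_file)
    return extracted
```

Calling `extract_price_data('d_us_txt.zip')` would write the `.txt` files under `stooq/`, keeping whatever directory layout is stored inside the archive.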

Next, the notebook suggests downloading the symbols. I don't think much has changed here, so let's walk through this:

Add the corresponding symbols, i.e., tickers and names by following the directory tree on the same site. You can also adapt the following code snippet using the appropriate asset code that you find by inspecting the url; this example works for NASDAQ ETFs that have code g=69:

If you return to the [historical data site](https://stooq.com/db/h/), you should see the directory tree and can confirm that the URL for NASDAQ ETFs matches the one in the following download instructions. Now, the automatic downloads no longer work, but the link does. So we go to the page, copy the data, paste it into a local file, say `symbolx.txt`, and load that instead of the URL.

```python
df = pd.read_csv('https://stooq.com/db/l/?g=69', sep=' ').apply(lambda x: x.str.strip())
df.columns = ['ticker', 'name']
df.drop_duplicates('ticker').to_csv('stooq/data/tickers/us/nasdaq etfs.csv', index=False)
```
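Loading the pasted file instead of the URL could look roughly like the sketch below. It assumes the saved list has a header row and that ticker and name are separated by a run of two or more whitespace characters; `load_symbols` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def load_symbols(path):
    """Parse a manually saved symbol list into 'ticker' and 'name' columns."""
    # Assumption: two columns, split on runs of 2+ whitespace characters,
    # so that single spaces inside a name stay intact.
    df = pd.read_csv(path, sep=r'\s{2,}', engine='python', dtype=str)
    df = df.apply(lambda x: x.str.strip())
    df.columns = ['ticker', 'name']
    return df.drop_duplicates('ticker')
```

With the file from the example above, `load_symbols('symbolx.txt')` would return the frame that the snippet then writes out with `to_csv(..., index=False)`. If the separator in your saved copy differs, the `sep` argument needs adjusting.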



Hope this helps.

2f2a commented 1 year ago

Stooq.com has a captcha lock, so the CSV files can't be downloaded from a script. But once one captcha has been solved (pick any file at https://stooq.com/db/), URLs such as https://stooq.com/db/l/?g=69 become valid when opened by hand in the same browser.

I saved them as "ptemp/34.csv" and so on, and then replaced

df = pd.read_csv(f'https://stooq.com/db/l/?g={code}', sep=' ').apply(lambda x: x.str.strip())

with

df = pd.read_csv(f'ptemp/{code}.csv', sep='        ').apply(lambda x: x.str.strip())

As an alternative, the Python module "yfinance" seems more reliable and has Swedish tickers.