ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

yFinance download API works differently on Linux vs Windows system #2014

Open mihir-sampat-adaptive opened 3 months ago

mihir-sampat-adaptive commented 3 months ago

Describe bug

Issue Description

I encountered an issue where yf.download behaves differently on Linux and Windows. With the period set to max, the call returns the expected DataFrame on Windows without any errors, as shown below:

[*********************100%**********************]  1 of 1 completed
                    Open          High           Low         Close     Adj Close    Volume
Date
2024-08-05    174.445999    181.694901    174.353607    178.951599    178.951599    0

The result is similar when an arbitrary start date is passed instead of a period.

However, when I run the exact same code on an Amazon EC2 Linux instance, the download API throws the following error:

YFInvalidPeriodError("%ticker%: Period 'max' is invalid, must be one of ['1d', '5d']")

This discrepancy suggests that the yf.download API behaves differently on Linux compared to Windows.

Steps to Reproduce

Example ticker: ^XND

Expected Behavior

The yf.download API should return a DataFrame with the expected data without throwing errors, regardless of the operating system.

Actual Behavior

Environment Details

Additional Information

The issue persists even when a random start date is provided.

This behavior suggests a potential discrepancy in the yFinance API implementation or configuration for different operating systems.

Request for Insight

Any insight into why this discrepancy occurs and how to resolve it would be very helpful. Is there a known issue with yFinance on Linux systems, or is there a workaround to make the behavior consistent across different operating systems?

Thank you for your assistance.

Simple code that reproduces your problem

Code

from yfinance import download

download(tickers=['^XND'], period='max')

Result on Windows

[*********************100%**********************]  1 of 1 completed
                    Open          High           Low         Close     Adj Close    Volume
Date
2024-08-05    174.445999    181.694901    174.353607    178.951599    178.951599    0

Result on Linux

[*********************100%%**********************]  1 of 1 completed

1 Failed download:
['^XND']: YFInvalidPeriodError("%ticker%: Period 'max' is invalid, must be one of ['1d', '5d']")
Empty DataFrame
Columns: [Open, High, Low, Close, Adj Close, Volume]
Index: []

However, when the same query was run on Linux with a period of 1d or 5d, it worked as expected.

Debug log

[*********************100%%**********************]  1 of 1 completed

1 Failed download:
['^XND']: YFInvalidPeriodError("%ticker%: Period 'max' is invalid, must be one of ['1d', '5d']")
Empty DataFrame
Columns: [Open, High, Low, Close, Adj Close, Volume]
Index: []

Bad data proof

No response

yfinance version

0.2.41

Python version

3.11

Operating system

Windows 11, Amazon Linux 2023
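Since the report compares two machines, it may help to record the installed package versions on each one alongside the operating system. The following is a minimal sketch using only the standard library; the function name environment_report and the default package list are hypothetical choices for illustration, not part of yfinance.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("yfinance", "pandas", "pytz", "requests")):
    """Collect interpreter, platform, and package versions for a bug report.

    Hypothetical helper: the package list is an assumption about which
    dependencies are worth comparing between the two machines.
    """
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence instead of failing, so both machines
            # produce a report with the same keys.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this on both the Windows and the Linux machine and comparing the two outputs line by line would show whether the environments differ in anything other than the operating system.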

ValueRaider commented 3 months ago

That's not the debug log.
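For context, a debug log here means the library's own diagnostic output, not the printed result. A minimal sketch of one way to capture it follows, assuming yfinance emits its diagnostics through a standard-library logger named "yfinance" (an assumption about the library's internals; recent releases are also understood to offer a yf.enable_debug_mode() helper for the same purpose). The function names below are hypothetical.

```python
import logging


def enable_yfinance_debug_logging():
    """Route DEBUG-level records from the 'yfinance' logger to stderr.

    Hypothetical helper; the logger name 'yfinance' is an assumption.
    """
    logger = logging.getLogger("yfinance")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)-8s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def reproduce():
    """Re-run the call from the report with debug logging switched on."""
    import yfinance as yf  # imported lazily so the helper above works without it

    enable_yfinance_debug_logging()
    return yf.download(tickers=["^XND"], period="max")
```

Calling reproduce() on the Linux machine and posting everything written to stderr would give the kind of output requested here.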

Inder782 commented 3 months ago

One thing you can do is run the code in WSL (Windows Subsystem for Linux) and see whether the error still occurs.

cgmike commented 3 months ago

I can confirm that the error also occurs on Windows Subsystem for Linux (Windows 10).

WoxxyG commented 3 months ago

This has to do with pytz not being able to handle dates past the year 2038. When you use max, it adds 99 years to the current date, which goes past 2038. See https://github.com/stub42/pytz/issues/31

if start or period is None or period.lower() == "max":
    # Check can get TZ. Fail => probably delisted
    tz = self.tz
    if tz is None:
        # Every valid ticker has a timezone. A missing timezone is a problem.
        _exception = YFTzMissingError(self.ticker)
        err_msg = str(_exception)
        shared._DFS[self.ticker] = utils.empty_df()
        shared._ERRORS[self.ticker] = err_msg.split(': ', 1)[1]
        if raise_errors:
            raise _exception
        else:
            logger.error(err_msg)
        return utils.empty_df()
    if end is None:
        end = int(_time.time())
    else:
        end = utils._parse_user_dt(end, tz)
    if start is None:
        if interval == "1m":
            start = end - 604800   # 7 days
        elif interval in ("5m", "15m", "30m", "90m"):
            start = end - 5184000  # 60 days
        elif interval in ("1h", '60m'):
            start = end - 63072000  # 730 days
        else:
            start = end - 3122064000  # 99 years
    else:
        start = utils._parse_user_dt(start, tz)
    params = {"period1": start, "period2": end}
else:
    period = period.lower()
    params = {"range": period}
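
The arithmetic behind the 99-year window in the snippet above can be checked with a short sketch. It uses only the standard library; the constant is taken from the quoted code, while the helper names and the comparison against the signed 32-bit range are assumptions added for illustration, not a confirmed diagnosis of the error.

```python
import time
from typing import Optional

NINETY_NINE_YEARS = 3122064000  # seconds; constant from the quoted snippet
INT32_MAX = 2**31 - 1           # last second representable in a signed 32-bit time_t (January 2038)
INT32_MIN = -(2**31)


def fits_in_int32(ts: int) -> bool:
    """Return True if the Unix timestamp fits in a signed 32-bit integer."""
    return INT32_MIN <= ts <= INT32_MAX


def max_window(now: Optional[int] = None) -> dict:
    """Compute the timestamps involved in a 99-year window (hypothetical helper).

    'start' mirrors the quoted code (end minus 99 years); 'future' is the
    value obtained if 99 years were added to the current time instead.
    """
    end = int(time.time()) if now is None else now
    return {
        "start": end - NINETY_NINE_YEARS,
        "end": end,
        "future": end + NINETY_NINE_YEARS,
    }


if __name__ == "__main__":
    window = max_window(1722816000)  # 2024-08-05 00:00:00 UTC
    for name, ts in window.items():
        print(f"{name}: {ts} fits in int32: {fits_in_int32(ts)}")
```

With that fixed reference time, subtracting 99 years yields a negative timestamp (a date before 1970) that still fits in 32 bits, whereas adding 99 years exceeds the 2038 limit. Which of the two values a given platform actually rejects is not established by this thread.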

ww-hub-user commented 2 months ago

I found that when running the following code on EC2:

import yfinance as yf

stockNames = ['A', 'AAA', 'AAPL', 'NVDA', 'CNQ', 'SNA', 'META']
for stockName in stockNames:
    Ticker = yf.Ticker(stockName)
    # Get dividend and split information
    actions_data = Ticker.actions

I only get data from 2022 and earlier, and cannot obtain the latest data. However, the code works fine and retrieves the latest data when run on a local Windows machine.