ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

How to remove time from date column in Dividend data? #2074

Closed: apparition-stack closed this issue 1 month ago

apparition-stack commented 1 month ago

Hey there, hoping I'm doing something obviously wrong that someone can help me identify.

I download stock and dividend data for multiple tickers in a for loop, but I only want the date column to contain a "YYYY-MM-DD" value, not the full "YYYY-MM-DD 00:00:00-05:00" that includes the time and timezone offset.

I use this code to download each ticker's dividend data and save it to CSV:

```python
import yfinance as yf

tkDiv = yf.Ticker(tk).dividends  # tk is the ticker symbol set by the for loop
tkDiv.to_csv(pathDividends + tk + '_DIV.csv', encoding='utf-8')
```

But the output is something like:

| Date | Dividends |
| -- | -- |
| 2017-12-26 00:00:00-05:00 | 0.189 |
| 2018-12-26 00:00:00-05:00 | 1.123 |
| 2019-12-27 00:00:00-05:00 | 0.457 |
| 2020-12-17 00:00:00-05:00 | 1.512 |
| 2021-12-16 00:00:00-05:00 | 3.483 |
| 2022-12-15 00:00:00-05:00 | 0.196 |
| 2023-12-14 00:00:00-05:00 | 0.255 |
| 2023-12-28 00:00:00-05:00 | 0.042 |
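The extra time and offset appear to come from the index being timezone-aware: `yf.Ticker(...).dividends` seems to return a Series indexed by a tz-aware `DatetimeIndex` (US Eastern, judging by the `-05:00` offset), and pandas writes tz-aware timestamps in full. A minimal offline reproduction, assuming a synthetic Series built from two of the values in the output above and an `America/New_York` timezone:

```python
import pandas as pd

# Synthetic stand-in for yf.Ticker(tk).dividends (assumption: tz-aware index named "Date")
dates = pd.DatetimeIndex(["2017-12-26", "2018-12-26"], name="Date").tz_localize("America/New_York")
tkDiv = pd.Series([0.189, 1.123], index=dates, name="Dividends")

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = tkDiv.to_csv()
print(csv_text)
# Date,Dividends
# 2017-12-26 00:00:00-05:00,0.189
# 2018-12-26 00:00:00-05:00,1.123
```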

I managed to solve the problem for the ticker close data with this code:

```python
import yfinance as yf

tkData = yf.download(tickers=tk, period="max", interval="1d")
tkData.reset_index(inplace=True)  # move the Date index into a regular column
tkData['Date'] = tkData['Date'].dt.normalize()  # set the time part to midnight
tkData['Date'] = tkData['Date'].dt.tz_localize(None)  # drop the timezone offset
tkData.set_index('Date', inplace=True)
tkData.to_csv(pathTickers + tk + '.csv', encoding='utf-8')
```

However, the same didn't work for dividend data.
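Because `dividends` is a Series whose dates live in the index rather than in a `Date` column, the same two steps can be applied to the index directly, with no `reset_index`/`set_index` round trip. (Calling `reset_index(inplace=True)` on a Series raises a `TypeError`, which may be why the copied approach failed; the failing code isn't shown, so that is only a guess.) A sketch on synthetic data standing in for `yf.Ticker(tk).dividends`, assuming a tz-aware index:

```python
import pandas as pd

# Synthetic stand-in for yf.Ticker(tk).dividends (assumption: tz-aware DatetimeIndex)
dates = pd.DatetimeIndex(["2017-12-26", "2018-12-26"], name="Date").tz_localize("America/New_York")
tkDiv = pd.Series([0.189, 1.123], index=dates, name="Dividends")

# Same steps as for the close data, applied to the index:
# normalize() sets each time to local midnight, tz_localize(None) drops the offset
tkDiv.index = tkDiv.index.normalize().tz_localize(None)

# tz-naive timestamps that are all at midnight are written as plain dates
csv_text = tkDiv.to_csv()
print(csv_text)
# Date,Dividends
# 2017-12-26,0.189
# 2018-12-26,1.123
```

In the loop, the last step would be the original `tkDiv.to_csv(pathDividends + tk + '_DIV.csv', encoding='utf-8')`.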

Any ideas?

ValueRaider commented 1 month ago

This is really a Pandas question.

apparition-stack commented 1 month ago

Is it a Pandas question? I should have mentioned that, without changing anything in my Python code, sometimes the dividend data comes out as desired and other times it includes the time and timezone. I can't tell why it differs from one run to the next. My understanding is that pandas just takes in whatever data it is given, so I want better control over the data going into pandas in the first place, which comes from yfinance…
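Since the index sometimes arrives tz-aware and sometimes not, one way to get consistent files is to clean it the same way every time before saving. A minimal sketch; `strip_tz_dates` is a hypothetical helper name, and it assumes the object has a `DatetimeIndex` (as the `dividends` Series appears to):

```python
import pandas as pd


def strip_tz_dates(obj):
    """Hypothetical helper: return a copy of obj whose DatetimeIndex is
    tz-naive and set to midnight, whether or not it was tz-aware."""
    out = obj.copy()
    idx = out.index.normalize()  # midnight in the index's own timezone
    if idx.tz is not None:  # only tz-aware indexes need the offset removed
        idx = idx.tz_localize(None)
    out.index = idx
    return out


# Synthetic data: the same dividends with a tz-aware and a tz-naive index
aware = pd.Series(
    [0.255, 0.042],
    index=pd.DatetimeIndex(["2023-12-14", "2023-12-28"], name="Date").tz_localize("America/New_York"),
    name="Dividends",
)
naive = aware.copy()
naive.index = naive.index.tz_localize(None)

# Both variants produce identical CSV text after cleaning
print(strip_tz_dates(aware).to_csv() == strip_tz_dates(naive).to_csv())  # True
```

Checking `idx.tz` keeps the helper safe to call on either variant, so the loop does not need to know which one yfinance returned on a given run.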

ValueRaider commented 1 month ago

https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.html
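That `DatetimeIndex` page covers methods such as `normalize()`, `tz_localize()` and `strftime()`. If only the CSV text matters, another option is to leave the data untouched and pass `date_format` to `to_csv`, which pandas applies to a `DatetimeIndex` when writing, tz-aware or not. A sketch on synthetic data standing in for `yf.Ticker(tk).dividends`:

```python
import pandas as pd

# Synthetic stand-in for yf.Ticker(tk).dividends (assumption: tz-aware DatetimeIndex)
dates = pd.DatetimeIndex(["2022-12-15", "2023-12-14"], name="Date").tz_localize("America/New_York")
tkDiv = pd.Series([0.196, 0.255], index=dates, name="Dividends")

# date_format only changes how dates are written; tkDiv itself keeps its timezone
csv_text = tkDiv.to_csv(date_format="%Y-%m-%d")
print(csv_text)
# Date,Dividends
# 2022-12-15,0.196
# 2023-12-14,0.255
```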