Hello.
I have discovered a performance degradation in pd.to_datetime in pandas 2.0.3: to_datetime does not recognize Arrow timestamp dtypes and converts such values again, even though they already hold datetimes. Some parts of this repository depend on pandas 2.0.3, and several files, such as stock_analysis/sp500_cot_sentiment_analysis.py and technical_indicators/candle_abs_returns.py, call the affected API. Other files may use it as well. I am not sure whether this pandas performance problem affects this repository in practice. Related discussions on the pandas GitHub include #52545 and #53301.
Reproducible Example in pandas
In [1]: import pandas as pd
In [2]: import pyarrow as pa
In [3]: dr = pd.date_range("2019-12-31", periods=1_000_000, freq="s").astype(pd.ArrowDtype(pa.timestamp(unit="ns")))
In [4]: %timeit pd.to_datetime(dr)
1.84 s ± 8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Suggestion
I would recommend considering an upgrade to pandas >= 2.1, or otherwise avoiding the redundant conversion in the affected code paths.
Any other workarounds or solutions would be greatly appreciated.
Thank you!