rstudio / reticulate

R Interface to Python
https://rstudio.github.io/reticulate
Apache License 2.0

Problem using Python's datetime with reticulate #1005

Open joel314 opened 3 years ago

joel314 commented 3 years ago

Hi,

I am using rmarkdown with reticulate. In a Python block, I try to parse a date using Python's datetime library like this:

    ---
    title: "Example"
    author: "Test"
    date: "5/15/2021"
    output: html_document
    ---

    ```{R include=FALSE}
    library(reticulate)
    use_condaenv("myenv")
    ```

    ```{python}
    from datetime import datetime
    dt = datetime.strptime('Jun 15 2021 10:30 AM', '%b %d %Y %I:%M %p')
    dt
    ```

When I run the Python code in a Python console it works fine - but it fails in a `reticulate::repl_python()` session. The code works if I change it to `dt = datetime.strptime('Jun 15 2021 10:30', '%b %d %Y %I:%M')`.

I checked that the Python version and the `datetime` module used by reticulate are the same as in the Python console. I have many other Python blocks that are working fine.

Am I doing something wrong here? I noticed in my `sessioninfo` below that there is a message about `timedatectl`. Could it be related to my issue? 

Thanks for your help!
Regards.
Joël.

_____________

sessioninfo::session_info()
timedatectl: /home/joel/anaconda3/envs/myenv/lib/libuuid.so.1: no version information available (required by /lib/x86_64-linux-gnu/libcryptsetup.so.12)
─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.0.2 (2020-06-22)
 os       Ubuntu 20.04.2 LTS
 system   x86_64, linux-gnu
 ui       RStudio
 language en_US
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Copenhagen
 date     2021-05-15

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source
 cli           2.4.0   2021-04-05 [1] CRAN (R 4.0.2)
 glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
 jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.0.2)
 lattice       0.20-41 2020-04-02 [1] CRAN (R 4.0.2)
 Matrix        1.3-2   2021-01-06 [1] CRAN (R 4.0.2)
 Rcpp          1.0.6   2021-01-15 [1] CRAN (R 4.0.2)
 reticulate    1.18    2020-10-25 [1] CRAN (R 4.0.2)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
 withr         2.4.1   2021-01-26 [1] CRAN (R 4.0.2)

 [1] /home/joel/R/x86_64-pc-linux-gnu-library/4.0
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

kevinushey commented 3 years ago

> When I run the Python code in a Python console it works fine - but it fails in a `reticulate::repl_python()` session.

Can you elaborate? Are you seeing an error, or something else? If you're seeing an error, what is the error message?

joel314 commented 3 years ago

Hi Kevin,

Thanks for your reply. Sure, the error message is `ValueError: unconverted data remains: AM`. Here is an example:

> reticulate::repl_python()
Python 3.8.8 (/home/joel/anaconda3/envs/myenv/bin/python)
Reticulate 1.18 REPL -- A Python interpreter in R.
>>> from datetime import datetime
>>> dt = datetime.strptime('Jun 15 2021 10:30 AM', '%b %d %Y %I:%M %p')
ValueError: unconverted data remains: AM

The same code on a Python console gives:

from datetime import datetime
dt = datetime.strptime('Jun 15 2021 10:30 AM', '%b %d %Y %I:%M %p')
dt
Out[2]: datetime.datetime(2021, 6, 15, 10, 30)

Best Regards, Joël.

joel314 commented 3 years ago

I tried to run the opposite operation and I noticed the following:

> reticulate::repl_python()
>>> from datetime import datetime
>>> dt = datetime.strptime('Jun 15 2021 22:30', '%b %d %Y %H:%M')
>>> datetime.strftime(dt, '%Y-%m-%d %I:%M %p')
'2021-06-15 10:30 '
>>> 

The "PM" is missing from the output with Reticulate. In a Python console, it gives:

from datetime import datetime
dt = datetime.strptime('Jun 15 2021 22:30', '%b %d %Y %H:%M')
datetime.strftime(dt, '%Y-%m-%d %I:%M %p')
Out[4]: '2021-06-15 10:30 PM'

Regards, Joël.

kevinushey commented 3 years ago

Sorry, I don't have a good idea :-/ The only similar issue I can find is https://stackoverflow.com/questions/37173883/importing-matplotlib-pyplot-makes-datetime-fail-valueerror-unconverted-data-re.

eddelbuettel commented 3 years ago

It's a weird topic. timedatectl is quite new-ish, and I had issues in some R contexts (on Docker, say) where it may have been missing. The other aspect is the parsing of time strings. The session info shows en_US as the locale even with Copenhagen as the TZ base. One wild guess (which could be off) is that the locale or TZ setting may impede parsing of crazy North American 12-hour time with AM/PM suffixes? If you can, just stick to sane ISO 8601 dates and times and a 24-hour clock (which around here is sometimes called military time).
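
A minimal sketch of the ISO 8601 suggestion above, assuming the timestamps can be supplied in that form (the sample value is illustrative, not taken from the reporter's data). It uses only the standard-library `datetime` module and numeric format directives, so no locale-dependent names such as month abbreviations or AM/PM markers are involved:

```python
from datetime import datetime

# ISO 8601 timestamps consist of digits and fixed separators only, so
# parsing them does not depend on locale-specific month or AM/PM names.
dt = datetime.fromisoformat("2021-06-15T22:30:00")

# Formatting with numeric-only directives (24-hour %H instead of %I + %p)
# likewise avoids the locale-dependent pieces.
iso_text = dt.isoformat()
clock_24h = dt.strftime("%Y-%m-%d %H:%M")

print(iso_text, clock_24h)
```

This only sidesteps the problem when the input format is under your control; it does not help with existing 12-hour strings.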

krihabu commented 1 year ago

Hi, I have the same problem using reticulate 1.30 in RMarkdown (with Python 3.11.3 and Pandas 2.0.2).

This works in a Python console but not with reticulate: `pd.to_datetime("11/23/2010 6:00 PM", format="%m/%d/%Y %I:%M %p")`

The error message is `ValueError: unconverted data remains when parsing with format "%m/%d/%Y %I:%M %p": "PM", at position 0.`

Any new thoughts on this?

Edit: Added my session info

─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  German_Germany.utf8
 ctype    German_Germany.utf8
 tz       Europe/Berlin
 date     2023-08-30
 rstudio  2023.06.0+421 Mountain Hydrangea (desktop)
 pandoc   NA

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package       * version    date (UTC) lib source
 cli             3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
 colorspace      2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 dplyr         * 1.1.2      2023-04-20 [1] CRAN (R 4.3.0)
 fansi           1.0.4      2023-01-22 [1] CRAN (R 4.3.0)
 generics        0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2       * 3.4.2      2023-04-03 [1] CRAN (R 4.3.0)
 ggridges      * 0.5.4      2022-09-26 [1] CRAN (R 4.3.1)
 glue          * 1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
 gtable          0.3.3      2023-03-21 [1] CRAN (R 4.3.0)
 here            1.0.1      2020-12-13 [1] CRAN (R 4.3.0)
 jsonlite        1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
 knitr           1.43       2023-05-25 [1] CRAN (R 4.3.0)
 lattice         0.21-8     2023-04-05 [1] CRAN (R 4.3.1)
 lifecycle       1.0.3      2022-10-07 [1] CRAN (R 4.3.0)
 lubridate     * 1.9.2      2023-02-10 [1] CRAN (R 4.3.0)
 magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 Matrix          1.5-4.1    2023-05-18 [1] CRAN (R 4.3.1)
 munsell         0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 pillar          1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 png             0.1-8      2022-11-29 [1] CRAN (R 4.3.0)
 R6              2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp            1.0.10     2023-01-22 [1] CRAN (R 4.3.0)
 reticulate    * 1.30       2023-06-09 [1] CRAN (R 4.3.1)
 rjson         * 0.2.21     2022-01-09 [1] CRAN (R 4.3.0)
 rlang           1.1.1      2023-04-28 [1] CRAN (R 4.3.0)
 rprojroot       2.0.3      2022-04-02 [1] CRAN (R 4.3.0)
 rstudioapi      0.14       2022-08-22 [1] CRAN (R 4.3.0)
 scales          1.2.1      2022-08-20 [1] CRAN (R 4.3.0)
 sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.3.1)
 tibble          3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyselect      1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 timechange      0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
 utf8            1.2.3      2023-01-31 [1] CRAN (R 4.3.0)
 vctrs           0.6.3      2023-06-14 [1] CRAN (R 4.3.1)
 withr           2.5.0      2022-03-03 [1] CRAN (R 4.3.0)
 xfun            0.39       2023-04-20 [1] CRAN (R 4.3.0)

 [1] C:/Users/krihabu/AppData/Local/Programs/R/R-4.3.1/library

─ Python configuration ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 python:         C:/Users/krihabu/anaconda3/envs/trwork/python.exe
 libpython:      C:/Users/krihabu/anaconda3/envs/trwork/python311.dll
 pythonhome:     C:/Users/krihabu/anaconda3/envs/trwork
 version:        3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:50:54) [MSC v.1934 64 bit (AMD64)]
 Architecture:   64bit
 numpy:          C:/Users/krihabu/anaconda3/envs/trwork/Lib/site-packages/numpy
 numpy_version:  1.24.3

 NOTE: Python version was forced by use_python function

t-kalinowski commented 1 year ago

Hi @krihabu, I can't reproduce this locally.

Reading over the thread, this strikes me as something that's likely specific to conda. If you switch over to using a virtual environment, do you still see the error?

library(reticulate)
install_python("3.11:latest")
virtualenv_create(envname = "r-reticulate", version = "3.11",
                  packages = c("pandas", "numpy"), force = TRUE)
# restart R session
library(reticulate)
use_virtualenv("r-reticulate")
repl_python()
# the remaining lines are Python, entered at the reticulate REPL
import pandas as pd
pd.to_datetime("11/23/2010 6:00 PM", format="%m/%d/%Y %I:%M %p")

krihabu commented 1 year ago

Thanks @t-kalinowski,

I just tried your suggestion, but I unfortunately still get the same error:

> repl_python()
Python 3.11.5 (C:/Users/krihabu/Documents/.virtualenvs/r-reticulate/Scripts/python.exe)
Reticulate 1.30 REPL -- A Python interpreter in R.
Enter 'exit' or 'quit' to exit the REPL and return to R.
>>> import pandas as pd
>>> pd.to_datetime("11/23/2010 6:00 PM", format="%m/%d/%Y %I:%M %p")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\KRIHAB~1\DOCUME~1\VIRTUA~1\R-RETI~1\Lib\site-packages\pandas\core\tools\datetimes.py", line 1146, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\KRIHAB~1\DOCUME~1\VIRTUA~1\R-RETI~1\Lib\site-packages\pandas\core\tools\datetimes.py", line 488, in _convert_listlike_datetimes
    return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\KRIHAB~1\DOCUME~1\VIRTUA~1\R-RETI~1\Lib\site-packages\pandas\core\tools\datetimes.py", line 519, in _array_strptime_with_fallback
    result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "strptime.pyx", line 534, in pandas._libs.tslibs.strptime.array_strptime
  File "strptime.pyx", line 359, in pandas._libs.tslibs.strptime.array_strptime
ValueError: unconverted data remains when parsing with format "%m/%d/%Y %I:%M %p": "PM", at position 0. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

t-kalinowski commented 1 year ago

Hmm, I thought it might be a Windows issue, but was unable to reproduce on Windows either.

(Looking over the error message, I see that you're running an outdated version of reticulate. I don't think updating to the latest release will fix the issue, but just so we're working off the same version, can you please run `remotes::install_github("rstudio/reticulate"); reticulate:::rm_all_reticulate_state()`, and then rerun the venv setup snippet from my previous comment?)

krihabu commented 1 year ago

Done, I'm now also on Reticulate 1.31.0.9000 and the error persists...

Is there any other information I can provide that might be useful?

t-kalinowski commented 1 year ago

What is the new output of sessioninfo::session_info() after all the updates + venv creation?

krihabu commented 1 year ago

Current session info is

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  German_Germany.utf8
 ctype    German_Germany.utf8
 tz       Europe/Berlin
 date     2023-09-01
 rstudio  2023.06.0+421 Mountain Hydrangea (desktop)
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version     date (UTC) lib source
 cli           3.6.1       2023-03-23 [1] CRAN (R 4.3.0)
 dbscan        1.1-11      2022-10-27 [1] CRAN (R 4.3.1)
 digest        0.6.32      2023-06-26 [1] CRAN (R 4.3.1)
 evaluate      0.21        2023-05-05 [1] CRAN (R 4.3.0)
 fansi         1.0.4       2023-01-22 [1] CRAN (R 4.3.0)
 fastmap       1.1.1       2023-02-24 [1] CRAN (R 4.3.0)
 glue          1.6.2       2022-02-24 [1] CRAN (R 4.3.0)
 htmltools     0.5.5       2023-03-23 [1] CRAN (R 4.3.0)
 jsonlite      1.8.7       2023-06-29 [1] CRAN (R 4.3.1)
 knitr         1.43        2023-05-25 [1] CRAN (R 4.3.0)
 lattice       0.21-8      2023-04-05 [1] CRAN (R 4.3.1)
 lifecycle     1.0.3       2022-10-07 [1] CRAN (R 4.3.0)
 Matrix        1.5-4.1     2023-05-18 [1] CRAN (R 4.3.1)
 pillar        1.9.0       2023-03-22 [1] CRAN (R 4.3.0)
 png           0.1-8       2022-11-29 [1] CRAN (R 4.3.0)
 Rcpp          1.0.10      2023-01-22 [1] CRAN (R 4.3.0)
 reticulate  * 1.31.0.9000 2023-08-31 [1] Github (rstudio/reticulate@28b3b9a)
 rlang         1.1.1       2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown     2.22        2023-06-01 [1] CRAN (R 4.3.0)
 rstudioapi    0.14        2022-08-22 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2       2021-12-06 [1] CRAN (R 4.3.1)
 utf8          1.2.3       2023-01-31 [1] CRAN (R 4.3.0)
 vctrs         0.6.3       2023-06-14 [1] CRAN (R 4.3.1)
 withr         2.5.0       2022-03-03 [1] CRAN (R 4.3.0)
 xfun          0.39        2023-04-20 [1] CRAN (R 4.3.0)
 yaml          2.3.7       2023-01-23 [1] CRAN (R 4.3.0)

 [1] C:/Users/krihabu/AppData/Local/Programs/R/R-4.3.1/library

─ Python configuration ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 python:         C:/Users/krihabu/Documents/.virtualenvs/r-reticulate/Scripts/python.exe
 libpython:      C:/Users/krihabu/AppData/Local/r-reticulate/r-reticulate/pyenv/pyenv-win/versions/3.11.5/python311.dll
 pythonhome:     C:/Users/krihabu/Documents/.virtualenvs/r-reticulate
 version:        3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)]
 Architecture:   64bit
 numpy:          C:/Users/krihabu/Documents/.virtualenvs/r-reticulate/Lib/site-packages/numpy
 numpy_version:  1.25.2

 NOTE: Python version was forced by use_python() function

And these are the packages in r-reticulate

> py_list_packages()
          package version            requirement
1           numpy  1.25.2          numpy==1.25.2
2          pandas   2.1.0          pandas==2.1.0
3 python-dateutil   2.8.2 python-dateutil==2.8.2
4            pytz  2023.3           pytz==2023.3
5             six  1.16.0            six==1.16.0
6          tzdata  2023.3         tzdata==2023.3

tovogt commented 1 year ago

I would guess that this is an issue with locales since the %p directive depends on the locale. For example, when using de_DE, it is expected to be an empty string, but for en_US or en_GB it is expected to be one of AM or PM.

This is not specific to pandas: the issue also reproduces with the built-in `datetime` module. The relevant locale category is `LC_TIME`. Use Python's `locale` module to control the currently active locale settings:

>>> import datetime as dt
>>> import locale
>>> locale.getlocale(locale.LC_TIME)
('en_US', 'UTF-8')
>>> dt.datetime.strptime("11/23/2010 6:00 PM", "%m/%d/%Y %I:%M %p")
datetime.datetime(2010, 11, 23, 18, 0)
>>> locale.setlocale(locale.LC_TIME, ('de_DE', 'UTF-8'))
>>> dt.datetime.strptime("11/23/2010 6:00 PM", "%m/%d/%Y %I:%M %p")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tovogt/.local/share/miniconda3/envs/r_test/lib/python3.11/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tovogt/.local/share/miniconda3/envs/r_test/lib/python3.11/_strptime.py", line 352, in _strptime
    raise ValueError("unconverted data remains: %s" %
ValueError: unconverted data remains: PM
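
To check which situation a given session is in, something like the following sketch may help. The helper name `describe_time_locale` is hypothetical; it relies only on the standard `locale` and `datetime` modules. It reports the active `LC_TIME` setting together with the strings that `%p` produces for a morning and an evening time. If those strings are empty, `strptime` with `%p` cannot consume an `AM`/`PM` suffix.

```python
import locale
from datetime import datetime

def describe_time_locale():
    """Return (LC_TIME setting, %p text for 10:30, %p text for 22:30)."""
    # Calling setlocale() without a second argument only queries the setting.
    current = locale.setlocale(locale.LC_TIME)
    am_text = datetime(2021, 6, 15, 10, 30).strftime("%p")
    pm_text = datetime(2021, 6, 15, 22, 30).strftime("%p")
    return current, am_text, pm_text

print(describe_time_locale())
```

Running this both in a standalone Python console and in `reticulate::repl_python()` should show whether the two sessions differ in `LC_TIME`.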

So my conclusion would be that Python running under reticulate ends up with different locale settings than a standalone Python session gets by default. For example, on my Ubuntu 22.04 setup, the shell uses a different value for `LC_TIME` than Python does, even when Python is invoked from that same shell:

$ locale | grep LC_TIME
LC_TIME=de_DE.UTF-8
$ python
>>> import locale
>>> locale.getlocale(locale.LC_TIME)
(None, None)
>>> # (None, None) means that it falls back to the default locale:
>>> locale.getdefaultlocale()
('en_US', 'UTF-8')
>>> exit()
$ date
Do 7. Sep 09:55:57 CEST 2023
$ LC_TIME="" date
Thu Sep  7 09:56:31 AM CEST 2023

As a workaround, just make sure to call locale.setlocale(locale.LC_TIME, ...) before doing any date string conversions. You can also define a new context manager for that.
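
One possible shape for such a context manager is sketched below. The name `lc_time` is an assumption made for illustration, not part of reticulate or the standard library. The example switches to the `"C"` locale because it should be available on every platform and uses `AM`/`PM`; a name such as `"en_US.UTF-8"` can be passed instead where that locale is installed.

```python
import locale
from contextlib import contextmanager
from datetime import datetime

@contextmanager
def lc_time(name):
    """Temporarily switch LC_TIME to `name`, restoring the old value on exit."""
    saved = locale.setlocale(locale.LC_TIME)   # query the current setting
    locale.setlocale(locale.LC_TIME, name)     # raises locale.Error if unavailable
    try:
        yield
    finally:
        locale.setlocale(locale.LC_TIME, saved)

# Parse a 12-hour timestamp while the "C" locale is active for LC_TIME.
with lc_time("C"):
    parsed = datetime.strptime("11/23/2010 6:00 PM", "%m/%d/%Y %I:%M %p")

print(parsed)
```

Note that `locale.setlocale` changes the setting for the whole process and is not thread-safe, so this pattern is best kept to single-threaded scripts or notebook chunks.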

krihabu commented 1 year ago

Thanks @tovogt!

Indeed, setting the locale beforehand fixes the problem:

> library(reticulate)
> repl_python()
Python 3.11.3 (C:/Users/krihabu/anaconda3/envs/trwork/python.exe)
Reticulate 1.31.0.9000 REPL -- A Python interpreter in R.
Enter 'exit' or 'quit' to exit the REPL and return to R.
>>> import pandas as pd
>>> pd.to_datetime("11/23/2010 6:00 PM", format="%m/%d/%Y %I:%M %p")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\KRIHAB~1\ANACON~1\envs\trwork\Lib\site-packages\pandas\core\tools\datetimes.py", line 1084, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\KRIHAB~1\ANACON~1\envs\trwork\Lib\site-packages\pandas\core\tools\datetimes.py", line 453, in _convert_listlike_datetimes
    return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\KRIHAB~1\ANACON~1\envs\trwork\Lib\site-packages\pandas\core\tools\datetimes.py", line 484, in _array_strptime_with_fallback
    result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas\_libs\tslibs\strptime.pyx", line 530, in pandas._libs.tslibs.strptime.array_strptime
  File "pandas\_libs\tslibs\strptime.pyx", line 355, in pandas._libs.tslibs.strptime.array_strptime
ValueError: unconverted data remains when parsing with format "%m/%d/%Y %I:%M %p": "PM", at position 0. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
>>> import locale
>>> locale.setlocale(locale.LC_TIME, ('en_US', 'UTF-8'))
'en_US.UTF-8'
>>> pd.to_datetime("11/23/2010 6:00 PM", format="%m/%d/%Y %I:%M %p")
Timestamp('2010-11-23 18:00:00')
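
Where changing the process locale is undesirable, an alternative (not proposed in this thread, shown only as a sketch) is to handle the `AM`/`PM` suffix by hand and leave `%p` out of the format string. The helper `parse_12h` and its default format are hypothetical and assume an otherwise all-numeric timestamp like the one above:

```python
from datetime import datetime

def parse_12h(text, fmt_without_ampm="%m/%d/%Y %I:%M"):
    """Parse a timestamp ending in ' AM' or ' PM' without using %p."""
    body, _, suffix = text.rpartition(" ")
    suffix = suffix.upper()
    if suffix not in ("AM", "PM"):
        raise ValueError(f"expected AM or PM suffix, got {suffix!r}")
    dt = datetime.strptime(body, fmt_without_ampm)
    # Normalise the hour to 0-11, then add 12 for afternoon times.
    hour = dt.hour % 12 + (12 if suffix == "PM" else 0)
    return dt.replace(hour=hour)

print(parse_12h("11/23/2010 6:00 PM"))
```

This keeps the parsing independent of `LC_TIME`, at the cost of hard-coding the English suffixes.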

t-kalinowski commented 1 year ago

Thank you @tovogt for the excellent diagnosis and fix!