Open rocodero opened 3 months ago
take
Hi @rocodero
You can use the code below to parse a timestamp that has a comma between the seconds and the milliseconds.
By default, Python's ISO 8601 parsing expects a `.` as the separator, but since your string has a `,`, you can parse it with an explicit format:
from datetime import datetime
# Sample datetime string
datetime_str = "2024-06-17T18:57:43,567"
# Define the format
datetime_format = "%Y-%m-%dT%H:%M:%S,%f"
# Parse the datetime string
parsed_datetime = datetime.strptime(datetime_str, datetime_format)
You can also follow this link and use the similar alternative suggested there:
date_string.replace(',', '.')
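The two workarounds from this comment can be put side by side in a minimal sketch that uses only the standard library. The helper name `parse_comma_timestamp` is made up for illustration, and the sample string is the one from the report.

```python
from datetime import datetime

datetime_str = "2024-06-17T18:57:43,567"

# Option 1: tell strptime that the fractional part follows a comma.
via_strptime = datetime.strptime(datetime_str, "%Y-%m-%dT%H:%M:%S,%f")


def parse_comma_timestamp(date_string: str) -> datetime:
    """Hypothetical helper: swap the comma for a dot, then parse as ISO 8601."""
    return datetime.fromisoformat(date_string.replace(",", "."))


# Option 2: normalise the separator first, then parse.
via_replace = parse_comma_timestamp(datetime_str)

print(via_strptime.microsecond, via_replace.microsecond)
```

Both routes should give the same `datetime`, with the three digits after the comma read as milliseconds. Neither touches the pandas behaviour reported in this issue; they only avoid the comma before parsing.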
Hi @Anurag-Varma,
thank you for the suggestions, plenty of options there, I know. I was trying to focus on the bug itself and keep it clean, but if workarounds are appreciated I can add them in the future.
What I forgot to add, though: the bug has only been present since pandas 2.2. Version 2.1.4 parses the comma-separated milliseconds without problems.
thanks @rocodero for the report
aside from whether it should parse or not, it looks pretty wild to me that
In [6]: t
Out[6]: Timestamp('2024-06-17 18:57:43.567000')
In [7]: t.unit
Out[7]: 's'
In [8]: t.microsecond
Out[8]: 567000
it records non-zero microseconds but has 's' unit
looks like it's going down the dateutil path
In [9]: pd.to_datetime(['2024-06-17T18:57:43,567']*2)
<ipython-input-9-ad2044f68d63>:1: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(['2024-06-17T18:57:43,567']*2)
Out[9]: DatetimeIndex(['2024-06-17 18:57:43', '2024-06-17 18:57:43'], dtype='datetime64[ns]', freq=None)
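The warning above asks for an explicit format. Below is a minimal sketch of that, assuming the strptime-style format string from earlier in the thread is passed to `pd.to_datetime`. It sidesteps the `dateutil` fallback rather than fixing the underlying bug, and the variable names are illustrative only.

```python
import pandas as pd

values = ["2024-06-17T18:57:43,567"] * 2

# With an explicit format, pandas does not have to infer one,
# so the dateutil fallback mentioned in the warning is not needed.
idx = pd.to_datetime(values, format="%Y-%m-%dT%H:%M:%S,%f")

# Inspect the parsed fractional seconds and the dtype actually stored.
print(idx.microsecond)
print(idx.dtype)
```

Printing the dtype alongside the microseconds makes it easy to see whether the stored resolution matches what the representation shows.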
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
When parsing a timestamp string that separates the milliseconds with a comma instead of a dot, the timestamp representation shows microseconds, but the actual stored resolution / unit that is used for any calculations with the timestamp is only seconds.
The problem does not occur with more than three digits after the comma or when using a dot as a separator.
Expected Behavior
Timestamp resolution is properly parsed to the millisecond
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.12.4.final.0
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.2.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.25.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.4
numba : None
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None