singer-io / singer-python

Writes the Singer format from Python
https://singer.io
Apache License 2.0
537 stars 129 forks source link

Invalid format string %04Y on Windows #86

Closed MadLittleMods closed 5 years ago

MadLittleMods commented 5 years ago

Invalid format string when using %04Y on Windows 10 Python 3.7.1

$ python
>>> from datetime import datetime
>>> datetime(90, 1, 1).strftime("%04Y")
ValueError: Invalid format string
>>> datetime.strptime("2018-10-31 22:29:29.553000", "%Y-%m-%d %H:%M:%S.%f").strftime("%04Y-%m-%dT%H:%M:%S.%fZ")
ValueError: Invalid format string

Some platforms support modifiers from POSIX 2008 (and others). On Linux the format "%04Y" assures a minimum of four characters and zero-padding. The internal code (as used on Windows and by default on macOS) uses zero-padding by default

https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/strptime#l_sections


Related issues

timvisher commented 5 years ago

I agree that Python 3.7.1's datetime module exhibits this behavior. What's the issue for singer-python?

MadLittleMods commented 5 years ago

@timvisher Isn't this a OS specific problem, not Python version? %04Y is not supported on Windows.

Here is Python 3.5.2 on Windows,

$ "C:\Users\MLM\Downloads\python-3.5.2-embed-amd64\python.exe"
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> datetime(90, 1, 1).strftime("%04Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format string

Can we use something like this? Probably the wrong way to patch this 🤷‍♂️

Windows by default zero-pads year values with %Y

singer/utils.py

DATETIME_FMT_SAFE = "%Y-%m-%dT%H:%M:%S.%fZ"

def strftime(dtime, format_str=DATETIME_FMT):
    if dtime.utcoffset() != datetime.timedelta(0):
        raise Exception("datetime must be pegged at UTC tzoneinfo")

    dt_str = None
    try:
        dt_str = dtime.strftime(format_str)
        if dt_str.startswith('4Y'):
            dt_str = dtime.strftime(DATETIME_FMT_SAFE)
    except ValueError:
        dt_str = dtime.strftime(DATETIME_FMT_SAFE)

    return dt_str
timvisher commented 5 years ago

Sorry, I just actually wasn't sure what problem you were seeing.

So is the issue that you're identifying that utils.strftime behaves badly on Windows?

MadLittleMods commented 5 years ago

@timvisher Correct, I can't parse anything because %04Y usage built into format string within this library just throws ValueError: Invalid format string on Windows

timvisher commented 5 years ago

Ah. Got it. :)

So we'd definitely accept patches to improve Windows support but I'll let you know up front that the testing burden might be a little high, since none of us have easy access to Windows machines at the moment nor any expertise on developing/debugging Python on Windows issues.

I guess at a minimum we'll want to see some unit tests added if they don't already exist that test that 3 digit years are zero padded without %04D on Windows and that they continue to be padded on Mac/Linux.

In terms of the general design I don't think duck typing is necessarily the wrong way to go here. First try with the Mac/Linux format string, fall back to the windows format string on an exception, otherwise raise.

Sound good?

MadLittleMods commented 5 years ago

@timvisher I created https://github.com/singer-io/singer-python/pull/87

We already have tests for zero-padding a small year. The tests failed before and now pass on Windows.

timvisher commented 5 years ago

Pushed 5.3.3 to PyPI. Thanks! :)

nathanfreystaetter commented 8 months ago

This is still an issue on my Windows machine. I fixed by updating DATETIME_FMT to be the same as DATETIME_FMT_MAC

DATETIME_FMT = "%Y-%m-%dT%H:%M:%S.%fZ"