python / cpython

The Python programming language
https://www.python.org

Add ISO Basic format support to datetime.isoformat() and date.isoformat() #118948

Open mohd-akram opened 3 months ago

mohd-akram commented 3 months ago

Feature or enhancement

Proposal:

In addition to the popular ISO 8601 Extended format, there's also an ISO 8601 Basic format for datetimes, which is useful for filenames and URL components because it avoids characters such as the colon and is more compact. datetime.fromisoformat already supports parsing this format.
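
A minimal sketch of the parsing side: on Python 3.11 and later, datetime.fromisoformat accepts the basic format directly. The parse_basic helper name and the strptime fallback for older versions are assumptions added for illustration, not part of the proposal.

```python
from datetime import datetime


def parse_basic(text: str) -> datetime:
    """Parse a basic-format timestamp such as '20240422T204705'.

    Hypothetical helper: tries fromisoformat first (which handles the
    basic format on Python 3.11+) and falls back to an explicit
    strptime format on older versions.
    """
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        # Older Pythons only accept the extended format here.
        return datetime.strptime(text, "%Y%m%dT%H%M%S")


parsed = parse_basic("20240422T204705")
print(parsed.isoformat())  # extended rendering of the same value
```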

Example code:

datetime.isoformat(basic=True)
# 20240422T204705.335-0400
date.isoformat(basic=True)
# 20240422

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response


nineteendo commented 3 months ago

Does this need discussion on Discourse, or is the issue minor enough?

vstinner commented 3 months ago

I think that it's a good idea to support formatting in the basic format (that I just discovered).

cc @pganssle @abalkin

serhiy-storchaka commented 2 months ago

Should it be basic with default value False or extended with default value True?

datetime.isoformat() has a sep parameter which specifies the separator between the date and the time. Taking it as a precedent, we could add similar parameters for the separators between the components of a date and of a time. sep can currently only be a single character; it should also support an empty string.

On the other hand, adding parameters to .isoformat() is not the only way to solve this problem. You can also use .strftime() or str.replace().
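
A minimal sketch of those two alternatives, assuming naive datetimes (no UTC offset). The variable names are illustrative only.

```python
from datetime import date, datetime

dt = datetime(2024, 4, 22, 20, 47, 5)  # naive datetime, no UTC offset
d = date(2024, 4, 22)

# Option 1: spell the format out with strftime.
via_strftime = dt.strftime("%Y%m%dT%H%M%S")
date_via_strftime = d.strftime("%Y%m%d")

# Option 2: strip the separators from the extended format.
# Caution: replace("-", "") would also strip the sign of a negative UTC
# offset, so this is only safe for naive datetimes and plain dates.
via_replace = dt.isoformat(timespec="seconds").replace("-", "").replace(":", "")
date_via_replace = d.isoformat().replace("-", "")

print(via_strftime, via_replace)
print(date_via_strftime, date_via_replace)
```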

pganssle commented 2 months ago

I am maybe -0.5 on this feature. There is a case for putting stuff in isoformat if people are usually going to want automatic truncation, but in the use cases put forward, like filenames, you would almost certainly prefer a fixed format, so strftime(..., "%Y%M%DT%h%m%s.ext") seems like it would actually be better than this.

Taking it as a precedent, we could add similar parameters for the separators between the components of a date and of a time. sep can currently only be a single character; it should also support an empty string.

We should definitely not do this, because ISO 8601 makes no provision for arbitrary separators, and to the extent that sep is even allowed to be something other than T, I'm fairly confident that you are not allowed to omit it entirely.
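
For reference, a small sketch of how sep behaves today: any single character is accepted, while an empty string is rejected with a TypeError. The rejected flag is only there for illustration.

```python
from datetime import datetime

dt = datetime(2024, 4, 22, 20, 47, 5)

# Any single character is accepted as the date/time separator.
print(dt.isoformat(sep=" "))

# An empty separator is rejected by the current implementation.
try:
    dt.isoformat(sep="")
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False
```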

picnixz commented 2 months ago

First of all, note that my comments are based on ISO 8601:2004, which is superseded by 8601:2019, which I would need to buy (but I won't). I nevertheless assume that the informative parts remain the same (namely sections 1 and 2).

Should it be basic with default value False or extended with default value True?

ISO 8601:2004 section 2.3.3 says "The basic format should be avoided in plain text." For years, isoformat() has assumed the extended format, so a flag that explicitly enables the basic format is preferable (basic=True disables the extended format and explicitly switches to the basic format). With extended=False, we would switch to the basic format only implicitly, by disabling the extended one.

so strftime(..., "%Y%M%DT%h%m%s.ext") seems like it would actually be better than this.

In this case I would agree, but this is not exactly the same as having the basic format as specified by ISO 8601. Now, while I did suggest a PR for the basic format (and would be happy if it were accepted), I'm actually wondering whether it is really needed in the end. For instance, the date command does not offer the basic format as an output by default but accepts it as input, so it could also make sense for us not to offer it either (you can still output a basic format, but you need to build it yourself, e.g., date +'%H%M%S').

vstinner commented 2 months ago

@pganssle:

There is a case for putting stuff in isoformat if people are usually going to want automatic truncation

What do you mean by automatic truncation? The idea is to add an opt-in basic=True option; by default, nothing changes. Did I miss something?

pganssle commented 2 months ago

What do you mean by automatic truncation?

When timespec is set to auto (the default), if a datetime doesn't have sub-second components, they will be excluded from the output; this, and the difference in how time zones are handled, are some of the main reasons why isoformat isn't just syntactic sugar for some strftime format:

>>> from datetime import datetime, timedelta, timezone
>>> dts = [datetime(2024, 3, 7, 12, 15, 30, 123456),
...        datetime(2024, 4, 9, 13),
...        datetime(2024, 5, 1, 16, 30, 2, 456123, tzinfo=timezone(timedelta(hours=5))),
...        datetime(2024, 6, 1, 16, 15, tzinfo=timezone(timedelta(hours=5, minutes=3, seconds=14)))]
>>> for dt in dts:
...     print(dt.isoformat())
...
2024-03-07T12:15:30.123456
2024-04-09T13:00:00
2024-05-01T16:30:02.456123+05:00
2024-06-01T16:15:00+05:03:14
>>> for dt in dts:
...     print(dt.strftime("%Y-%m-%dT%H:%M:%S.%f%z"))
...
2024-03-07T12:15:30.123456
2024-04-09T13:00:00.000000
2024-05-01T16:30:02.456123+0500
2024-06-01T16:15:00.000000+050314

The main reasons to use .isoformat are that you want this sort of truncation to happen, or that you prefer the simplicity of "just give me a datetime that complies with this standard". The more we complicate isoformat, the more it basically becomes strftime, and it gets bogged down in complexity.

I don't think we should automatically say isoformat should never change or grow new options, but the reasoning here is not particularly compelling, because it's suggesting an opt-in format with a name that most people won't understand, where the primary motivating use case not only can be replaced by an strftime call, but arguably should be replaced by an strftime call because:

  1. It is easier to parse: both versions can be parsed by .fromisoformat, but only the strftime version can be parsed by strptime ("oops, this datetime happened to have 0 for the microsecond component and now I need a different parse format!")
  2. People reading the code will know immediately what the format is if you explicitly write it out in strftime, whereas they may not know what isoformat(basic=True) does, or what corner cases apply.
  3. For file names, you probably prefer them to have a consistent file name rather than a "pretty display" file name.

I suppose you could use dt.isoformat(timespec='seconds', basic=True) to alleviate concerns 1 and 3, but that still leaves concern 2.
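
Concern 1 can be sketched as follows, using the existing extended format (the same reasoning would apply to a basic variant). The FMT constant and variable names are illustrative assumptions.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"

with_us = datetime(2024, 3, 7, 12, 15, 30, 123456)
without_us = datetime(2024, 4, 9, 13)  # microsecond == 0

# isoformat() output only matches FMT when microseconds are non-zero.
assert datetime.strptime(with_us.isoformat(), FMT) == with_us
try:
    datetime.strptime(without_us.isoformat(), FMT)
    auto_round_trips = True
except ValueError:
    # "2024-04-09T13:00:00" has no fractional part, so FMT does not match.
    auto_round_trips = False

# A fixed-width rendering always matches FMT.
fixed = without_us.isoformat(timespec="microseconds")
assert datetime.strptime(fixed, FMT) == without_us
assert datetime.strptime(without_us.strftime(FMT), FMT) == without_us

print(auto_round_trips, fixed)
```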

nineteendo commented 2 months ago

How about dt.isoformat(timespec='seconds', short=True)? That's use case oriented.

short: 20240601T161500.000000
long:  2024-06-01T16:15:00.000000

picnixz commented 2 months ago

The term basic is the term used in the ISO standards and should be left as is IMO (if we were to support it).

pganssle commented 2 months ago

I agree with @picnixz, the name here is not the problem. If the survey on the API for outputting Z is any guide, it is really hard to do something unambiguous. basic=True is almost certainly the best you can do, because it is the standard term for it so it is probably unambiguous, and worst case scenario you can google that term.

That said, almost everyone will have to google that term. I have read ISO 8601 several times, and I implemented two mostly full-featured ISO 8601 parsers, and I had to look up the term to see if it was an official term. No one is going to know what short=True does without looking it up or reading the docs. basic is definitely the best term for this, and it will undoubtedly create cognitive load relative to an explicitly specified format.

I think the main blocker here is that there's no compelling use case (and there actually kind of is a compelling use case for #90772, and we still didn't do that one because we couldn't come up with a non-confusing UX for it).

mohd-akram commented 2 months ago

The motivation for the ISO basic format is the same as the extended format - that it is a standardized machine-readable format that ensures seamless interoperability. You do not get that with many potentially subtly incorrect strftime/strptime implementations, as doing it right requires reading and implementing the spec correctly. That machinery is already implemented in Python, and you can also specify your desired granularity with timespec. Doing this manually would require creating strftime/strptime pairs for each case.
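
A rough sketch of the granularity point, using the extended format that isoformat() produces today: one call with timespec covers every case, whereas the strftime route needs one format string per case. The STRFTIME_EQUIVALENTS table is a hypothetical illustration and only holds for naive datetimes.

```python
from datetime import datetime

dt = datetime(2024, 4, 22, 20, 47, 5, 335000)

# One isoformat() call covers every granularity via timespec.
for spec in ("hours", "minutes", "seconds", "milliseconds", "microseconds"):
    print(f"{spec:>12}: {dt.isoformat(timespec=spec)}")

# Hand-written strftime equivalents, one per granularity.
# There is no strftime directive for milliseconds, so that case would
# need extra string handling on top of "%f".
STRFTIME_EQUIVALENTS = {
    "hours": "%Y-%m-%dT%H",
    "minutes": "%Y-%m-%dT%H:%M",
    "seconds": "%Y-%m-%dT%H:%M:%S",
    "microseconds": "%Y-%m-%dT%H:%M:%S.%f",
}
matches = all(
    dt.strftime(fmt) == dt.isoformat(timespec=spec)
    for spec, fmt in STRFTIME_EQUIVALENTS.items()
)
print(matches)
```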

it will undoubtedly create cognitive load relative to an explicitly specified format.

IMO, unless one has the specification table memorized, I don't think "ISO but without - and :" would be more of a cognitive load than figuring out what %Y%M%DT%h%m%s.ext (which is subtly wrong) does.

antonagestam commented 1 month ago

Should it be basic with default value False or extended with default value True?

Or format: datetime.Format = datetime.Format.extended?
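
A sketch of what such an enum-based signature might look like. Neither datetime.Format nor a format parameter exists in the standard library; the Format class and the free function isoformat below are hypothetical stand-ins that build the basic form from the existing extended output.

```python
import enum
from datetime import datetime, timedelta, timezone


class Format(enum.Enum):
    """Hypothetical stand-in for the suggested datetime.Format."""

    extended = "extended"
    basic = "basic"


def isoformat(dt: datetime, format: Format = Format.extended,
              timespec: str = "auto") -> str:
    """Hypothetical free function mimicking the suggested signature."""
    text = dt.isoformat(timespec=timespec)
    if format is Format.extended:
        return text
    # The date always occupies the first 10 characters (YYYY-MM-DD),
    # followed by the "T" separator at index 10.
    date_part, time_part = text[:10], text[11:]
    # Only colons are removed from the time part, so the sign of a
    # negative UTC offset survives.
    return date_part.replace("-", "") + "T" + time_part.replace(":", "")


dt = datetime(2024, 4, 22, 20, 47, 5, 335000,
              tzinfo=timezone(timedelta(hours=-4)))
print(isoformat(dt, Format.basic, timespec="milliseconds"))
print(isoformat(dt))
```

With these inputs, the basic call reproduces the 20240422T204705.335-0400 example from the original proposal.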