ak._v2.from_iter should recognize Python datetimes/timedeltas

jpivarski commented 2 years ago

Version of Awkward Array

HEAD

Description and code to reproduce

https://github.com/scikit-hep/awkward/blob/77b06b3575737ec3cebc4c9b76ddceaf584d57c3/src/python/content.cpp#L952-L957

and

https://github.com/scikit-hep/awkward/blob/77b06b3575737ec3cebc4c9b76ddceaf584d57c3/src/python/content.cpp#L846-L886

handle NumPy datetime objects, but Python datetime objects are not explicitly recognized. (What happens? Does it raise a "cannot convert..." ValueError?)

Given a datetime.datetime, one can call .timestamp() to get the number of seconds since 1970 as a floating-point number (for a datetime.timedelta, .total_seconds() plays the same role). To fill ArrayBuilder::datetime or ArrayBuilder::timedelta, we have to provide a 64-bit integer and a type string. We could choose that type string to be "s" and cast the floating-point number to an integer as-is, or choose the type string to be "ms" and multiply the number by 1000, etc.
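As a rough illustration of the choice described above, here is a minimal sketch of turning a datetime.datetime into an integer plus a unit string. The helper name datetime_to_int and the UNITS_PER_SECOND table are hypothetical, invented for this example; they are not part of Awkward Array or its ArrayBuilder.

```python
from datetime import datetime, timezone

# Hypothetical lookup table (not an Awkward Array API): how many ticks of
# each unit fit into one second.
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000}


def datetime_to_int(dt, unit):
    """Return (integer count since 1970, unit string) for a datetime.

    Goes through the floating-point .timestamp(), so precision is limited
    by float64; int() truncates toward zero.
    """
    seconds = dt.timestamp()
    return int(seconds * UNITS_PER_SECOND[unit]), unit


# A timezone-aware datetime makes the result independent of the local
# timezone; a naive datetime would be interpreted as local time.
dt = datetime(2020, 1, 1, 12, 30, 15, 250000, tzinfo=timezone.utc)

as_seconds = datetime_to_int(dt, "s")
as_millis = datetime_to_int(dt, "ms")
as_micros = datetime_to_int(dt, "us")
```

With "s" the fractional quarter second is dropped, while "ms" and "us" keep it, which is the trade-off behind picking a unit.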

Either we hard-code a choice or pass it down as an argument (but that means passing it down through C++, like the options in from_json). Note:

"ns" (nanosecond) resolution gives us a range from 1677 through 2262.
"us" (microsecond) resolution gives us a range from 290307 B.C.E. through 294247 C.E., though if we were constructing Python datetime objects, these years are already far out of range (Python datetime only covers years 1 through 9999).
There's no point in going to coarser units ("ms", "s"): they only extend the range to years that Python datetime can't represent anyway.

So the only units that make any sense to assume are microseconds and nanoseconds. Microseconds cover the entire range that Python datetime is capable of (and more), and nanosecond resolution sounds rather special-purpose. (Users can make NumPy datetimes if they need that.) So I would vote to hard-code it as microseconds.
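The range argument can be checked with the standard library alone. The sketch below is only an illustration: count_since_epoch and fits_int64 are hypothetical helper names, and naive datetimes measured from a 1970-01-01 epoch are an assumption of the example.

```python
from datetime import datetime, timedelta

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime(1970, 1, 1)  # naive epoch; an assumption of this sketch
MICROSECOND = timedelta(microseconds=1)


def count_since_epoch(dt, step):
    """Exact integer number of `step`-sized ticks between EPOCH and dt."""
    # timedelta // timedelta is integer arithmetic, so no float rounding.
    return (dt - EPOCH) // step


def fits_int64(n):
    """True if the Python integer n is representable as a signed 64-bit int."""
    return INT64_MIN <= n <= INT64_MAX


# Extremes of what Python datetime can represent (years 1 through 9999).
lo_us = count_since_epoch(datetime.min, MICROSECOND)
hi_us = count_since_epoch(datetime.max, MICROSECOND)

# timedelta cannot express one nanosecond, so scale the microsecond counts.
lo_ns = lo_us * 1000
hi_ns = hi_us * 1000

us_covers_everything = fits_int64(lo_us) and fits_int64(hi_us)
ns_covers_everything = fits_int64(lo_ns) and fits_int64(hi_ns)
```

Here us_covers_everything comes out True and ns_covers_everything comes out False, matching the reasoning that microseconds cover every Python datetime while nanoseconds do not.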

agoose77 commented 2 years ago

@jpivarski that's a good point actually; microseconds are the finest granularity that we can reason about with date-time objects. As this is all Python-iteration anyway, users can always write a loop to transform these things if they need to for some reason.
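A sketch of the kind of user-side loop mentioned above, for someone who wants a unit other than microseconds: it walks nested Python data and replaces datetimes and timedeltas with integer nanosecond counts before handing the result to whatever consumes it. The function name to_nanoseconds is hypothetical, and the example assumes naive datetimes measured from 1970-01-01.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive epoch; aware datetimes would need an aware epoch
MICROSECOND = timedelta(microseconds=1)


def to_nanoseconds(obj):
    """Recursively replace datetime/timedelta objects with integer nanoseconds."""
    if isinstance(obj, datetime):
        # Exact integer microseconds since the epoch, scaled to nanoseconds.
        return ((obj - EPOCH) // MICROSECOND) * 1000
    if isinstance(obj, timedelta):
        return (obj // MICROSECOND) * 1000
    if isinstance(obj, dict):
        return {key: to_nanoseconds(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_nanoseconds(item) for item in obj]
    return obj  # numbers, strings, None, etc. pass through unchanged


data = [[datetime(1970, 1, 1, 0, 0, 1)], [], [{"dt": timedelta(milliseconds=2)}]]
converted = to_nanoseconds(data)
```

The integers in converted would still need to be reinterpreted as a datetime or timedelta type downstream (for example via NumPy); that step is left out here.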

jpivarski commented 2 years ago

I just noticed the issue number:

agoose77 commented 2 years ago

Closed by #1721

ak._v2.from_iter should recognize Python datetimes/timedeltas #1701
