metomi / isodatetime

Python ISO 8601 date time parser and data model/manipulation utilities
GNU Lesser General Public License v3.0

Question: sample ISO8601 durations? #219

Closed by jayaddison 2 years ago

jayaddison commented 2 years ago

Hello - I've been dabbling with an AGPLv3-licensed implementation of ISO8601 duration parsing that subclasses the built-in Python timedelta object.

As you're probably well aware from building isodatetime, Python's built-in timedelta objects have some limitations, particularly the lack of support for year and month fields in the constructor. Even so, I figured it was worth attempting an implementation, partly because I learned of an open ticket for it in the Python bug tracker.
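To illustrate the limitation mentioned above, here is a minimal sketch. It assumes a hypothetical helper, `parse_simple_duration`, that is not part of isodatetime or of any library discussed in this thread, and it uses a regular expression purely for brevity. It handles only the week, day, and time designators, since those are the fields that map directly onto `timedelta` keyword arguments; year and month designators are rejected.

```python
import re
from datetime import timedelta

# Hypothetical pattern for illustration only: accepts PnW, or PnDTnHnMnS
# with any subset of the day/hour/minute/second fields.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W"
    r"|(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?)$"
)


def parse_simple_duration(text):
    """Parse a year-free, month-free ISO 8601 duration into a timedelta."""
    match = _DURATION.match(text)
    # Reject strings with no fields at all ("P") or a dangling "T".
    if match is None or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    # Keep only the fields that were present; the group names are chosen
    # to match timedelta's keyword arguments.
    fields = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(**fields)


# The built-in constructor has no notion of years or months.
try:
    timedelta(years=1)
except TypeError:
    print("timedelta does not accept a 'years' argument")

print(parse_simple_duration("PT1H30M"))
```

A string such as `P1Y` raises `ValueError` in this sketch, because a year has no fixed length that a `timedelta` could represent without further context.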

I'm trying to focus on correctness and performance against ISO8601:2004 (although I'm not yet confident enough to declare support for that spec), and have created some test coverage and benchmarks -- but I'd like to build confidence against more representative datasets.

Do you know of any open/public-licensed and reasonably sized (hundreds or thousands of items) sets of real-world-ish ISO8601 durations that I could test against?

Thanks either way! James

MetRonnie commented 2 years ago

Sorry, we are not aware of any such dataset. We rely on a few dozen or so examples in our tests.

jayaddison commented 2 years ago

Ok, no problem - thanks anyway!

jayaddison commented 2 years ago

Noting some findings: one of the most widely-available data formats that includes ISO8601-format durations seems to be NetCDF.

And some openly published NetCDF datasets include:

(Within those, some of the datasets are composed of fixed-duration data items, P1D for example, so they may not be suitable for testing or benchmarking, but datasets containing variable-duration items exist too.)
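One rough way to tell a fixed-duration dataset from a variable-duration one is to count the distinct duration strings in a sample. The sketch below assumes the durations have already been extracted into a list of strings; the function name `duration_variety` and the sample values are hypothetical.

```python
from collections import Counter


def duration_variety(samples):
    """Return (distinct_count, share_of_most_common) for duration strings."""
    counts = Counter(s.strip() for s in samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples given")
    # Share of the sample taken up by the single most frequent value;
    # 1.0 means every item has the same duration.
    most_common_count = counts.most_common(1)[0][1]
    return len(counts), most_common_count / total


# Hypothetical sample values, not taken from any real dataset.
fixed = ["P1D", "P1D", "P1D", "P1D"]
varied = ["P1D", "PT6H", "P1D", "PT30M"]
print(duration_variety(fixed))
print(duration_variety(varied))
```

A sample whose most common value accounts for nearly all items is probably of little use for exercising a parser or benchmarking it.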

jayaddison commented 2 years ago

Hi again, and sorry. I don't expect that you have started evaluating timedelta-iso8601, but if you have, please hold on: I may have violated a licensing policy with it, and I will be yanking it from PyPI and making it private on GitHub until that question is resolved. Apologies for the noise if you aren't evaluating it, and for any frustration or complications if you are.

jayaddison commented 2 years ago

Continuing with what is likely spam sent into the void here (not a criticism; I just want to provide an update for completeness' sake): there were potential concerns about repurposed method signatures and docstrings from cpython.git in the timedelta-iso8601 library. Rather than attempting to smooth those over in place in an existing work, I decided to re-implement the (pure-Python, no-regex) functionality from scratch, this time without opening or copying any code from cpython.git, to be on the safe side compliance-wise.

The result is timedelta-isoformat on GitHub -- also available as wheel packages under the name timeformat-isoformat on PyPI.

The license is AGPLv3 again, which I understand probably makes it unattractive for use in many situations, and I don't expect it would be relevant for the Met Office to evaluate -- but I started the conversation about the original here, and want to complete it by mentioning the clean version (which I plan to optimize a bit further for performance).