scicloj / tablecloth.time

Tools for the processing and manipulation of time-series data in Clojure.
Other
18 stars 1 forks source link

Add our own time string parsing #37

Closed ezmiller closed 3 years ago

ezmiller commented 3 years ago

Goal / Problem

We need a method like tick's parse method, and that is the last method for which we need tick. So if we add that we can remove tick.

Proposed Solution

This PR adds a string->time method. that will try to convert a string to the right time object. If it fails, it throws an error. See the tests for examples.

Also in this PR:

How to Test

Please just give the tests for both string->time and convert-to a quick sanity test.

Open Questions

In this PR, too, the unsolved issues around the unsupported classes in tech.v3.datatype.datetime comes up. Should we parse a year string as a java.time.Year object? I did add support for that for now, but the question lingers. I added a quick write up of some of the dilemmas there in an issue (#38).

rsiddharthan commented 3 years ago

I am partial to the date/time formats in strptime and strftime, and think that promotes a functional view. Maybe we should take a look at that to see if such a format can be adopted. Expecting a specific input/output formats will allow us to avoid a generic datetime parser that can come later as special case

edit: took a look at the code once more. I see the parseable-proto/parse. Is that the entry point? Maybe we can construct strptime test cases

ezmiller commented 3 years ago

@rsiddharthan thank you for your comments about the time formats. Yeah the entrypoint is Parseable/parse, where Parseable is a protocol that we use to extend String here. We could totally add other checks I think, although the code might get a bit more complicated. The string formats expected here are the defaults that the various java.time classes' parse methods expect. You can, I think, tell such parse methods to expect a different format by, if I remember correctly, giving them a custom DateFormatter object.

Anyway, you make a really good point. I was noticing already that there are differences in expectations, like the R formats for yearmonth look like 2010 Jan which is unliked the java.time default 2010-01. So we could expand support for variations. Regardless, it feels like it would be nice to point users to a concrete list of the time string formats that we do support so that it's not confusing.

I agree with you it makes sense to get this in and then expand on it.