janet-lang / janet

A dynamic language and bytecode vm
https://janet-lang.org
MIT License
3.52k stars 227 forks source link

Added os.strftime() #1128

Closed zevv closed 1 year ago

zevv commented 1 year ago

I found that Janet is lacking an efficient way for formatting dates and times. This patch adds a simple strftime() wrapper as os/strftime.

Is this something that is acceptable to go into the os core lib, or would Spork be a better place?

sogaiu commented 1 year ago

Possibly relevant: https://stackoverflow.com/questions/25726331/strftime-f-does-not-work-on-windows

bakpakin commented 1 year ago

I don't know if supporting really old MSVC is really a concern here, but yes portability is always a concern when adding stuff like this. I agree though, it would be nice to have an efficient, standard way of doing date formatting that didn't mess with extracting stuff from dates, and Janet's behavior of making all date parameters 0-indexed (consistent, but oh so incredibly confusing).

Looking at the link @sogaiu shared, it might also be worth doing a check on the format string for unsupported identifiers, but I don't think that that is so important since that is questionable behavior in a really old version of MSVC.

Fix the codeQL errors as well - (get_number could be replaced by getinteger64), and look to the os_date function for how it gets the current time - there is some hacks for thread safety and running on windows.

zevv commented 1 year ago

Code updated.

bakpakin commented 1 year ago

The time_t to struct tm * code is now in two places, and it's already a tad messy because of the #ifdefs; would you prefer this refactored out of os_date and os_strftime into a helper fn?

Probably good to refactor it out into a small static helper function.

As for bad format specifiers, for now it is probably best to panic. I don't feel to strongly either way, and tbh part of me thinks that we don't really need the check - not worth the extra overhead just for the sake of msvc2013

zevv commented 1 year ago

The hairy code has now been moved out of os/date and os/strftime into a common helper function.

I'd be glad to add the fmt specifier test, but I share your feeling that it might not being worth the overhead (and extra code complexity)

sogaiu commented 1 year ago

I tried out the code a little bit on a Linux box. So far it seems to be working fine here -- didn't try E or O though.

Perhaps not having a check initially and only considering an addition later if a need is demonstrated is practical.

sogaiu commented 1 year ago

On a Windows box some strings cause janet.exe to exit here.

I tried the things from the gist as well as the E and O things mentioned in a man page (at the bottom of the DESCRIPTION section). The following are what caused an exit here:

zevv commented 1 year ago

Bah, not formatting properly with these specifiers might be acceptable, but crashing is not of course; I'll add a check and panic properly.

Related question: I don't have any windows machines at hand, I can test the win32 path by cross-compiling for mingw and running on wine, but I need to tweak the make file for this - is there an official way of cross-building a win32 target on linux?

sogaiu commented 1 year ago

I'm not sure if this is relevant documentation but I found this:

This function validates its parameters. If strDest, format, or timeptr is a null pointer, or if the tm data structure addressed by timeptr is invalid (for example, if it contains out of range values for the time or date), or if the format string contains an invalid formatting code, the invalid parameter handler is invoked, as described in Parameter validation. If execution is allowed to continue, the function returns 0 and sets errno to EINVAL.

via: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/strftime-wcsftime-strftime-l-wcsftime-l?view=msvc-170#remarks

sogaiu commented 1 year ago

Related question: I don't have any windows machines at hand, I can test the win32 path by cross-compiling for mingw and running on wine, but I need to tweak the make file for this - is there an official way of cross-building a win32 target on linux?

I haven't heard of a way, but I think @bakpakin can give a definitive answer on this.

zevv commented 1 year ago

I'm not sure if this is relevant documentation but I found this:

Ah, thanks for that, I'll see if that can be used instead of creating a new verification function.

Every time I get the illusion things might work in a slightly standard or posix-y way on win32, I'm up for new surprises.

Properly checking all available options isn't too nice either; this is how Lua does it:

https://github.com/lua/lua/blob/master/loslib.c#L31-L43 https://github.com/lua/lua/blob/master/loslib.c#L272-L288

zevv commented 1 year ago

What about limiting the allowed set of format specifiers to the C89 definition? That is the minimal common supported subset of all implementations and will make sure any Janet code using strftime will work on any platform.

zevv commented 1 year ago

Added validation for the C89 subset of specifiers; panics on violation. This should be safe on all platforms.

sogaiu commented 1 year ago

I tried d0896ba on a Windows box a bit.

janet.exe doesn't exit for the calls tried earlier -- nice error messages instead :+1:

bakpakin commented 1 year ago

LGTM. Let me know if this is ready (it was in draft status) and if so, I will merge it.

zevv commented 1 year ago

Squashed and rebased to master, ready to go.

zevv commented 1 year ago

"The base branch requires all commits to be signed." - does that imply my commits need to be signed, or is that only for the merge commits done by the maintainers?

CosmicToast commented 1 year ago

"The base branch requires all commits to be signed." - does that imply my commits need to be signed, or is that only for the merge commits done by the maintainers?

I've had changes merged without signing, so it seems to be mostly for the maintainers.


As a sidenote, I took a look at the existing date/time faculties in Janet and decided to try to make an external library that can do additional niceties here.

It's a WIP, but notably is guaranteed to work on any ISO C platform (as no assumptions are made about the nature of time_t, which the standard merely guarantees to be arithmetic (pre-C11) or real arithmetic (post-C11)), and supports things like modifying a given date (for example, "add 30 days and 800 seconds to this date").

While it's quite a bit heavier (and thus makes no sense to include in mainline), it may be of interest for the two implementations to share some commonalities, which is why I mention it.

primo-ppcg commented 1 year ago

@zevv os/strptime when? 😉

zevv commented 1 year ago

@zevv os/strptime when? wink

afaik strptime() is POSIX only, so it's not trivial to bring this to windows. I guess it might be easier to implement this directly in Janet, but it'll bring some extra size too boot.janet I'm afraid.

primo-ppcg commented 1 year ago

afaik strptime() is POSIX only, so it's not trivial to bring this to windows.

A valid point, although there does seem to be compatible alternatives. I was just thinking now that we have convenient command line processing, convenient parsing of dates would be nice too. The community example which attempts to do precisely this is completely wrong.

zevv commented 1 year ago

@primo-ppcg What about something like this:

(defn strptime [fmt ts]
  # Convert the fmt string to a peg rule
  (def grammar
    ~{:main (any :thing)
      :thing (+ :fmt :other)
      :fmt (replace (* "%" (<- :fmtchar)) ,(fn [c] (keyword c)))
      :other (<- (to "%"))
      :fmtchar (set "YmdHMSpIabcyABZc")
      })
  (def rule (peg/match grammar fmt))
  # Parse the time with the generated peg rule and fill in date-time struct
  (def dt @{})
  (def p
    ~{:main ,(tuple '* ;rule)
      :Y (cmt (number 4) ,|(put dt :year $))
      :y (cmt (number 2) ,|(put dt :year (+ 2000 $)))
      :m (cmt (number 2) ,|(put dt :month (dec $)))
      :d (cmt (number 2) ,|(put dt :month-day (dec $)))
      :H (cmt (number 2) ,|(put dt :hours $))
      :I (cmt (number 2) ,|(put dt :hours $))
      :M (cmt (number 2) ,|(put dt :minutes $))
      :S (cmt (number 2) ,|(put dt :seconds $))
      :p (+ "AM" :PM)
      :PM (cmt "PM" ,|(put dt :hours (+ (dt :hours) 12)))
    })
  (peg/match p ts)
  (os/mktime dt))

(def tests [
  [ "%Y-%m-%d %H:%M:%S" "2023-05-28 11:19:26" 1685272766 ]
  [ "%Y%m%d%H%M%S" "20230528111926" 1685272766 ]
  [ "%d%m%y %H:%M:%S" "280523 11:44:25" 1685274265 ]
  [ "%d%m%y %I:%M:%S%p" "280523 11:48:55AM" 1685274535 ]
  [ "%Y-%m-%d %H:%M:%S" "2023-05-28 11:19:26" 1685272766 ]
  [ "%Y%m%d%H%M%S" "20230528111926" 1685272766 ]
  [ "%d%m%y %H:%M:%S" "280523 11:44:25" 1685274265 ]
])

(each [fmt ts t1] tests
  (def t2 (strptime fmt ts))
  (if-not (= t1 t2)
    (printf "%p %p %d" fmt ts (- t1 t2)))
  )
primo-ppcg commented 1 year ago

@zevv LGTM, although format string is typically the second argument, right?

zevv commented 1 year ago

Sure, order of arguments I didn't consider yet, it was just a fun proof of concept.

What I do think is that we might need to rethink all this; adding os/strptime when it is not a real wrapper around strptime() feels a bit silly, maybe it is better to make a proper design for handling time in Janet and step away from only wrapping libc functions. Time to start the time/ library? Stdlib, spork or separate lib?

Time handling can get pretty complex, so I guess it's best to steal the details from a language that solved this problem.

Formatting time, these functions take strftime like format strings but also know about :rfc2822, rfc3339, etc

Get time: system clock, monotonic clock, CPU usage time.

Converting epoch time to/from date-time struct time:

Lots of choices to make here I guess.

sogaiu commented 1 year ago

For reference, some other efforts in the date / time area:

zevv commented 1 year ago

Thanks @sogaiu; https://github.com/CosmicToast/janet-date/blob/main/TECH.md is a nice introduction to the joys of working with time.

sogaiu commented 1 year ago

the joys of working with time

Ha ha ha

On a side note, I'd heard about issues with leap seconds, but hadn't seen this list...

zevv commented 1 year ago

I think I might be willing to put some more time and effort in this; I have made all the obvious and some less obvious time-related mistakes at least three times in my career, so I should be aware of some of the pitfalls.

One of the above linked libraries mentions it "is not a library", one of the others states it "needs a lot of TLC", and both of those seem stale for a few years. @cosmictoast: What is your intention with regards to https://github.com/cosmictoast/janet-date, were you just about to whip up a complete, correct and full-featured date/time library by any chance? :)

zevv commented 1 year ago

I've whipped up some code to see how much can be done in pure Janet; it provides the basic functions for getting time, converting between time and date struct, formatting and parsing, all taking care of time zone handling where appropriate. This is mostly just thin wrappers around the os/ functions while setting the TZ variable, and adds a variation on the above time parser function.

WIP status, of course, I don't even know if I like where this is going.

https://github.com/zevv/janet-time

This mostly works for Linux, but as expected the time zone handling is tricky to do portable.

sogaiu commented 1 year ago

Do you know of any good date / time test data that is "externalized" / "data-ed" such that it could be reused in this sort of context?

primo-ppcg commented 1 year ago

afaik strptime() is POSIX only, so it's not trivial to bring this to windows.

On the other hand, if an os function isn't available on a particular OS, because that particular OS doesn't support it, that wouldn't necessarily be so bad, either.

CosmicToast commented 1 year ago

@CosmicToast: What is your intention with regards to https://github.com/cosmictoast/janet-date, were you just about to whip up a complete, correct and full-featured date/time library by any chance? :)

The intent was to have a date (no time facilities, but I've been reconsidering it) library that provides 100% of the ISO C99 spec, with the goal of being easy to include into spork later. I'm just a bit busy for now (enough to not really be able to work on medium and larger projects like that), since I'm in the process of a bunch of things in meatspace + dayjob. Once things calm down in either of those categories I'll most likely resume work on it. And hey, it gives Apple some time to at least acknowledge the bugs I found in their libc 😄

One avenue towards improving the situation I've considered is pulling in the IANA tz code as it's public domain (I try to have all of my projects be Unlicensed or comparable, outside of specific circumstances). FreeBSD used to have the same bugs as the current MacOS libc, and they fixed it by just replacing the entire thing with the above (I tested the fixes already), something I recommended Apple do but hey.

zevv commented 1 year ago

@CosmicToast sounds like you're totally on top of it, I guess I'll just quickly pull my hands off the topic before I burn myself!

sogaiu commented 1 year ago

In the spirit of storing away reference information in a place protected by a leopard, I'm going to leave this here.

CosmicToast commented 1 year ago

While it's not 100% done and there's plenty left to do (improving built-in formats, actually exposing them, etc) it can be tested now. In particular, I'm interested in whether it builds on common platforms I don't have too much exposure to (e.g windows, freebsd, etc). In theory it should build on anything that can do ISO C99 (though maybe not tcc this time).

sogaiu commented 1 year ago

I'll try on Windows.


On a side note, this PR area is quickly becoming my favorite secret Janet discussion hangout (^^;

sogaiu commented 1 year ago

Below is what I get for jpm build:

C:\Users\user\Desktop\janet-date>jpm build
compiling src/util.c to build/src___util.o...
compiling src/main.c to build/src___main.o...
compiling src/tm.c to build/src___tm.o...
generating meta file build/date/native.meta.janet...
compiling src/polyfill.c to build/src___polyfill.o...
compiling src/time.c to build/src___time.o...
compiling tz/localtime.c to build/tz___localtime.o...
compiling tz/asctime.c to build/tz___asctime.o...
compiling tz/difftime.c to build/tz___difftime.o...
compiling tz/strftime.c to build/tz___strftime.o...
compiling src/main.c to build/src___main.static.o...
compiling src/util.c to build/src___util.static.o...
tm.c
src/tm.c(75): error C2036: 'void *': unknown size
src/tm.c(123): error C2036: 'void *': unknown size
src/tm.c(150): error C2036: 'void *': unknown size
src/tm.c(233): error C2036: 'void *': unknown size
src/tm.c(322): error C2099: initializer is not a constant
src/tm.c(323): error C2099: initializer is not a constant
src/tm.c(324): error C2099: initializer is not a constant
src/tm.c(325): error C2099: initializer is not a constant
error: command failed with non-zero exit code 2
difftime.c
C:\Users\user\Desktop\janet-date\tz\private.h(816): warning C4005: 'max': macro redefinition
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\stdlib.h(1282): note: see previous definition of 'max'
C:\Users\user\Desktop\janet-date\tz\private.h(817): warning C4005: 'min': macro redefinition
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\stdlib.h(1283): note: see previous definition of 'min'
tz/difftime.c(20): error C2084: function 'double difftime(const time_t,const time_t)' already has a body
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\time.h(479): note: see previous definition of 'difftime'
error: command failed with non-zero exit code 2
localtime.c
C:\Users\user\Desktop\janet-date\tz\private.h(816): warning C4005: 'max': macro redefinition
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\stdlib.h(1282): note: see previous definition of 'max'
C:\Users\user\Desktop\janet-date\tz\private.h(817): warning C4005: 'min': macro redefinition
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\stdlib.h(1283): note: see previous definition of 'min'
tz/localtime.c(405): error C2146: syntax error: missing ';' before identifier 'nread'
tz/localtime.c(405): error C2065: 'nread': undeclared identifier
tz/localtime.c(456): error C2065: 'nread': undeclared identifier
tz/localtime.c(457): error C2065: 'nread': undeclared identifier
tz/localtime.c(458): error C2065: 'nread': undeclared identifier
tz/localtime.c(495): error C2065: 'nread': undeclared identifier
tz/localtime.c(623): error C2065: 'nread': undeclared identifier
tz/localtime.c(624): error C2065: 'nread': undeclared identifier
tz/localtime.c(630): error C2065: 'nread': undeclared identifier
tz/localtime.c(631): error C2065: 'nread': undeclared identifier
tz/localtime.c(635): error C2065: 'nread': undeclared identifier
tz/localtime.c(1450): warning C4273: 'tzset': inconsistent dll linkage
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\time.h(580): note: see previous definition of 'tzset'
tz/localtime.c(1647): error C2084: function 'tm *localtime(const time_t *const )' already has a body
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\time.h(495): note: see previous definition of 'localtime'
tz/localtime.c(1696): error C2084: function 'tm *gmtime(const time_t *const )' already has a body
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\time.h(488): note: see previous definition of 'gmtime'
tz/localtime.c(2301): error C2084: function 'time_t mktime(tm *const )' already has a body
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\time.h(511): note: see previous definition of 'mktime'
error: command failed with non-zero exit code 2
time.c
src/time.c(86): error C2099: initializer is not a constant
src/time.c(87): error C2099: initializer is not a constant
src/time.c(88): error C2099: initializer is not a constant
error: command failed with non-zero exit code 2
asctime.c
C:\Users\user\Desktop\janet-date\tz\private.h(816): warning C4005: 'max': macro redefinition
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\stdlib.h(1282): note: see previous definition of 'max'
C:\Users\user\Desktop\janet-date\tz\private.h(817): warning C4005: 'min': macro redefinition
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\stdlib.h(1283): note: see previous definition of 'min'
tz/asctime.c(114): warning C4273: 'asctime': inconsistent dll linkage
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\time.h(121): note: see previous definition of 'asctime'
tz/asctime.c(123): warning C4047: 'initializing': 'tm *' differs in levels of indirection from 'int'
tz/asctime.c(128): error C2084: function 'char *ctime(const time_t *const )' already has a body
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\time.h(471): note: see previous definition of 'ctime'
error: command failed with non-zero exit code 2
C:\Users\user\Desktop\janet-date\tz\private.h(816): warning C4005: 'max': macro redefinition
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\stdlib.h(1282): note: see previous definition of 'max'
C:\Users\user\Desktop\janet-date\tz\private.h(817): warning C4005: 'min': macro redefinition
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\stdlib.h(1283): note: see previous definition of 'min'
tz/strftime.c(129): warning C4273: 'strftime': inconsistent dll linkage
C:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt\time.h(266): note: see previous definition of 'strftime'

The following are some docs and lines for the things that MS' compiler seems to have difficulty with:

src/*.c

There were also some difficulties with things in tz, and there are a couple lines in its Makefile that mention Windows:

I didn't look at the "errors" and "warnings" much for the tz things as I thought may be the Makefile content might end up taking care of (or changing) them, but for the record:

tz/*.c

Errors:

Warnings:

sogaiu commented 1 year ago

Haven't made a whole lot of progress on the Windows side but I got curious whether building would succeed in one of my Linux environments.

For cf3a7824, I get the following for jpm build:

compiling src/time.c to build/src___time.o...
compiling src/main.c to build/src___main.o...
compiling src/polyfill.c to build/src___polyfill.o...
compiling src/tm.c to build/src___tm.o...
compiling src/util.c to build/src___util.o...
compiling tz/localtime.c to build/tz___localtime.o...
compiling tz/asctime.c to build/tz___asctime.o...
compiling tz/difftime.c to build/tz___difftime.o...
compiling tz/strftime.c to build/tz___strftime.o...
generating meta file build/date/native.meta.janet...
compiling src/polyfill.c to build/src___polyfill.static.o...
compiling src/util.c to build/src___util.static.o...
In file included from /usr/include/time.h:29,
                 from src/date.h:3,
                 from src/tm.c:1:
src/polyfill.h:35:21: error: ‘struct tm’ has no member named ‘tm_gmtoff’; did you mean ‘__tm_gmtoff’?
   35 | #  define TM_GMTOFF tm_gmtoff
      |                     ^~~~~~~~~
src/tm.c:39:40: note: in expansion of macro ‘TM_GMTOFF’
   39 |         {"gmtoff", offsetof(struct tm, TM_GMTOFF)},
      |                                        ^~~~~~~~~
src/polyfill.h:38:19: error: ‘struct tm’ has no member named ‘tm_zone’; did you mean ‘tm_mon’?
   38 | #  define TM_ZONE tm_zone
      |                   ^~~~~~~
src/tm.c:42:38: note: in expansion of macro ‘TM_ZONE’
   42 |         {"zone", offsetof(struct tm, TM_ZONE)},
      |                                      ^~~~~~~
src/tm.c: In function ‘jd_tm’:
src/tm.c:224:14: error: ‘struct tm’ has no member named ‘tm_gmtoff’; did you mean ‘__tm_gmtoff’?
  224 |         out->tm_gmtoff = 0;
      |              ^~~~~~~~~
      |              __tm_gmtoff
error: command failed with non-zero exit code 1
error: build fail
  in pdag [/home/user/.local/lib/janet/jpm/dagbuild.janet] (tailcall) on line 79, column 23
  in run-main [boot.janet] on line 3859, column 16
  in cli-main [boot.janet] on line 4007, column 17

This is in an Ubuntu 22.04.2 environment.

CosmicToast commented 1 year ago

no member named ...

This is a glibc issue, it wanted a _GNU_SOURCE define at the top. I've made it unconditional since you could have other GNU-ish systems like that, and non-GNU shouldn't care.

windows

I've made a few changes. MSVC appears to be very dumb. For instance, it doesn't understand void arithmetic, so I had to make it unsigned char. I've also removed IANA tz on windows only, and added timegm (windows provides it and all other platforms use IANA tz). In the meanwhile, it now builds (but fails to link, but I can't even link spork in my test environment, had to dig up some sacrificial laptop for this experiment). Try again? :) If it fails to link, make sure it can link spork, in the meanwhile.

sogaiu commented 1 year ago

Thanks for the explanations and changes.

It appears the hardware sacrifice was not in vain :)

Builds and tests in both of my Windows and Linux environments :+1:

CosmicToast commented 1 year ago

An initial draft form should now be available on the main branch. The only real functionality missing is getting localtime for a specific zone, but that's not possible with ISO C99 alone, and awkward against libtz (which is notably not used on windows). While timezone_t is technically a pointer, you still need to actually instantiate it, and depending on the ABI this much is outside of my comfort zone. If someone wants to write and test something like struct tm *localtime_rz(void *restrict, time_t const *restrict, struct tm *restrict); and make it work, it could be added, but I'm not sure it's all that useful given we have proper UTC handling in. If this feels like a good fit for spork, how would I go towards submitting it (once it's more tested) exactly?

sogaiu commented 1 year ago

how would I go towards submitting it (once it's more tested) exactly?

It seems like a PR to the spork repository would be one way.

Or perhaps I misunderstood the question?