haskell-github-trust / thyme

A faster date and time library based on time
BSD 3-Clause "New" or "Revised" License
46 stars 33 forks source link

Data.Thyme.Calendar.Internal.isLeapYear: simplify and make faster #18

Closed klao closed 10 years ago

klao commented 10 years ago

This also makes it compile with GHC 7.8 and thus fixes #15.

I looked at this because thyme wasn't building with GHC 7.8-rc2, and I realized that it's possible to make this function faster while fixing this. Doing bit operations is much faster than dividing by 4.

I did a quick benchmark and the original code runs in 20-40ns (20 on eg. 1999 - the easiest case, 40 on 1900 - the hardest case). The improved code runs in 10-20ns (halved for both cases).

I was playing with these things recently while working on tz. For example I have more than 2 times faster toOrdinalDate implementation. I'll send you a few pull requests with these, if you don't mind.

liyang commented 10 years ago

I picked the commit and fixed some warnings / reformatted it a little, as 0c6aaab.

The old NOINLINE isLeapYear# was added because I found that GHC was duplicating the core 8 times for each of the tests, rather than just twice for leap and non-leap years. I'll have a look at that another time (or wait long enough so I don't have to care about 7.6.3 any more.)

Pull requests for faster code are always welcome!

klao commented 10 years ago

Thanks for cleaning it up!

I took a quick look, and the issue with the original code was that it was implemented in terms of mod. And mod branches on whether its argument is positive or not, and unfortunately GHC optimizer (at least in 7.6) is not good at noticing and collapsing these branches. But of course, if you just want to know if a number is perfectly divisible then rem is as good as mod, and it doesn't branches. So, I always try to formulate everything in terms of quot and rem if possible.

If you take a look at the core of the current implementation, it's quite small and simple and doesn't have unnecessary duplications. So, I think it's not as big of an issue as with the original code.