moment / luxon

⏱ A library for working with dates and times in JS
https://moment.github.io/luxon
MIT License
15.05k stars 728 forks source link

Perf: Use hackyOffset if it parses date strings in the expected format #1580

Open schleyfox opened 5 months ago

schleyfox commented 5 months ago

This is part of a series of PRs based on performance work we have done to improve a use-case involving parsing/formatting hundreds of thousands of dates where luxon was the bottleneck.

This includes the commit from https://github.com/moment/luxon/pull/1574 to establish the benchmark

This is the sketchy optimization of the bunch and I would understand not wanting to support this, but it is much, much faster. In our observations, about 1.5% of web traffic outputs in a format that doesn't parse correctly, so we fall back to formatToParts.

The time zone offset of a date can be computed on platforms that support it (anything that's not super ancient) by using Intl.DateTimeFormat.formatToParts with en-US to output an array of the date components. For legacy reasons, you can also generate a date string using Intl.DateTimeFormat.format which can be parsed into an array using regexes. The string/regex approach (hackyOffset) is way faster (2-4x), but much more susceptible to weird client configurations.

This detects whether hackyOffset is able to parse a known date correctly, and uses it if it does.

Benchmark Comparison (name | before | after | after/before):

DateTime.local with numbers and zone | 50,913 ±0.18% | 106,177 ±0.18% | 2.09x
DateTime.fromFormat with zone | 26,687 ±0.18% | 35,722 ±0.19% | 1.34x
DateTime#setZone | 175,791 ±0.29% | 302,007 ±0.34% | 1.72x
linux-foundation-easycla[bot] commented 5 months ago

CLA Signed

The committers listed above are authorized under a signed CLA.

schleyfox commented 5 months ago

/easycla

icambron commented 4 months ago

This is a clever optimization and I'm not against it. But one question I have is how many calls it takes to amortize the cost of the upfront check. Do you have a sense of that? For all I know, it just takes one.

schleyfox commented 4 months ago

that's a very good point

import * as b from "benny"

const ts = 1710020705493
const jsDate = new Date(ts)

const utcTestDate = new Date(Date.UTC(1969, 11, 31, 15, 45, 55))

function makeDtf(zone: string) {
    return new Intl.DateTimeFormat("en-US", {
        hour12: false,
        timeZone: zone,
        year: "numeric",
        month: "2-digit",
        day: "2-digit",
        hour: "2-digit",
        minute: "2-digit",
        second: "2-digit",
        era: "short",
    })
}

// from luxon
function hackyOffset(dtf: any, date: any) {
    const formatted = dtf.format(date).replace(/\u200E/g, ""),
        parsed = /(\d+)\/(\d+)\/(\d+) (AD|BC),? (\d+):(\d+):(\d+)/.exec(formatted),
        [, fMonth, fDay, fYear, fadOrBc, fHour, fMinute, fSecond] = parsed ?? []
    return [fYear, fMonth, fDay, fadOrBc, fHour, fMinute, fSecond]
}

const typeToPos = {
    year: 0,
    month: 1,
    day: 2,
    era: 3,
    hour: 4,
    minute: 5,
    second: 6,
}

function partsOffset(dtf: any, date: any) {
    const formatted = dtf.formatToParts(date)
    const filled = []
    for (let i = 0; i < formatted.length; i++) {
        const { type, value } = formatted[i]
        // @ts-expect-error[implicit-any-index-incorrect] hidden by suppressImplicitAnyIndexErrors
        const pos = typeToPos[type]

        if (type === "era") {
            filled[pos] = value
        } else if (pos !== undefined) {
            filled[pos] = parseInt(value, 10)
        }
    }
    return filled
}

const dtf = makeDtf("America/New_York")

void b.suite(
    "Intl.DateTimeFormat",
    b.add("hackyOffset", () => {
        hackyOffset(dtf, jsDate)
    }),
    b.add("partsOffset", () => {
        partsOffset(dtf, jsDate)
    }),
    b.add("makeDtf(UTC) no-cache", () => {
        makeDtf("UTC")
    }),
    b.add("uncached partsOffset", () => {
        const dtf = makeDtf("America/New_York")
        partsOffset(dtf, jsDate)
    }),
    b.add("uncached hackyOffset", () => {
        const utcDtf = makeDtf("UTC")
        hackyOffset(utcDtf, utcTestDate)
        const dtf = makeDtf("America/New_York")
        hackyOffset(dtf, jsDate)
    }),
    b.cycle(),
    b.complete()
)
$ notion ts-node src/test/benchmark/shared/helpers/datetime/datetimeformatOverhead.benchmark.ts
Running "Intl.DateTimeFormat" suite...
Progress: 100%

  hackyOffset:
    589 933 ops/s, ±0.28%   | fastest

  partsOffset:
    234 358 ops/s, ±0.24%   | 60.27% slower

  makeDtf(UTC) no-cache:
    27 116 ops/s, ±6.38%    | 95.4% slower

  uncached partsOffset:
    22 742 ops/s, ±4.06%    | 96.14% slower

  uncached hackyOffset:
    11 640 ops/s, ±11.52%    | slowest, 98.03% slower

Finished 5 cases!
  Fastest: hackyOffset
  Slowest: uncached hackyOffset

DTF construction is very slow, so doing it twice is quite bad. I don't know why the error bars are so high for it, however.

I think this math works, but let me know if it makes sense to you.

Analytically, we can see that constructing a DTF takes 1MM/27,116 = 36.9us . hackyOffset takes 1MM/589,933 = 1.7us, while partsOffset takes 1MM/234,358= 4.3us

we then expect a single uncached partsOffset to take 36.9us + 4.3us = 41.2us (observed is 44us) likewise, we'd expect a single uncached hackyOffset to take 36.9us + 1.7us + 36.9us + 1.7us = 77.2us (observed is 86us)

77.2 - 41.2 = 36us to make up. We would save 4.3 - 1.7 = 2.6us per call, so 36/2.6 = 13.8 calls before our optimization pays off.

Note that we may have to call *Offset up to 2 or 3 (but mostly 2 except for time holes) times per DateTime, so this could pay off in as little as 7 DateTimes (or 4.6 if you are specifically testing edge cases). At a few hundred DateTimes, you start saving milliseconds.

icambron commented 3 months ago

How about just adding Settings.setUseHackyOffset()? Seems like that would allow specific applications that do a ton of formats to speed up while sacrificing a few locales, and others to not worry about it.