Closed sffc closed 6 months ago
There's a PR open for Chinese (https://github.com/unicode-org/icu4x/pulls), but islamic/hebrew also need similar work done.
Name of the constructor that does not load data:
Chinese::new_without_data()
Chinese::new_calculating()
Chinese::new_without_calculated_data()
Chinese::new_runtime_calculating()
Chinese::new_always_calculating()
Discussion:
Conclusion: new_always_calculating()
for the no-data constructor. new()
is the constructor with compiled data.
Next steps:
new_always_calculating()
constructorCopy
It would be nice to have Chinese ready with the pre-calculated caches in 1.3 but does not need to block.
LGTM: @Manishearth @sffc @robertbastian
The plan is:
&PrecomputedData
type parameter on all of its methods. Most calendars use ()
https://github.com/unicode-org/icu4x/pull/4378YearInfo
field which also defaults to ()
. Date mutations must update it.One thing I'm not sure about is if YearInfo should ever differ from what is stored in precomputed data. Currently the YearInfo equivalent in tree for chinese is:
pub(crate) struct ChineseBasedCache {
pub(crate) new_year: RataDie,
pub(crate) next_new_year: RataDie,
pub(crate) leap_month: Option<NonZeroU8>,
}
whereas the precomputed cache is:
pub(crate) struct ChineseBasedCompiledData {
pub(crate) new_year: RataDie,
/// last_day_of_month[12] = last_day_of_month[11] in non-leap years
/// These days are 1-indexed: so the last day of month for a 30-day 一月 is 30
/// The array itself is zero-indexed, be careful passing it self.0.month!
last_day_of_month: [u16; 13],
///
pub(crate) leap_month: Option<NonZeroU8>,
}
(This packs down to three bytes in PackedChineseBasedCompiledData
)
If we are going to store such data live, we might as well store PackedChineseBasedCompiledData
everywhere, which simplifies the architecture down to "load PackedChineseBasedCompiledData
into YearInfo
, if you can't load from precomputed data then compute on your own". In the long run this may mean we don't need to pass in PrecomputedData
everywhere, though in the short run I'd like to grant us that freedom.
However, if we make that call now, we can avoid having to pass in the compiled data for everything other than the until()
and offset_days()
which is nice.
Discussed a bit with @sffc:
In general we think that YearInfo and PrecomputedData will not need to differ on the data stored per-year (they may differ on the data storage format).
This means that ArithmeticDate only needs to worry about PrecomputedData for the offset/until functions, which is a far lower footprint.
The design becomes:
YearInfo
, PrecomputedData
, load_yearinfo(data: &PrecomputedData)
data: &PrecomputedData
. All other ArithmeticDate methods do notload_yearinfo()
.The main downside here is that when working without loaded data, it may be slower to calculate the YearInfo for an entire year since we will have to calculate things for multiple months at a time. This is not a cost we ever have to pay right now. On the other hand, it will probably amortize nicely.
Another downside is that this will increase the size of Date whenever our largest calendar decides to cache more things. This is a price we think is ok to pay since Date is mostly ephemeral.
ugh, found some gnarly stuff that will not be easy to optimize
This needs the number of days in the previous year. We might just also cache that but it's suboptimal, it's only needed for the rather rare case of week-of-year formatting, but it needs to be computed mandatorily when formatting
@sffc We need to make a decision based on the benchmarking results:
Consider adding a new() and deprecating new_calculating() (and the try_new...with_calendar(), if we are definitely not precomputing (otherwise add precomputing)
For background, currently Hebrew
must be constructed with new_always_calculating()
Proposal: We commit to never precomputing, add ::new()
, make Hebrew
constructable as a singleton struct, and deprecate the old APIs. For Hebrew only.
Approval:
Does keviyah work over all dates or are there cases where we would need to fall back to the book calculations? If keviyah is a true drop-in replacement, then yeah, we can just use it all the time and make Hebrew independent of data.
All dates (in range, which is a range of ~i32 years I think: check the code)
Prior versions of the Hebrew calendar used slightly different four gates tables, Adjler documents them all, but the book does not handle that either, it implements the modern set of calculations.
The only question here is if it is fast enough for us to never want to add data loading. The benchmarks convince me but they are definitely at a level where I would not be surprised if others were not convinced.
I've started working on the islamic ones. My current data model is to store month length booleans and then use the remaining 4 bits to store a new years offset from the mean synodic new year (year * 12 * MEAN_SYNODIC_MONTH
). It's unlikely that that number will be super large.
@Manishearth - This is 1.3 blocking because the API of a calendar that has no data versus one that does is different. We could theoretically remove constructors from these types for 1.3 and only do things through AnyCalendar. But then these APIs are functionally useless outside of AnyCalendar.
Parts:
new()
and deprecatingnew_calculating()
(and thetry_new...with_calendar()
, if we are definitely not precomputing (otherwise add precomputing) https://github.com/unicode-org/icu4x/pull/4532