Closed breezewish closed 5 years ago
em, a little tricky here, I prefer 16 bytes if the performance is not the big problem.
@siddontang 8 bytes is to fit into the register, because all operations over Duration will be enough to be held in the register, then it should be almost as fast as u64.
This seems relatively straightforward. @breeswish is this an issue you can mentor? Any sense of how we measure of the change is worthwhile?
If we're reducing it to 8 bytes, let's at least use a typedef ('struct Duration(u64)') instead of a newtype ('type Duration = u64').
I prefer keeping things clear, so I don't like the hack way of using u64 only for the time Duration, and if MySQL changes the range of time later, this will be a disaster for us.
I also doubt that doing this can help us gain a huge performance increment.
Btw, if we really want to change, I prefer
pub struct Duration {
dur: i64,
fsp: u8,
}
@siddontang Actually MySQL only uses 6 bytes (which you can be inspected from the schema). This thread just explains why Duration only costs 6 bytes and we can utilize this feature to make a 6 byte (use 8 bytes actually to be aligned) duration as well.
@brson Yes I can mentor this issue. I think there can be benchmarks to see how effective it is.
is it really a critical path that hurt the performance very much? if not, don't over optimize
@siddontang It's of course a critical path where Duration is used XD. But if the user is not using Duration then this is unrelated (as expected)
@brson this seems like a common optimization for TiDB. I am wondering if we should be developing a package that helps with this type of abstraction (or maybe there is one?).
@gregwebs We may extract all MySQL data structure implementations to a standalone Cargo project (in future). As you can see, there is already a project called "cop_datatypes" :)
@breeswish If we write the Duration
in C style, is the following code reasonable?
typedef unsigned int u32;
struct Duration {
// nonfractional part
u32 sign : 1;
u32 unused : 1; // reserved for future extensions
u32 hour : 10;
u32 minute : 6;
u32 second : 6;
// totally 24 bits
// fractional part
u32 fsp : 24;
};
@iosmanthus Maybe you don't need so many fsp. It's fine to leave some unused bits. Also you need to find some ways to write these bit fields, which is not a feature provided by Rust.
@breeswish I check the implementation of Duration
, I found that it uses fsp
to represent the count of digits to be reserved instead of the fractional part. And I checked the documentation which says the size of the fractional part depends on fsp
and can be up to 3 bytes. Maybe it's too complicated if we storage the fractional part as the following table just as it shows?
FSP | Storage |
---|---|
0 | 0 bytes |
1,2 | 1 byte |
3,4 | 2 bytes |
4,5 | 3 bytes |
or we can just set the storage space to 3 bytes.
BTW, I found this crate to simulate bitfield in Rust XD: https://github.com/dzamlo/rust-bitfield
@iosmanthus Oh, sorry I misunderstood you. Fsp means the fractional seconds precision. So we need to store these fields actually: sign (1bit), hour (10bit), minute (6bit), second (6bit), fractional seconds (24bit) and additionally (maybe it can be removed) fsp(8bit)
@breeswish in https://github.com/tikv/tikv/blob/master/src/coprocessor/codec/mysql/duration.rs, lots of operations convert nanosecond to Duration
and vice versa. Could we store fractional part as u32
(32 bits) which could be more accurate and store a nanosecond instead of a microsecond as in MySQL
?
the structure would look like:
typedef unsigned int u32;
// totally 8 bytes
struct Duration {
// nonfractional part
u32 sign : 1;
u32 unused : 1; // reserved for future extensions
u32 hour : 10;
u32 minute : 6;
u32 second : 6;
u8 fsp : 8;
// totally 32 bits
// fractional part
u32 frac : 32;
};
@iosmanthus It looks fine. Additionally I believe that fsp should be placed after frac, so that the duration can be directly memory comparable (that is to say, if they are order by memcmp()
then those durations will also be in numeric order).
@breeswish thx! impressive view!
Feature Request
Currently
codec::mysql::Duration
is defined as follows:which costs 24 bytes. It would be nice to define it as:
which only needs 16 bytes.
Smaller structures usually mean less memory copying when their values are used.
Or even more:
which only needs 8 bytes and is small enough to be put directly in a register.
Why only 8 bytes is enough? Because MySQL's Time's ranges from
-838:59:59.999999
to838:59:59.999999
, which isMath.log2(2*839*60*60*1000000)=42.5
bits (less than 6 bytes). We can then encode a 1 bytefsp
in later 8 bits.