Open sffc opened 3 months ago
For list, the explanation is that the data struct contains 10 `Cow`s (4 patterns of 1 `Cow`, and two conditions of 3 `Cow`s each), but usually only encodes tiny texts (`, `, `, and `, etc.). Unit and And data doesn't show up because the Spanish/Hebrew regexes equalise things between baked and postcard. So it's a similar problem to decimal formatter.
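To make the overhead concrete, here is a minimal sketch, assuming a hypothetical `TenCows` struct in place of the real list data struct. The serialized estimate is a rough approximation (one length byte plus the text bytes per field), not an exact Postcard measurement.

```rust
use std::borrow::Cow;
use std::mem::size_of;

/// Hypothetical stand-in for the list data struct: ten `Cow<str>` fields.
struct TenCows {
    texts: [Cow<'static, str>; 10],
}

/// In-memory estimate: the inline size of the struct plus the bytes it borrows.
fn in_memory_estimate(data: &TenCows) -> usize {
    size_of::<TenCows>() + data.texts.iter().map(|t| t.len()).sum::<usize>()
}

/// Rough serialized estimate: one length byte plus the text bytes per field.
/// This assumption only holds for texts shorter than 128 bytes.
fn serialized_estimate(data: &TenCows) -> usize {
    data.texts.iter().map(|t| 1 + t.len()).sum()
}

/// Sample data with tiny texts, similar in spirit to list patterns.
fn sample() -> TenCows {
    TenCows {
        texts: [
            Cow::Borrowed(", "),
            Cow::Borrowed(", "),
            Cow::Borrowed(", and "),
            Cow::Borrowed(" and "),
            Cow::Borrowed(", "),
            Cow::Borrowed(", "),
            Cow::Borrowed(", and "),
            Cow::Borrowed(" and "),
            Cow::Borrowed(", "),
            Cow::Borrowed(" and "),
        ],
    }
}
```

With texts this short, the fixed inline cost of each `Cow` (at least a pointer and a length) dominates, so the in-memory estimate comes out well above the serialized one.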
"baked size" here is the size of the .rs file, yes?
no, an estimate for in-memory size, ignoring `&'static` deduplication
"in-memory size" means actual PSS cost? Can we also calculate on-disk binary size impact?
If you tell me what PSS is I might be able to answer this
list/and@1 also showed up, but I didn't take the time to copy it into the table; I tried to include a representative cross-section in the OP.
"baked size" refers to the in-memory size based on the `bake_size`, which is `core::mem::size_of` plus `borrows_size`.
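A minimal sketch of that definition, assuming a hypothetical `SizeEstimate` trait written for illustration (it is not the actual databake API). It also shows what ignoring `&'static` deduplication means: two fields that point at the same static text are each counted in full.

```rust
use core::mem::size_of;

/// Hypothetical trait mirroring the description above; not the real API.
trait SizeEstimate: Sized {
    /// Bytes reachable through borrows, excluding the inline size.
    fn borrows_size(&self) -> usize;

    /// Inline size plus borrowed bytes.
    fn bake_size(&self) -> usize {
        size_of::<Self>() + self.borrows_size()
    }
}

impl SizeEstimate for &'static str {
    fn borrows_size(&self) -> usize {
        self.len()
    }
}

/// Hypothetical struct with two borrowed fields.
struct Pair {
    first: &'static str,
    second: &'static str,
}

impl SizeEstimate for Pair {
    fn borrows_size(&self) -> usize {
        // Each field is counted independently, even if both point at the
        // same static data, so shared text is counted twice.
        self.first.borrows_size() + self.second.borrows_size()
    }
}

/// Builds a `Pair` whose fields share one static string.
fn shared_pair() -> Pair {
    const TEXT: &str = ", and ";
    Pair { first: TEXT, second: TEXT }
}
```

Because shared text is counted once per field, an estimate like this is an upper bound on what the compiled binary has to store for that text.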
These numbers are roughly reflective of what happens when I compile ICU4X with the `compiled_data` feature versus when I build Postcard data with `icu4x-datagen`. `compiled_data` produces a larger binary than no-default-features with postcard.
I computed `fingerprints.csv` based on both `baked_size` and `postcard_size`. Baked is equal in size or bigger than postcard for every data marker. A selection of the biggest offenders by overall size or percentage:
The good news is that many of these keys will be improved under #5230 or #5379.
Should we do anything?
@robertbastian @Manishearth