hashsplit / hashsplit-spec

Define a convention for referring to specific configurations. #22

Open zenhack opened 4 years ago

zenhack commented 4 years ago

Since there are several configuration parameters, systems need to specify which ones they use. It would be somewhat cumbersome to have to write out e.g. "we use hashsplit with the RRS1 hash function, S_min = ..., S_max = ..., and threshold = ..." everywhere, and it seems likely that some documentation would carelessly omit parameters (I actually had to go look at the spec to remind myself what all of them are).

I'd like to define a convention for naming configurations to make this a bit less error-prone; this is inspired by the Noise protocol's conventions:

http://noiseprotocol.org/noise.html#protocol-names-and-modifiers

Proposal:

HashSplit_<T>_<H>_<S_min>_<S_max>

e.g. HashSplit_13_RRS1_64K_2M for the configuration:

S_min = 64 kibibytes (64 * 2 ^ 10)
S_max = 2 mebibytes (2 * 2 ^ 20)
H = RRS1
T = 13

We'd define the K, M, and G suffixes to be the corresponding powers of two. We would also allow omitting a suffix for small values of S_min/S_max.
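
As a rough illustration of the proposed scheme, here is a minimal Python sketch that parses a name of the form HashSplit_<T>_<H>_<S_min>_<S_max> into its parameters. The names `HashSplitConfig`, `parse_size`, and `parse_config_name` are made up for this example and are not part of the spec. It assumes the hash name contains no underscore and that K, M, and G mean 2^10, 2^20, and 2^30.

```python
from dataclasses import dataclass

# Suffixes as powers of two, per the proposal (K = 2^10, M = 2^20, G = 2^30).
_SUFFIXES = {"K": 2**10, "M": 2**20, "G": 2**30}


@dataclass(frozen=True)
class HashSplitConfig:
    """Hypothetical container for one named configuration."""
    threshold: int   # T
    hash_name: str   # H, e.g. "RRS1"
    s_min: int       # S_min in bytes
    s_max: int       # S_max in bytes


def parse_size(text: str) -> int:
    """Parse a size such as "64K", "2M", or "512" (no suffix) into bytes."""
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def parse_config_name(name: str) -> HashSplitConfig:
    """Parse a name like "HashSplit_13_RRS1_64K_2M"."""
    parts = name.split("_")
    if len(parts) != 5 or parts[0] != "HashSplit":
        raise ValueError(f"not a HashSplit configuration name: {name!r}")
    _, t, h, s_min, s_max = parts
    return HashSplitConfig(int(t), h, parse_size(s_min), parse_size(s_max))


cfg = parse_config_name("HashSplit_13_RRS1_64K_2M")
print(cfg)
```

Splitting on underscores keeps the parser trivial, which is one argument for keeping the hash identifier a single opaque token.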

I vaguely worry about people mixing up powers of two and powers of ten.

It may also make sense to pick some small number of "recommended" configurations (perhaps even just one) and give them privileged names, to encourage converging on common configurations across systems. I expect there is some utility in being able to vary the parameters, but probably implementations should not do so without specific reasons.

Thoughts?

cole-miller commented 4 years ago

> I vaguely worry about people mixing up powers of two and powers of ten.

We could use Ki, Mi instead of K, M, which matches the standard binary-prefix abbreviations (the IEC prefixes kibi and mebi) for these units.
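
To make the Ki/Mi idea concrete, here is a small Python sketch of how sizes might be rendered and parsed with binary-prefix suffixes, so that the example above would read HashSplit_13_RRS1_64Ki_2Mi. The helper names `format_size` and `parse_size` are hypothetical, and picking the largest suffix that divides the value exactly is just one possible rule, not something the spec prescribes.

```python
# Binary prefixes, largest first: Gi = 2^30, Mi = 2^20, Ki = 2^10.
_IEC = [("Gi", 2**30), ("Mi", 2**20), ("Ki", 2**10)]


def format_size(n: int) -> str:
    """Render a byte count with the largest suffix that divides it exactly."""
    for suffix, factor in _IEC:
        if n >= factor and n % factor == 0:
            return f"{n // factor}{suffix}"
    return str(n)  # small or non-aligned values: no suffix


def parse_size(text: str) -> int:
    """Inverse of format_size: "64Ki" -> 65536, "1000" -> 1000."""
    for suffix, factor in _IEC:
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


print(format_size(64 * 2**10))
print(format_size(2 * 2**20))
print(format_size(1000))
```

Because the two-letter suffixes cannot be mistaken for the decimal k/M/G, a reader who sees 64Ki has no reason to guess 64000.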

zenhack commented 4 years ago

Sticking to the standard binary prefixes is probably a good idea.

cole-miller commented 4 years ago

For hash functions that come in families, like RRS, would we want to define a way of specifying arbitrary parameter values, e.g. RRS<modulus=65536, offset=31>? I guess leaving this ability out goes some way toward discouraging unconventional configurations (and ones that just don't work very well).
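
For the sake of discussion, this is roughly what parsing such a parameterized name could look like in Python. The `RRS<modulus=65536, offset=31>` syntax is only the one floated above, `parse_hash_spec` is a hypothetical helper, and the sketch assumes all parameter values are integers.

```python
import re

# family name, then a comma-separated key=value list in angle brackets
_PARAM_HASH = re.compile(r"^(?P<family>[A-Za-z0-9]+)<(?P<params>[^<>]*)>$")


def parse_hash_spec(spec: str) -> tuple[str, dict[str, int]]:
    """Split "RRS<modulus=65536, offset=31>" into ("RRS", {...}).

    A plain name such as "RRS1" is returned unchanged with no parameters.
    """
    m = _PARAM_HASH.match(spec)
    if m is None:
        return spec, {}
    params = {}
    for item in m.group("params").split(","):
        key, _, value = item.partition("=")
        params[key.strip()] = int(value)  # raises ValueError on malformed input
    return m.group("family"), params


print(parse_hash_spec("RRS<modulus=65536, offset=31>"))
print(parse_hash_spec("RRS1"))
```

Even this small sketch needs a second grammar inside the name, which is part of the cost of allowing arbitrary parameters.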

zenhack commented 4 years ago

Quoting Cole Miller (2020-10-10 18:45:23)

> I guess leaving this ability out goes some way toward discouraging unconventional configurations (and ones that just don't work very well).

Yeah, that was more or less my reasoning. I don't really see a good use case for other RRS-family hashes; really the only upside of RRS1 is that it's exactly what Perkeep uses. I'm on the fence as to whether we should even make the hash function configurable at all; maybe we should just pick one (good) hash function and mandate that.

cole-miller commented 4 years ago

> I'm on the fence as to whether we should even make the hash function configurable at all

I think there's some value in describing/specifying alternate hash functions if they've seen independent use: the spec can have a descriptive function in addition to the prescriptive one, and that's useful in its own right (for example to people who want to interoperate with existing code). But I agree with something like "for new applications we strongly recommend this specific hash function".

(My conflict of interest here is that I already wrote a bunch of code to elegantly abstract over the choice of hash function, and I'd hate to have to get rid of it all :P)