r-lib / vctrs

Generic programming with typed R vectors
https://vctrs.r-lib.org

Process algebra feedback #130

Closed hadley closed 2 years ago

hadley commented 5 years ago

Probably by drawing some diagrams that help me understand this stuff.


(1) The real numbers (or doubles) form a field K (K for German “Körper”).

This encapsulates that we can add, subtract, multiply, and divide real numbers with/from each other.

(2) The durations form a vector space L over K (L for “line”).

This encodes that we can add and subtract durations to/from each other, and that we can multiply durations by doubles.

(2a) The dimension of L over K is 1.

This implies that we can divide a duration by a non-zero duration to obtain a double.

(3) The dates form an affine space X for L.

This encapsulates that we can add durations to dates to get dates, and subtract two dates from each other to obtain a duration.
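As a rough illustration of (1) to (3), Python's standard `datetime` module happens to follow the same rules, with `float` standing in for K, `timedelta` for L, and `date` for X. This is only a sketch in another language under that assumed correspondence; it says nothing about how vctrs, lubridate, or clock implement these types.

```python
from datetime import date, timedelta

week = timedelta(days=7)
day = timedelta(days=1)

# L is a vector space over K: durations add, and scale by doubles.
assert week + day == timedelta(days=8)
assert day * 2.5 == timedelta(days=2, hours=12)

# dim L = 1: a duration divided by a non-zero duration is a double.
ratio = week / day
assert isinstance(ratio, float)

# X is an affine space for L: date + duration is a date,
# and date - date is a duration.
start = date(2019, 1, 1)
assert start + week == date(2019, 1, 8)
assert date(2019, 1, 8) - start == week

# Adding two dates is not part of the structure, and Python rejects it.
try:
    start + start
    date_plus_date_defined = True
except TypeError:
    date_plus_date_defined = False
```

The missing `date + date` is the point of calling X an affine space rather than a vector space: dates have differences but no sum.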

If you want to talk about positive vs negative durations, you need to enhance the above:

(1) K is an ordered field, which means that non-zero doubles are either positive or negative.

(2) L is an oriented 1-dimensional vector space over K, which means that non-zero durations are either positive or negative (in a way compatible with the positivity/negativity of doubles).

(3) X does not need any additional structure: a date A is before a date B if the duration B-A is positive.
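The ordering in (3) can be checked with the same kind of stand-ins, assuming Python's `date` for X and `timedelta` for L. This is a sketch of the idea only, not of any R implementation.

```python
from datetime import date, timedelta

zero = timedelta(0)
a = date(2019, 1, 1)
b = date(2019, 3, 1)

# A date A is before a date B exactly when the duration B - A is positive.
a_before_b = a < b
difference_positive = (b - a) > zero

# Orientation is compatible with the sign of doubles: scaling a positive
# duration by a negative double gives a negative duration.
flipped = (b - a) * -1.0
flipped_is_negative = flipped < zero
```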

There are other properties of K that one might want to talk about (completeness, the Archimedean property), but I don’t think they really have a bearing on L or X.


A set with an associative binary operation, call it *, is called a monoid. So if x, y, and z are in the set then x*y is in the set and (x*y)*z = x*(y*z), so we can just write x*y*z. If it is also the case that x*y = y*x then we call that a commutative monoid. We usually reserve the symbol ‘+’ to denote a commutative operation, so it would be mathematically weird notation to have a monoid where x+y did not equal y+x. In that sense, I think in the below example, + isn’t great notation (at least mathematically) for paste0, which does seem to give a (non-commutative) monoid structure on the set {strings}. Sometimes a monoid might have a “unit”, i.e. an element in the set, call it ‘1’, such that 1*x = x*1 = x for all x. In the example with pasting strings, this would be the empty string. The standard notation for repeating a multiplicative operation would be exponentiation. So I would want to write string concatenation multiplicatively so that

“banana” = x*y*y = x*y^2

where x=“ba” and y=“na”. I would prefer that to

“banana” = x + y + y = x + 2*y

because I don’t like the fact that equations like 2*(x+y) = 2*x+2*y and x+y = y+x are false.
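Python happens to use exactly the additive notation being criticized here (`+` for concatenation, `*` for repetition), so it makes a convenient, if non-R, illustration of which identities fail. The variable names below are made up for the sketch.

```python
x, y = "ba", "na"

# Concatenation written additively: x + 2*y.
banana = x + 2 * y

# The identities one expects of "+" and scalar "*" do not hold for strings.
commutes = (x + y) == (y + x)               # compares "bana" with "naba"
distributes = 2 * (x + y) == 2 * x + 2 * y  # compares "banabana" with "babanana"

# The empty string is the unit of the monoid.
unit_law = ("" + x == x) and (x + "" == x)
```

Written multiplicatively, the failing identities would read x*y = y*x and (x*y)^2 = x^2*y^2, which nobody expects to hold in a non-commutative setting.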

In a monoid, the inverse operation to multiplication is not in general defined, but does sometimes make sense. Hadley would seem to want to apply the inverse operation when it makes sense and just have it do nothing when it doesn’t make sense. What is typically done in mathematical notation is if z = x*y in a monoid then we will write x = z*y^{-1} even though y^{-1} doesn’t make sense on its own. If there does not exist an x such that z = x*y, then we would say z*y^{-1} is “not defined”. So to apply this to the examples below I would have

Suppose x=“Hi”, y=“i”, z=“ello”, w=“Hello”

I would say x*y^{-1}*z = w but for example z*y^{-1} is not defined (returns an error in programming?). And notice that x*z*y^{-1} is also not defined. Note also that we use x*y^{-1} instead of x/y since it avoids ambiguity coming from the lack of commutativity. So if u=“onion” and t=“on” we want to distinguish between t^{-1}*u = “ion” and u*t^{-1} = “oni”.
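One way to read “not defined” in code is to raise an error, as the parenthetical suggests. The sketch below uses two hypothetical helpers, `right_divide` and `left_divide`, invented here for illustration; neither is an existing R or Python function, and string concatenation is spelled `+` because that is what Python provides.

```python
def right_divide(z: str, y: str) -> str:
    """Hypothetical z*y^{-1}: the x with x + y == z, or an error if none exists."""
    if not z.endswith(y):
        raise ValueError(f"{z!r} * {y!r}^-1 is not defined")
    return z[: len(z) - len(y)]


def left_divide(y: str, z: str) -> str:
    """Hypothetical y^{-1}*z: the x with y + x == z, or an error if none exists."""
    if not z.startswith(y):
        raise ValueError(f"{y!r}^-1 * {z!r} is not defined")
    return z[len(y):]


x, y, z = "Hi", "i", "ello"
w = right_divide(x, y) + z        # x*y^{-1}*z

u, t = "onion", "on"
left_result = left_divide(t, u)   # t^{-1}*u
right_result = right_divide(u, t) # u*t^{-1}
```

Calling `right_divide(z, y)` or `right_divide(x + z, y)` raises `ValueError`, matching the two undefined expressions in the text.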

Moving on to what Hadley is calling * and /. When we replace + with *, then we should replace *2 and /2 by ^2 and sqrt, where the latter may not always be defined. So if x = “Long ” then x^2 = “Long Long ” and (x^2)^{1/2} = “Long ”. But I would say that “Long string” does not have a square root. I think that if you have an operation which takes a string x of length 2n and returns the first n characters, it should not be notated by square root (or by /2 if one insists on using “+” for concatenation).
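A minimal sketch of a square root in this sense, using a hypothetical `string_sqrt` helper (not an existing function anywhere): it returns a result only when the string really is some x repeated twice, and raises otherwise instead of silently truncating.

```python
def string_sqrt(s: str) -> str:
    """Hypothetical square root: the x with x + x == s, or an error if none exists."""
    half, remainder = divmod(len(s), 2)
    candidate = s[:half]
    # Odd length, or two halves that differ, means no square root exists.
    if remainder or candidate * 2 != s:
        raise ValueError(f"{s!r} has no square root")
    return candidate


x = "Long "
root = string_sqrt(x * 2)  # recovers x from x^2
```

By contrast, `string_sqrt("Long string")` raises, whereas a “first half of the characters” operation would return something for any even-length input, which is why it deserves a different name.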

lionel- commented 2 years ago

Are these notes about dates and durations still relevant for vctrs? Or should they be moved to lubridate / clock?

Regarding + vs *, I think Jim settled that by choosing + for glue concatenation. I would have liked *, but his reasoning was that it is more important to be consistent across languages than mathematically pure.

hadley commented 2 years ago

I think we can close it now.