summation functions should be lazy, more general than `openArray`

timotheecour commented 4 years ago

typical summation functions (in std/sums) only use each input element once, so should be usable in an online mode. However the API's in std/sums take an openArray input, preventing that.

Instead, a better API would maintain a state, and allow to update it when a new element is consumed.

example

instead of this API (https://github.com/nim-lang/Nim/pull/16004)

func sumShewchuck*[T: SomeFloat](x: openArray[T]): T =
  ...

use this:

type ShechukState[T: SomeFloat] = object
  ... # sufficient statistics

proc initShechukState*[T](): ShechukState[T] = ...

proc add[T: SomeFloat](result: var ShechukState[T], a: T) =
   ... 

proc peekFinal*[T: SomeFloat](a: ShechukState): T =
  ... # non-destructive, hence `let` input
      # computes (in the case of sums) the total from the sufficient statistics

This would be analog to other procs that can be used online, eg:

hash.!& # analog to add
hash.!$ # analog to peekFinal

md5Update # analog to add
md5Final # analog to peekFinal

note, a high level wrapper can still be defined for convenience, eg:

template sumShewchuck*[T: SomeFloat](iter: untyped): untyped =
  var state = initShechukState[T]()
  for a in iter: add(state, a)
  peekFinal(state)

ditto with other summation procs

note

even better, can also be generic wrt the summation algorithm:

template sum*[Algo: Shewchuck | SumTwoSum], T: SomeFloat](algo: typedesc[Algo], iter: untyped): untyped =
  ... 

doAssert Shewchuck.sum(@[1.1, 1.2]) == 1.1+1.2

note2

more generally, outside of std/sums, lots of algorithms should be usable online, and openArray isn't the best API for those. For these, an Iterable[T] (for now, untyped) is the better API.

planetis-m commented 4 years ago

Maybe in a seperate module?

Araq commented 4 years ago

In general I agree but we need to watch out -- lazy APIs can become super annoying to use. ("Yes this really only requires an IEnumerable<string> (or is that an IEnumerable<IEnumerable<char>> or an IEnumerable<IEnumerable<T>>?), but I'm debugging, please tell me its length at least or all of its contents...")

timotheecour commented 4 years ago

lazy APIs can become super annoying to use

I disagree, from a long experience with D. D uses duck typing, not IEnumerable<IEnumerable<T>>, and nim can further improve on that with the nice syntax of concepts and the flexibility of duck typing, see below.

but I'm debugging, please tell me it's length at least or all of its contents

I don't see the problem:

proc sum(a: iterable): ElementType(a) =
  # I'm debugging, please tell me it's length
  when overloadExists(a.len): debugEcho a.len

  # I'm debugging, please tell me all of its contents
  when overloadExists(a.copy): debugEcho a.copy
    # aka in D: a forward range, which has a `save` property, to avoid messing with original
    # (eg: not available for things like stdin)

  # do the work with what `a` offers
  # conditionally use `len` if algorithm can exploit this (eg preallocate some seq)
  for ai in a: result.add ai

echo sum(@[1,2]) # ok
echo sum(stdin.lazyMapIt(it.float)) # works too with stdin stream, requires O(1) memory instead of O(n)

how this works:

overloadExists (which I implemented in https://github.com/nim-lang/Nim/pull/12076); avoids all the problems of compiles()
alias PR (https://github.com/nim-lang/Nim/pull/11992), plus a implemented but not yet pushed commit that allows referring to other parameters in the declaration thus avoiding need for auto), eg: proc default2(T: alias): T {.enableif: T is type.} = discard
{.enableif.} (https://github.com/nim-lang/Nim/pull/12048)

not yet implemented syntax sugar on top of {.enableif.} that allows using the same syntax as concepts:

proc sum(a: auto): ElementType(a) {.enableif: iterable(a).}
# => with sugar:
proc sum(a: iterable): ElementType(a)

"Yes this really only requires an IEnumerable (or is that an IEnumerable<IEnumerable> or an IEnumerable<IEnumerable>?)

proc sum1(a: iterable[string]): string = ...
proc sum2(a: iterable): ElementType(a) = ...

proc foo(a: iterable, b: iterable[ElementType(a)]): seq[ElementType(ElementType(a))]

# 1-off arbitrary custom conditions:
proc foo(a, b: auto): foo(a,b) {.enableif: a.type.size == b.type.size .} = ...

note also that, unlike concepts, enableif doesn't subvert the type and is a transparent abstraction once the enableif condition is evaluated:

# with concepts:
type Foo = concept x: x.len
proc main1(x: Foo): string = $x.type
echo main1(@[1,2]) # Foo instead of seq[int]

# with enableif + not-yet-implemented concept-like syntax sugar:
template Foo(x): bool = overloadExists(x.len)
proc main2(x: Foo): string = $x.type
echo main2(@[1,2]) # seq[int]

links

[1] http://ddili.org/ders/d.en/ranges.html
https://dlang.org/phobos/std_range.html

Ranges generalize the concept of arrays, lists, or anything that involves sequential access. This abstraction enables the same set of algorithms to be used with a vast variety of different concrete types. For example, a linear search algorithm such as find works not just for arrays, but for linked-lists, input files, incoming network data, etc.

it allows specializing algorithms depending on what the input offers, eg:

input ranges (for online algos, eg stdin stream, you don't know the length) => requires empty, front and popFront()
random access ranges (still more general than openArray, eg doesn't require contiguous memory layout, so supports being backed by a db, a rope, a table, etc)
a view of a seq/array (approximately an openArray)
bidirectional ranges => BidirectionalRange additionally requires back and popBack(). => eg for circular buffers, deques

Araq commented 4 years ago

I disagree, ...

Stop lecturing me on these things please, I know how it works in D-land quite well. I also know how it works in C#-land and I've no desire to bring even more "duck typing" into Nim.

c-blake commented 4 years ago

While it may be a more general API, "online by default" may not be all you want it to be in non-API dimensions like speed, accuracy, and memory.

I haven't checked lately, but for many years the GNU Scientific Library foisted this 2+ orders of magnitude slower online mean implementation in 80-bit long double onto unknowing users just calling mean. I once made some guy's average 160x faster (and "accurate enough" in context) changing to an avx256 vectorized summation implementation. These days avx512 might make it 320x faster. I would actually call such a crazy ratio a performance footgun.

Everything we add generates "why is it not best in class?" kinds of questions. Solving a problem badly or making it the default can be worse than not solving it at all. Not always, I know, perfect enemy of the good and all that. But this is Nim. The stdlib does not need to solve all problems. We already have stats.RunningStat. Maybe improve that along the lines you want? We don't need to locate/name things the same way as D/Julia/Python. Or try it out in a separate package/module instead of foisting it upon everyone?

cooldome commented 4 years ago

I think we have iterators already and we just need make them slightly more flexible. Support of passing iterator as as argument to another iterator would solve it.

Agree with @c-blake, sum of opearray[T] can be computed very efficiently and this is why it is the default. You want to compute sum of the elements coming from stream. This hard because to do it effeciently you need buffering and what buffering depends on the access pattern. It is better to leave it to the user instead of pretending you can do it in one line.

timotheecour commented 4 years ago

sum of opearray[T] can be computed very efficiently and this is why it is the default

I don't follow your logic.

if user passes a seq/array/openarray, there is no overhead between sumNaive1 and sumNaive2 below, and both will equally auto-vectorize using optimization flags, try it.
if user passes an iterator, sumNaive1 works but sumNaive2 won't.

So you never need sumNaive2(openArray), only sumNaive1(iterable)

proc sumNaive1(a: iterable[T]): T =
  for ai in a: result += ai 
proc sumNaive2[T](a: openArray[T]): T =
  for ai in a: result += ai

Likewise with the more accurate sum variants in std/sums:

proc sumShewchuck*[T: SomeFloat](a: iterable[T]): T =
  var state = initShechukState[T]()
  for ai in iter: add(state, ai)
  peekFinal(state)

And likewise with many more algorithms.

Support of passing iterator as as argument to another iterator would solve it.

https://github.com/nim-lang/Nim/pull/11992 solves this

c-blake commented 4 years ago

As written now, std/sums.sumsPairwise specifically requires random access to the whole array for its recursion, anyway. The likely ~~faster~~ much faster stack-based variant may be better about this specific problem, but you need to be careful. FP error accumulation depends upon cancellation which depends upon order and order is a fundamentally global/non-online idea. So, I think robust sum algorithms are a bad example of making the stdlib more online/lazy (which I agree with @Araq is generally good).

timotheecour commented 4 years ago

As written now, std/sums.sumsPairwise specifically requires random access to the whole array for its recursion, anyway

the way sums.sumsPairwise is implemented via recursion is inefficient, and it's easy to write it as an online algorithm that requires O(log(n)) space, no recursion, and better performance; it doesn't require random access and each element can be accessed just once (may be same as stack-based variant you're alluding to). In fact it's a good interview question.

And it will give exactly the same result as the current sums.sumsPairwise.

So, I think robust sum algorithms are a bad example of making the stdlib more online/lazy

I disagree; so far every algorithm in std/sums can indeed be written in online fashion, with less memory usage, no worse and likely better performance for large n, and identical results: sumPairs, sumKbn, sumShewchuck (https://github.com/nim-lang/Nim/pull/16004)

Maybe you can come up with some summation algorithm that can't be written in online fashion (I'm curious which one?), then obviously you'll need openArray API for those, but this RFC concerns algorithms that are intrinsically online; for those, the openArray variant is entirely redundant (performance and accuracy wise), and only the online one is needed (iterable accepts openArray input)

c-blake commented 4 years ago

Yes, I held back linking to the Wikipedia article in the doc comment giving the stack based algo which sounds like what @timotheecour (TC) is also thinking of, and I already said it would be much faster. The wiki pseudocode for the stack way does not unpack the last 7-ish recursion levels like the 128 in the current code, also important for performance but reducing accuracy.

TC's evaluation of good ideas here seems to be based on what he, personally, has seen "so far" being "onlinable" and maybe ulterior motivations. My point here was that the "idea" here of robust summing is not, in fact, "intrinsically online". That follows from the uncontentious non-commutativity of FP arithmetic alone. APIs tracking ideas works best.

Another way in which robust summing is not online is that the way to achieve the best error bound/parameters of said algo depends upon how many intermediate partial sums are expected a priori. Unless the sum in question is truly exact, it should really provide an error bound to the caller. Such bounds also probably need-ish the number of floats. No error bound for the caller is a far bigger API problem than online-ness. O(log(n)) is hopelessly imprecise if the set is so ill conditioned that the inexact answer is still garbage. But there are exactSum() algos, too.

Per TC's curiosity, there is a large space of algos, some of which require pre-sorting arrays like Kahan’s cascaded-compensated summation (ref in that last chapter link). I think we can agree sorting would want the openArray API (and maybe a var openArray variant). There is also the Zhu2010 paper that I already mentioned the day before his comment which builds an online exact sum for horribly conditioned arrays from a non-online exact sum iFastSum(Zhu2009) as a primitive and four accumulators per binary exponent (4*2048*8 bytes=64 KiB). Granted, Radford2015 argues for simpler than iFastSum-style techniques, but even that online algo wants to know how many terms there are to decide superaccumulator size. openArray's .len provides guidance there (alluded to "parameter of the algo").

That last link also argues for openArray a 3rd way. It argues online exactness allows you to more soundly go multi-core/any order. There you would need the .len (as well as available CPU core count, etc.) to split the data. (Maybe 128x better, but more likely memory bandwidth bound). So, you would still want an "outer API" that was probably openArray-based, still netting you both APIs, not one or the other as per the debate so far.

So, look - robust summation is a very specialized area that is A) definitely not intrinsically online { order, error bounds/scale dependence of method, parallelism }, B) not currently used by anything in the stdlib, C) OR by even any package in the nimbleverse that I could tell from a grep -iw sums **.nim after almost 2 years in stdlib and D) I agree could profit from interface iteration with less backward compatible sensitivity (which seems to be what TC is asking for in this RFC, though just adding new online interfaces is not incompatible).

Personally, I'd say this all argues for lifting std/sums into the nimbleverse, obsoleting this RFC. I earnestly believe people will eventually want both online & random access overloads for robustSum() or exactSum() for stated reasons (or maybe others!) and definitely would want error bounds for "only robust" sums, but I think a package author could iterate on this with himself/his users faster/better. It seems @lbartoletti almost did his own package at first, anyway.

Araq commented 4 years ago

Spoon feeding a sum algorithm with numbers because then you might be able to avoid a temporary seq allocation is not even "premature optimization", it can seriously harm correctness.

c-blake commented 4 years ago

There are contexts (stats.RunningStat, by @jlp765 which I already mentioned) where A) accuracy of running totals might matter and B) higher level interfaces (RunningStat.push) require one-by-one.

To me, though, this only argues for additional online APIs, not eliminating the (according to TC) "entirely redundant" openArray API. I do not think it is redundant at all, as explained, and onlineness restricts implementation choices in a way that arrays do not.

Total elimination seems like frivolous backward incompatibility (which as mentioned, I looked into and may impact zero people, but who knows what invisible code may be out there). If you want to go that way, then I say go all the way and lift it into a package.

If you do not want to go that far, then stats could maybe grow (and maybe use) RunningRobustSum (or maybe just robustSum with an incremental interface). stats might become more accurate for some inputs. Estimating/reporting errors to the caller should not be neglected for any inexact method whose main point is to tame inexactness. A large distrust of input data is already clearly in the client code's thinking. And, ideally, various enum-selected methods should be available with some suitable default. It also might be better to get experience with that as a package first since I cannot find any complaints about RunningStat accuracy.

lbartoletti commented 4 years ago

It seems @lbartoletti almost did his own package at first, anyway.

Not yet, but soon.

timotheecour commented 3 years ago

now that we have iterable[T] (replacing untyped for called iterators), this RFC just got easier; see also https://github.com/timotheecour/Nim/issues/746 and https://github.com/timotheecour/Nim/pull/747

nim-lang / RFCs