remeda / remeda

A utility library for JavaScript and TypeScript.
http://remedajs.com
MIT License
4.47k stars 136 forks source link

Add `mean` and `median`? #874

Open grigorischristainas opened 1 week ago

grigorischristainas commented 1 week ago

It would be helpful to have these utilities, equivalent to https://ramdajs.com/docs/#mean and https://ramdajs.com/docs/#median.

If you support their addition, I would be happy to contribute.

eranhirsch commented 1 week ago

Sounds good to me, take a look at sum and product for examples on how to think about all the edge cases. I'm wondering if undefined would be a better return value than NaN as it forces typechecking

grigorischristainas commented 1 week ago

@eranhirsch I noticed that meanBy returns NaN in case of an empty array. Maybe it would be better to return NaN in the new utilities as well for consistency? This could also be helpful when migrating from Lodash or Ramda, as their corresponding utilities return NaN too.

cjquines commented 1 week ago

i dislike it, but i'd vote for NaN for consistency. builtins like parseInt or parseFloat return NaN when something isn't parseable, rather than undefined

if we really wanted to, we could do something like

declare const TAG_NAN: unique symbol;

type NaN = Tagged<number, typeof TAG_NAN>;

and make everything return number | NaN instead. but i also don't like this. it would mean having to change the types of everything that returns a number

eranhirsch commented 1 week ago

Lodash and Ramda are effectively untyped, as is the Javascript (ECMAScript) standard library; with Remeda being strictly typed, we should reconsider these decisions. Our main goal with better typing is to move errors from runtime to compile time; returning NaN is an example of a value that is not valuable to users and probably means something went wrong, but because it's typed as number, it is hard to detect at compile time and protect against.

Javascript has a lot of legacy decisions that don't make sense in retrospect; for example, 3/0 is Infinity but 0/0 is NaN (?!), although the math says it should be undefined (https://en.wikipedia.org/wiki/Division_by_zero). NaN is an artifact of double precision floating point where the APU has no other means of "throwing an exception" other than turning on bits of the output that effectively say that the value returned is not the result of the operation. It is not a mathematical concept. Javascript has multiple values that all mean "not a regular value": undefined, null, NaN, and this makes code brittle and confusing to maintain. There is already a movement to stop using null and I feel that NaN should be put in the same bucket, especially since it isn't typed differently (at least null is a concrete type).

Returning undefined makes more sense because it allows chaining and coalescing, whereas the branded solution suggested by @cjquines above doesn't. It's also more accurate from a mathematical perspective, because it is literally undefined what the mean of an empty set of datapoints is (as a reduction to what the value of x/0 should be).

const result = mean(myDataPoints) ?? 0;

// vs
const temp = mean(myDataPoints);
const result = typeof temp === "number" ? temp : 0;

I'm not that concerned about having different semantics from Lodash or Ramda (returning undefined instead of NaN) as people migrating shouldn't consider Remeda as a drop-in replacement (we don't try to be), and getting the undefined return type would force them to realize that their code might have had cases where it didn't handle the NaN cases previously. We offer hasAtLeast as a way to work around this.

eranhirsch commented 1 week ago

meanBy is an old function that was added before I started reviewing changes to the library. Because of limitations to how our lazy evaluation mechanism works, we couldn't deprecate it with the release of v2 (together with sumBy), but that is the long-term plan (as it is composable via map and sum). Because my plan is to eventually remove it I didn't refactor it or change it's API when working on v2.

grigorischristainas commented 1 week ago

Thank you both for your feedback! While I initially had some concerns about using NaN for consistency, I appreciate the reasoning for using undefined. It clearly indicates that certain cases should be handled more explicitly and forces proper type-checking.