pbeshai / tidy

Tidy up your data with JavaScript, inspired by dplyr and the tidyverse
https://pbeshai.github.io/tidy
MIT License
738 stars 21 forks source link

Reduce floating point errors #2

Closed veltman closed 3 years ago

veltman commented 3 years ago

I noticed that there's a floating point error in one of the Tidy.js notebook examples:

image

This switches sum, cumsum, and mean from the d3-array versions to the "improved Kahan-Babuška" algorithm described here to reduce some floating point errors.

This is probably dumb and my types are definitely dumb but if it's of interest let me know and I can also push a corresponding docs update.

pbeshai commented 3 years ago

Sure let's do it! I wonder about if there's a better way for tree-shaking than having it all under TMath, which previously only included the very lightweight rate() function. Also wonder if they should be publicly visible or not, can't decide (currently are as they'd be under TMath). Seems useful but would basically always be used via tidy(data, sum('foo')). What we should do is just get d3-array to use them ;) Anyway, I'm game to add them and export them. Unless you have a better idea, we can keep it under TMath and also add docs on the math.md file.

If you can update the docs to mention the fancy algo on the summarizer's that would be great too

veltman commented 3 years ago

Tree-shaking wise, why would the file location and export status matter? Won't they still get shaken if someone only imports some unrelated functions?

veltman commented 3 years ago

Also wondering it makes more sense to have three separate functions that are mostly doing the same loop, or have them each call an internal function that computes all three stats and returns one of the three - the latter might be more DRY but a little more esoteric and slower

pbeshai commented 3 years ago

re: Tree-shaking, my weak understanding is if you import TMath then you'll get the code for sum, cumsum, etc even if you only use TMath.rate()... but not really sure of a better way.

Separate functions are fine too

veltman commented 3 years ago

ah I misunderstood the shape of the exports - I'd vote for keeping these internal for now, I can move them to helpers

veltman commented 3 years ago

Moved to helpers, wrote some extra tests, added links to the docs.

Fil commented 3 years ago

Did you notice d3-array has fsum? https://github.com/d3/d3-array#fsum

pbeshai commented 3 years ago

@Fil knower of all things, now don't we look silly. from now on all PRs flow through @Fil

Fil commented 3 years ago

(d3.cumsum doesn't use fsum though)

pbeshai commented 3 years ago

the more d3-array code we use the better. happy to switch to adder / fsum and reduce code in this codebase. suppose we'll need a cumsum / mean implementation that leverages Adder but should be relatively short

veltman commented 3 years ago

I didn't know about fsum! (clearly) I'll submit an update

Fil commented 3 years ago

https://observablehq.com/@fil/fcumsum seems to work