will / crystal-pg

a postgres driver for crystal
BSD 3-Clause "New" or "Revised" License
462 stars 77 forks source link

fix Numeric#to_f overflow #223

Closed will closed 3 years ago

will commented 3 years ago

fixes #222

This fix changes the numerator to calculate as a float, which will eventually lose even more precision than before, but maybe it's okay.

Keeping this as a draft for a bit because when looking into this I found out that the postgres server converts numerics first to a string, then parses that string to a float to do that conversion. And so it's probably worth it to explore that and maybe some other solutions first.

will commented 3 years ago

Doing it the way postgres does it is much slower and does use a ton more memory

require "./src/pg"

require "benchmark"

def n(nd, w, s, ds, d)
  PG::Numeric.new(nd.to_i16, w.to_i16, s.to_i16, ds.to_i16, d.map(&.to_i16))
end

module PG
  struct Numeric
    def to_str_f
      to_s.to_f64
    end
  end
end

NUMS = [
  n(0, 0, -16384, 0, [] of Int16),
  n(0, 0, 0, 0, [] of Int16),
  n(0, 0, 0, 1, [] of Int16),
  n(1, 0, 0, 0, [1]),
  n(1, 0, 0x4000, 0, [1]),
  n(2, 0, 0, 1, [1, 3000]),
  n(2, 0, 0, 2, [1, 3000]),
  n(4, 1, 0, 7, [1, 2345, 6789, 1230]),
  n(1, -2, 0x4000, 5, [9000]),
  n(1, -2, 0x4000, 6, [900]),
  n(1, -2, 0x4000, 7, [90]),
  n(1, -2, 0x4000, 8, [9]),
  n(2, -10, 0, 43, [9999, 9990]),
  n(1, 1, 0, 0, [80]),
  n(2, 1, 0, 0, [5, 93]),
  n(3, 2, 0, 0, [5, 0, 93]),
  n(1, -1, 0, 1, [3000]),
  n(1, -1, 0, 2, [300]),
  n(1, -1, 0, 3, [30]),
  n(3, -1, 0, 9, [3, 0, 3000]),
  n(1, -2, 0, 13, [60]),
  n(4, 1, 0, 8, [5, 93, 6075, 4417]),
  n(5, 0, 0, 16, [2566, 1918, 9050, 0, 2000]),
]

Benchmark.ips do |x|
  x.report("to_f") { NUMS.map(&.to_f) }
  x.report("to_str_f") { NUMS.map(&.to_str_f) }
end
    to_f   2.28M (439.27ns) (± 0.59%)    224B/op        fastest
to_str_f 245.28k (  4.08µs) (± 1.31%)  5.78kB/op   9.28× slower

There are probably some tricks to make it less expensive, but it'd be hard to bring it down to same ballpark I think.

However doing repeated summations on a float is worrying me somewhat that this is the classic example of compounding errors.

will commented 3 years ago

I experimented with a few different ways, and settled on one. It may turn out that the more correct thing to do is to look at how many digits there are, and the approximate magnitude and use a different approach for each situation. However, this is probably good enough for now, and we can look into this more in the future if it's actually a problem.

fdr commented 3 years ago

Maybe it'd be instructive to see what numerical solution some libc are doing. The only difference is radix 10 vs radix 10000