taoensso / faraday

Amazon DynamoDB client for Clojure
https://www.taoensso.com/faraday
Eclipse Public License 1.0
238 stars 84 forks source link

Support for BigDecimal #26

Closed tlipski closed 10 years ago

tlipski commented 10 years ago

Right now, the suggested way to persist BigDecimal with faraday is to use nippy serialization. This of course clashes with a need to access the data not from faraday library.

One quick fix that I came up with is altering taosensso.faraday/simple-num? function to return true for BigDecimals as well:

(defn- simple-num? [x] (or (instance? Long       x)
                           (instance? Double     x)
                           (instance? Integer    x)
                           (instance? Float      x)
                           (instance? BigDecimal x)))

As BigDecimal is supported by the DynamoDB's Java driver, the only thing we might lose with that approach is BigDecimal's precision or rounding mode - but I suppose that lose that with PostgreSQL for example as well.

ptaoussanis commented 10 years ago

Hey Tomek, thanks for the PR!

As BigDecimal is supported by the DynamoDB's Java driver, the only thing we might lose with that approach is BigDecimal's precision or rounding mode

Hmm, I'll be honest - I'm not sure if I like the idea of doing this (or the Java driver's choice, btw). My thinking:

  1. If someone is using a BigDec/BigInt/etc. they're probably doing so because they require the extra precision.
  2. If they don't, it's pretty easy to trim with (double _) or (long _) and get a simple-num type.

So as currently implemented, we'll never silently drop precision. We don't prevent dropping precision, but do require it to be done explicitly.

Does that make sense? Do you disagree? It'd help if you could explain a little why you think this'd be a useful change (I might well be missing something). Were you caught unexpectedly with a Nippy-serialized value when you were expecting a truncated BigDec?

Cheers :-)

tlipski commented 10 years ago

Hey, thanks for the quick response! :)

In my case, I am using BigDecimal not to have a certain precision or rounding mode, but rather to avoid possible inaccuracies coming from floating-point number representation of the data.

Still, I am not really hung up upon BigDecimal specifically - I just need something that will keep accurate decimal value persisted and bigdec seems like the only reasonable option in Java.

Of course, I understand that switching to BigDecimal might not be a good way for everyone, as it won't be unboxed outside of Clojure, etc. DynamoDB guys are sticking with String for numbers in the SDK and I can easily see why.

So - one thing that might help would be a way to enhance type serialization/deserialization directly in faraday - but in a more gentle way than having to overwrite private functions in the namespace.

The other issue, is that Double is always returned for numeric types which contain decimal point - possibly loosing some accuracy at this point:

(defn- str->num    [^String s] (if (.contains s ".") (Double. s) (Long. s)))
ptaoussanis commented 10 years ago

to avoid possible inaccuracies coming from floating-point number representation of the data.

Okay, gotcha! That's completely reasonable - thanks for clarifying.

I understand that switching to BigDecimal might not be a good way for everyone

No, I think your concern is quite legitimate. Since DynamoDB is storing numbers as exact-value strings (at least w/in their precision limits) - it might be more appropriate for us to duplicate this behaviour Clojure-side. I'd forgotten that that's how they were storing numbers; thanks for the reminder.

So - for the record - how would you feel if Faraday were modded to accept BigDecimals directly (as per your PR), and to always return BigDecimals rather than Doubles (as per your point above)?

This of course still leaves the question: what about folks that are using a BigDec because they want > 38 digits of precision. Though that's arguably a (much?) rarer use-case than folks wanting non-float semantics...

I'd like a little time to think about this. Any additional feedback in the meantime would be very welcome! Please keep your PR open for the moment, and I'll come back to you.

Cheers! :-)

tlipski commented 10 years ago

With more than 38 digits of precision, it's the DynamoDB that limits them - so probably store as a string or serialize using nippy for example. Using double will not help at all in this case from what I've tested:

(faraday/put-item CREDS :test1 {:key "1" :value (double 1.00000000000000000000000000000000000000002)})
(faraday/get-item CREDS :test1 {:key "1"})
=> {:value 1, :key "1"}
ptaoussanis commented 10 years ago

Using double will not help at all

No, no - that wasn't my meaning. To clarify:

Current behaviour

Proposed behaviour

So we'd be swapping the set of tradeoffs. The most significant new downside would be that folks who were using BigDecimals because they specifically wanted very high precision are going to have a breaking change: previously they didn't lose precision (because the vals were being serialized), but under the new scheme they can lose precision (because the vals will be limited by DynamoDB's number format).

As I said, I think it's rare for anyone to need more than 38 digits of actual precision - so this may not be a big deal in practice, but it's worth thinking about.

Does that make sense?

tlipski commented 10 years ago

Hmm, are you certain about the automated serialization with Nippy? Because running the following code with faraday "1.0.2" throws an exception:

(faraday/put-item CREDS :test1 {:key "1" :value 1.3M})
Exception Unknown DynamoDB value type: class java.math.BigDecimal. See `freeze` for serialization.  taoensso.faraday/clj-val->db-val (faraday.clj:160)

This happens, because BigDecimal does not satisfy tests for stringy?, simple-num? and finally freeze?, which seems to check if the value is byte array or (instance? WrappedForFreezing x) - but I can misinterpret the code here (even though it is very clear and readable!).

On the other hand, I am wondering if silent auto-serialization of BigDecimal with Nippy would be such a good thing here - Clojure developer might not notice that (no error, no warning), store lots of data in the DynamoDB. Then, months later, a Hadoop developer would come and try to see the actual value

ptaoussanis commented 10 years ago

Hmm, are you certain about the automated serialization with Nippy?

Not at all :-) Excuse me, I'm not actually using Faraday myself atm and I misremembered (thought it was auto serializing non-native types). You're 100% correct.

I am wondering if silent auto-serialization of BigDecimal with Nippy would be such a good thing here

Completely agreed.

Have just pushed a dev commit for your review: https://github.com/ptaoussanis/faraday/commit/47b5d2557239a966c9ea77d925fe4f4730e38f88 Please let me know what you think. Again, I'm not actually using Faraday atm - so pointers here are very helpful+welcome!

Cheers :-)

ptaoussanis commented 10 years ago

@paraseba, @joelittlejohn - would appreciate your feedback too on this commit: https://github.com/ptaoussanis/faraday/commit/47b5d2557239a966c9ea77d925fe4f4730e38f88

Changes:

Motivation:

Benefits:

Costs:

Any feedback welcome, no rush! Cheers :-)

tlipski commented 10 years ago

Looks great. I'm just wondering if we shouldn't check for precision in BigDecimal and throw an error if it is greater than 38 digits? I can implement that this evening - no problem here and the faraday library has been a real boost to my productivity with dynamodb anyway.

Naturally, DynamoDB by itself silently drops 39-th and following digits after decimal point.

ptaoussanis commented 10 years ago

Looks great. I'm just wondering if we shouldn't check for precision in BigDecimal and throw an error if it is greater than 38 digits?

Another nice idea! Done: https://github.com/ptaoussanis/faraday/commit/f6105d9e3447a406177685529bd87394f585c66b

tlipski commented 10 years ago

Awesome, thank you!

ptaoussanis commented 10 years ago

Just pushed 1.3.0-RC1 which includes these changes, closing. Please let me know if anyone has any issues! Cheers :-)