taoensso / carmine

Redis client + message queue for Clojure
https://www.taoensso.com/carmine
Eclipse Public License 1.0
1.15k stars 130 forks source link

java.math.BigInteger is not classified as "simple numbers" #180

Closed mikehwang closed 7 years ago

mikehwang commented 7 years ago

Adding a value of type java.lang.Long and the same value of type java.math.BigInteger into a Redis set results in two separate members. See the example below:

user=> (require '[taoensso.carmine :as car :refer (wcar)]) 
nil
user=> (def conn {:pool {} :spec {:host "localhost" :port 6379}})
#'user/conn
user=> (defmacro wcar* [& body] `(car/wcar conn ~@body))
#'user/wcar*
user=> (def test-long 123)
#'user/test-long
user=> (wcar* (car/sadd "foo" test-long))
1
user=> (def test-int (int test-long))
#'user/test-int
user=> (wcar* (car/sadd "foo" test-int))
0
user=> (def test-bigint (java.math.BigInteger/valueOf test-long))
#'user/test-bigint
user=> (wcar* (car/sadd "foo" test-bigint))
1
user=> (wcar* (car/smembers "foo"))
[123 "123"]

Seems to me this was a conscientious design decision where java.lang.Long are considered simple numbers but java.math.BigInteger are not thus the latter values are being serialized as byte string and not as strings. (I'm also seeing the same issue with java.util.concurrent.atomic.AtomicLong)

Why was this decision made?

Both are numbers that derive from the same base class java.lang.Number and thus I think it makes sense to treat all child classes the same way.

ptaoussanis commented 7 years ago

Hi there,

Seems to me this was a conscientious design decision where java.lang.Long are considered simple numbers but java.math.BigInteger are not

That's correct. The distinction is motivated by Redis: Redis can specially store Long-sized integers so that they're directly manipulable as integers.

BigIntegers can extend beyond this range, so to write them to Redis we'd need to first check that they're within the safe range. But then the behaviour you'd see would depend on the range of the BigInteger: those small enough would be written and read as a long, and those larger would be serialized as a BigInteger.

I decided that the conditional behaviour would be more confusing and error-prone than just defining simple number types as instances of fixed-precision ints.

I'm also seeing the same issue with java.util.concurrent.atomic.AtomicLong

Carmine doesn't try to auto-coerce all possible types; that'd lead to a lot of confusion. Anything that's not a listed string type gets serialized.

If you'd like to force serialization of string types, you can use freeze. If you've got a non-string type that can be coerced to a long, you can suppress serialization by coercing to a long before giving the arg to Carmine.

Hope that helps!

mikehwang commented 7 years ago

Thanks for the reply!

Your motivations make sense and the simplicity is great but I'm going to argue that the interface is misleading and not Clojure-friendly. I'm going to preface the following with the fact that I'm a Clojure and Redis novice.

As a user of Carmine, I expect Carmine to adhere to Clojure's properties and do the necessary mapping from Clojure data types into Redis data types to preserve those properties. For example in Clojure, the following statement evaluates to true:

user=> (= (int 123) (biginteger 123))
true

But as stated in the issue, this is not preserved in Carmine. If I set two Redis keys of the same values where one is an integer and the other is a biginteger and do gets on them both and perform an equality check, the result is that they are not equal and thus unexpectedly different from Clojure.

To me a trade-off was made to support a Redis feature at the cost of the Clojure interface. I'm going to make a wildly unverified bold ridiculous claim that using Redis integer data type is not a widely used feature so shouldn't take priority.

One possibly naive solution is to uniformly serialize all numbers to Redis strings or their byte string representations and have an optional argument used as a switch to attempt to map the value to a Redis integer.

Another gotcha I discovered relevant to this is that float values are classified as simple numbers but can't be used as a Redis integer.

If you'd like to force serialization of string types, you can use freeze.

Thanks for pointing this out!

If you've got a non-string type that can be coerced to a long, you can suppress serialization by coercing to a long before giving the arg to Carmine.

Right on. So I would like to admit that I only ran into this issue because my application has a bug where there are couple places making calls to write to the same key. They are writing inconsistently - Integer vs BigInteger. I will be fixing this in my application.

However, I still also think inconsistent numbers support is an issue with Carmine for the reason above.

Thanks again!

ptaoussanis commented 7 years ago

Hi there, no problem :-)

Apologies for the terse reply, don't have much time-

I expect Carmine to adhere to Clojure's properties and do the necessary mapping from Clojure data types into Redis data types to preserve those properties.

It's not a general property of Clojure to promise that types will be auto-coerced for convenience, especially when the coercion may be lossy.

If I set two Redis keys of the same values where one is an integer and the other is a biginteger and do gets on them both and perform an equality check, the result is that they are not equal and thus unexpectedly different from Clojure.

This is a different problem and relates to intentional limitations in the design of Redis. Redis doesn't have real numeric data types (or indeed types); it's a byte-string store.

If you set a value to 5 and get it back, it'll be the string "5".

In your example: the long is being written unserialized and returned as a string, the bigint is being written serialized and so returned as a bigint.

You'll get Clojure's regular equality behaviour back if you make sure to coerce values back to their expected type when they're returned from Redis. That's the idiomatic way of using Carmine (indeed Redis with any client, in any programming language).

So why not just auto serialize everything on the way to Redis? Because that'd break Redis' ability to interpret and use these numbers internally. For example, incr would break; it can only work when the key's value is "5"- it won't understand a serialized long or bigint. Other commands that'd also break: anything accepting a timeout, index, position, length, ...

In short, there's an inherent non-bijective mismatch between Clojure and Redis data types. There exists no way of automatically handling this mismatch that'll work in all circumstances; whatever choice you make involves tradeoffs.

The current choice for Carmine is well thought out and presents a good balance of default tradeoffs.

Auto-serializing all numeric types would be a massive step backwards in a number of dimensions including performance, space efficiency, API compatibility, interop with other clients, and more.

I'm going to make a wildly unverified bold ridiculous claim that using Redis integer data type is not a widely used feature so shouldn't take priority.

That's an objectively incorrect statement. If you look over the Redis API, the number of commands that take an integer argument is huge.

Another gotcha I discovered relevant to this is that float values are classified as simple numbers but can't be used as a Redis integer.

Again, Carmine's matching the underlying data store here. Please see the documentation re: Redis data types. Trying to fight the underlying data store for the goal of some kind of subjective superficial fit with the language is a non-starter.

Hope that makes some sense? In any event, need to excuse myself from further discussions on this for other priorities atm.

Best of luck, cheers! :-)