mozilla / mentat

UNMAINTAINED A persistent, relational store inspired by Datomic and DataScript.
https://mozilla.github.io/mentat/
Apache License 2.0

[core] Implement BigInteger support in TypedValue #280

Open rnewman opened 7 years ago

rnewman commented 7 years ago

There are holes for this: we can parse queries involving BigInteger, but we can't yet represent them in the store.

ncalexan commented 7 years ago

This is almost identical to #201, and similar to #285, but we'll need to add a few things to the mix. Datomic has a :db.type/bigint specifically for large integers (see http://docs.datomic.com/schema.html), so let's copy them. You'll need to take care of:

  1. Adding a new DB_TYPE_BIGINT value to the bootstrapper;
  2. Bumping the SQL schema to accommodate the new ident;
  3. Adding a new ValueType case -- let's use value_type_tag = 6 for big integers;
  4. Adding the corresponding TypedValue cases;
  5. Implementing the conversions to and from SQL. Let's represent big integers as SQL TEXT in some large base, say base 36 (since that's the largest EDN base we support -- see #277). We could also represent big integers as SQL BLOB with some encoding as a byte array -- looking at our bigint type, I see that the internal type is a "Vector of BigDigit instances", which is probably amenable to such a representation. But it's much easier to compare against text in the database, so let's do that to start;
  6. Testing the new types in the transactor and potentially in the query engine as well. This means adding conversion tests around https://github.com/mozilla/mentat/blob/1deed24f42847bc1eb0cfc26bada840643eaec33/db/tests/value_tests.rs#L25.
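
As a rough illustration of the TEXT representation proposed in step 5, here is a minimal Python sketch of a signed base-36 round trip. The names `to_base36` and `from_base36` are hypothetical and are not part of Mentat; the real conversion would live in the Rust code next to the other SQL conversions.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Encode an arbitrary-precision integer as signed base-36 text (hypothetical helper)."""
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return sign + "".join(reversed(out))


def from_base36(text: str) -> int:
    """Decode signed base-36 text; Python's int() accepts bases up to 36."""
    return int(text, 36)


big = 2**64 + 1  # too large for an i64
encoded = to_base36(big)

# The round trip is lossless in both directions, including for negatives.
assert from_base36(encoded) == big
assert from_base36(to_base36(-big)) == -big
```

The text form is easy to inspect and to match for equality in the database. Whether it also orders correctly is a separate question.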

@jsantell, you have experience in this area and can either take this ticket or mentor it.

ncalexan commented 7 years ago

@rnewman, do you have a comment about value_type_tag = 6 or the on-disk representation? I don't think it makes sense to overload value_type_tag = 5, since the representations of :db.type/{long,double} and :db.type/bigint aren't directly comparable.

rnewman commented 7 years ago

In one direction it would be sufficient to reuse the type tag: a SQL TEXT is distinguishable from floats and integers, so we're cool on the way out.

SQLite might be able to just make this work; it does have some understanding of BIGINT. I'll do some digging — if we can just use 5 and coerce on blob on the way out, and fiddle things correctly in the query engine, then that would be ideal.

ncalexan commented 7 years ago

> SQLite might be able to just make this work; it does have some understanding of BIGINT. I'll do some digging — if we can just use 5 and coerce on blob on the way out, and fiddle things correctly in the query engine, then that would be ideal.

I wasn't aware that there was BIGINT support in SQLite -- if there is, and SQLite can do the tricky comparisons across storage classes, that would be by far the best for the query engine.

rnewman commented 7 years ago

Firefox crashed and lost my comment, yay.

I don't think we can get SQLite to do what we need here, particularly correct ordering of negative bigints, even if we ensure that we don't store small bigints.
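
One way to see the ordering problem, as a sketch only: the comment does not spell out its reasoning, and the `encode` function below is a hypothetical zero-padded text encoding, not anything Mentat implements. Python string comparison is used as a stand-in for bytewise text comparison. Padding fixes the ordering of non-negative values, but negative values still sort in reverse.

```python
def encode(n: int, width: int = 8) -> str:
    """Hypothetical encoding: optional '-' sign, then zero-padded decimal digits."""
    return ("-" if n < 0 else "") + str(abs(n)).zfill(width)


values = [-300, -2, 5, 40]

by_number = sorted(values)            # numeric order
by_text = sorted(values, key=encode)  # order of the encoded strings

print(by_number)  # [-300, -2, 5, 40]
print(by_text)    # [-2, -300, 5, 40]

# Without padding, even non-negative values go wrong: "10" sorts before "9".
assert "10" < "9"
```

The same effect applies to a base-36 encoding, since the comparison is still character by character.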

To do something as simple as an aggregate or inequality operation on a value space of bigints, or worse bigints and other numbers, will involve writing SQLite extension functions. As such I suggest punting on this ticket for several months: don't support bigints for now.

rnewman commented 7 years ago

To be clear: we could probably store them. It's getting them to work correctly in queries that's the hard part.

rnewman commented 7 years ago

Back at a proper keyboard:

SQLite sorts numerics before text before blob.

It cannot store integer numerics larger than an i64.

You can encode numerics as a blob or a string, but then they will be sorted in a different bucket, and you won't be able to use <. Indeed, even if you try coercing to a numeric, you'll get Inf, or a real, or an integer, depending on how coercible the string representation is and whether it's in range.
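
The bucket ordering and the i64 limit can be checked with a short script. This is a sketch using Python's standard `sqlite3` module rather than Mentat's Rust bindings; the table name `vals` and the sample values are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A column declared without a type applies no coercion on insert,
# so each value keeps the storage class it was bound with.
conn.execute("CREATE TABLE vals (v)")
rows = [(10,), (2.5,), ("9",), ("100000000000000000000",), (b"\x01",)]
conn.executemany("INSERT INTO vals (v) VALUES (?)", rows)

ordered = [r[0] for r in conn.execute("SELECT v FROM vals ORDER BY v")]
# Numerics come first (2.5, 10), then text compared bytewise, then blobs.
print(ordered)  # [2.5, 10, '100000000000000000000', '9', b'\x01']

# Binding an integer outside the signed 64-bit range is rejected
# by the Python binding before it ever reaches the table.
try:
    conn.execute("INSERT INTO vals (v) VALUES (?)", (2**63,))
    overflowed = False
except OverflowError:
    overflowed = True

print(overflowed)  # True
```

Note that the large number stored as text lands after every real numeric and before the string '9', so neither ORDER BY nor < treats it as a number.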

To implement this ticket, you would need to do the following to support transacting and storing bigints:

and the following to support querying bigints: