yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
9.04k stars 1.08k forks source link

[YCQL] comparable TIMEUUID #801

Open patrick-li opened 5 years ago

patrick-li commented 5 years ago

Jira Link: DB-4712 There is a data type TIMEUUID. and two functions totimestamp() and tounixtimestamp. it acts like an unique timestamp. My question is whether it can be sorted/order by, >='2019-01-24T05:26:51' , <= '2019-01-24T05:26:51', or as clustering key.

Thanks.

m-iancu commented 5 years ago

@patrick-li A TIMEUUID (Uuid type 1: https://tools.ietf.org/html/rfc4122) is formed from a timestamp component (most-significant-bytes) and a node/clock component (least significant bytes). It will be sorted by the timestamp component first and then (assuming timestamp is equal) by the (typically random) node/clock component.

So it will be sorted as you expect on a clustering key, and it is also possible (but see details below) to "compare" with a timestamp value.

For the latter use-case, one must be careful though since multiple timeuuids can have the same timestamp value and then they will be compared by the clock/node component. That's why CQL provides two functions minTimeUUID(<timestamp>) and maxTimeUUID(<timestamp>) that should be used in the right hand side for such comparisons -- they generate corresponding timeuuids with min/max node/clock component so that they include all regular timeuuids with that timestamp in the comparison. See this test for an example of how to use it. E.g. something like:

SELECT * FROM test_minmax WHERE h = 2 AND r >= minTimeuuid('2019-01-25 00:34:32+0000') 
     AND r < maxTimeuuid('2019-01-25 00:40:47.758+0000');

Also see this comment for some more details.

Also, just to note since you mentioned them, totimestamp and tounixtimestamp are almost the same except that the first returns a value of type timestamp and the second of type bigint (but they logically represent the same value).

patrick-li commented 5 years ago

@m-iancu , thanks for quick response and reminding. You may update your online docs 'API Reference/YCQL/UUID & TIMEUUID'. A further question is whether there is a function, like this: TIMEUUID('2019-01-24T05:26:51.123'), so that when we insert a record with a timestamp we already have, we can directly insert a timestamp as TIMEUUID. An extra one is the Date ISO String supports 'Z', not only '+0000'. in Nodejs, date.toISOString() will generate a string like : '2019-01-24T05:26:51.123Z'. Hope Yugabyte can accept it sooner.

patrick-li commented 5 years ago

'minTimeuuid()' looks like what I want ? TIMEUUID = minTimeuuid() ?

patrick-li commented 5 years ago

Will twice call 'minTimeuuid()' return same TIMEUUID? minTimeuuid('2019-01-24T05:26:51.123') =?= minTimeuuid('2019-01-24T05:26:51.123')

m-iancu commented 5 years ago

@patrick-li Yes calling minTimeuuid twice should return exactly the same timeuuid. It should generally not be used to insert values for the following reasons:

  1. min/maxTimeUUID produce "pseudo-timeuuids" that do not technically conform to the RFC spec (i.e. clock/node component has some extra properties/restrictions too).
  2. Since they always produce the same clock/node component chance of collision is high (otherwise one could just use timestamp.
  3. They comparisons behavior may get confusing because now results may be weird when using e.g. <= vs < with minTimeuuid.

If I understand correctly, you needs something like the function now() but that actually takes a timestamp as an argument. Let me get back to you on that soon.

kmuthukk commented 5 years ago

I forked off this sub-issue into its own issue (#802), as a separate enhancement request.

patrick-li commented 5 years ago

no, not current time. but an existing timestamp like '2019-01-24T05:26:51.123'

patrick-li commented 5 years ago

I guess my requirement may break the TIMEUUID uniqueness. the length of the TIMEUUID is longer/stronger enough for a specific timestamp, no duplicate TIMEUUID can be generated or most likely. if there is a constructor with a timestamp, sometime the TIMEUUID will be duplicate. OK. Thank you guys.

kmuthukk commented 5 years ago

hi @patrick-li

a) I think it is possible for us to implement a builtin function/constructor for timeuuid which takes a timestamp and returns a unique timeuuid whose timestamp component is the specified timestamp.

It is just that we don't have such a built-in yet. Hence, filed #802 as an enhancement request. I couldn't find anything in JDK 8 also where you could do the same in the application and insert as a timeuuid into the database.

b) In the interim, want to understand your requirement better. Suppose your table is:

CREATE TABLE stock_profile(
       stock_symbol text,
       version timeuuid,
       details text,
   PRIMARY KEY ((stock_symbol), version))
   WITH CLUSTERING ORDER BY (version DESC);

and assume we had a builtin timeuuidFromTimestamp(), and recall that YCQL/Cassandra's INSERT really has UPSERT semantics by default on colliding primary keys.

So the question is, do you want:

INSERT INTO stock_profile(stock_symbol, version, details) 
    VALUES ('ORCL', timeuuidFromTimestamp('2019-01-25 00:40:47.758+0000'), 'details-1');
INSERT INTO stock_profile(stock_symbol, version, details) 
    VALUES ('ORCL', timeuuidFromTimestamp('2019-01-25 00:40:47.758+0000'), 'details-2');

to end up with 2 rows or 1 row (where the second one overwrites the first row because the primary key matches)?

I am guessing you want the above to end up as two different rows, correct?

patrick-li commented 5 years ago

Yes, exactly. CREATE TABLE test_table ( id1 text, id2 text, found_at timeuuid, details text, PRIMARY KEY ((id1, id2), found_at)) WITH CLUSTERING ORDER BY (found_at DESC);

Sometime, id1 + id2 + timestamp got to be duplicate. then missing record(s), only the last one saved. I wrote a local one = timestamp(UTC number) 100 + random() 100. if TIMEUUID works, that's the best.

kmuthukk commented 5 years ago

How about this instead, where you use a 2-part clustering column based on a timestamp (ts) and uuid (rand_uuid). See example below.

You could bind in a application generated uuid (using Java) as the tie-breaker for the rand_uuid column. [You could also use timeuuid type instead for rand_uuid column and use now() as the tiebreaker.]

CREATE TABLE IF NOT EXISTS test_table(
    id1 text,
    id2 text,
    ts timestamp,
    rand_uuid uuid,
    details text,
 PRIMARY KEY ((id1, id2), ts, uuid))
 WITH CLUSTERING ORDER BY (ts DESC);

But your queries wouldn't have to care about the rand_uuid column. They could simply be like:

SELECT * from test_table 
WHERE ts < '2017-07-04 12:31:30 UTC' 
AND id1='1' and id2='1';
patrick-li commented 5 years ago

cool. let me have a try.