tvondra / sequential-uuids

generator of sequential UUIDs
https://blog.2ndquadrant.com/sequential-uuid-generators/
MIT License
302 stars 18 forks source link

Documentation about Install and usage #3

Closed bwakkie closed 5 years ago

bwakkie commented 5 years ago

The documentation left me puzzled about how to use it in practice: [documentation] Prerequires: postgresql-server-dev-11 Install: git clone https://github.com/tvondra/sequential-uuids.git cd sequential-uuids/ make sudo make install inside PostgreSQL: CREATE EXTENSION sequential-uuids; [/documentation]

The really puzzling part from my side is that I have no clue what to do next in order to get a sequenced UUID into my table!

db=# CREATE TABLE names(id uuid, name varchar(40), PRIMARY KEY (id) ); db=# INSERT INTO names VALUES (uuid_generate_v4(),'Abraham Lincon'); INSERT 0 1 db=# select * from names; id | name --------------------------------------+---------------- 1a9d00a2-670f-4a8a-8c8b-eea45952411a | Abraham Lincon (1 row)

db=# INSERT INTO names VALUES (uuid_sequence_nextval('1a9d00a2-670f-4a8a-8c8b-eea45952411a'),'Some Otherdude'); ERROR: relation "1a9d00a2-670f-4a8a-8c8b-eea45952411a" does not exist LINE 1: INSERT INTO names VALUES (uuid_sequence_nextval('1a9d00a2-67...

Or it the first uuid inserted in a wrong way? 00000000-0000-0000-0000-000000000000 doesn't work either.

bwakkie commented 5 years ago

I got a bit further:

[EDIT: fixed a typo: tablename_colname_seq -> tablename_names_seq to point to the just created SEQUENCE in the sequence_nextval() call obviously]

DROP SEQUENCE IF EXISTS tablename_names_seq; DROP TABLE IF EXISTS names; CREATE SEQUENCE tablename_names_seq AS integer; CREATE TABLE names(id uuid NOT NULL DEFAULT uuid_sequence_nextval('tablename_names_seq'::regclass),name varchar(40)); INSERT INTO names (name) VALUES ('William McKinley'),('Theodore Roosevelt'),('William Taft'),('Woodrow Wilson'),('Warren Harding'),('Calvin Coolidge'),('Herbert C. Hoover'),('Franklin Delano Roosevelt'),('Harry S Truman'),('Dwight David Eisenhower'),('John Fitzgerald Kennedy'),('Lyndon Baines Johnson'),('Richard Milhous Nixon'),('Gerald R. Ford'),('James (Jimmy) Earl Carter, Jr.'),('Ronald Wilson Reagan'),('George H. W. Bush'),('William (Bill) Jefferson Clinton'),('George W. Bush'),('Barack Obama'),('Donald Trump'); ...will create in my case: db=# select * from names; id | name --------------------------------------+---------------------------------- 00005b41-8b07-3daa-216e-30b3e8177705 | William McKinley 00004ce4-1183-7689-47a0-a56d7e8e987c | Theodore Roosevelt 00005143-7fad-840b-b4c1-b5d52fe58817 | William Taft 0000fcff-1c48-e32e-cb59-b713fa5c8078 | Woodrow Wilson 0000ea18-f43c-5b74-e9df-7f9da03473d0 | Warren Harding 000019fb-e716-fb04-5ede-322a38e93d32 | Calvin Coolidge 000045bd-aa30-d59f-6c31-13551092f2b1 | Herbert C. Hoover 0000c665-81df-6168-f55c-6c543a9e7e72 | Franklin Delano Roosevelt 000087bb-a4cd-784f-fd4d-ee697e01be8f | Harry S Truman 000093b0-4059-16c1-3877-292ed396820d | Dwight David Eisenhower 00003400-80bc-bb24-8933-73868061efff | John Fitzgerald Kennedy 000062ad-8ef5-5dce-4e73-8f87eab8b5bd | Lyndon Baines Johnson 00004e37-cb83-374b-3ff2-6fc825e34ea5 | Richard Milhous Nixon 0000443d-a4a7-ea32-9c47-00ebbb8f72a5 | Gerald R. Ford 00004827-6396-5e2e-1995-795887e820ac | James (Jimmy) Earl Carter, Jr. 0000cb6e-5110-abf6-b795-2853dd293e98 | Ronald Wilson Reagan 0000b8b0-3d00-d7a0-9735-ceb0ca470951 | George H. W. Bush 00003029-fdfb-984f-0b43-45c2d96d16b6 | William (Bill) Jefferson Clinton 00009654-4e4f-058b-4fdc-2ce612fa97dc | George W. Bush 000042a0-2e72-c92b-6d61-7a79a5bf3b7e | Barack Obama 00002d51-34c3-a682-12ab-0d6287394899 | Donald Trump

So in this example how can I find this sequence?

bwakkie commented 5 years ago

I expected the first 4 digits (0000) to be filled by the sequence!?

tvondra commented 5 years ago

I was neglecting this issue far too long, sorry about that. I blame summer.

In your first message, you seem to be confused by the first parameter of the uuid_sequence_nextval function. In PostgreSQL, regclass is a special data type representing a database object of sequence type. So what you need to pass is the name used in CREATE SEQUENCE command, not some value read from that sequence (or generated in some other way).

So something like this

CREATE  SEQUENCE my_sequence;
SELECT uuid_sequence_nextval('my_sequence');

you can also use an explicit cast from text to regclass

SELECT uuid_sequence_nextval('my_sequence'::regclass);

but that's not strictly required - PostgreSQL realizes the function expects regclass parameter and does the cast automatically (it might be necessary with overloaded functions, but that's not the case here).

Judging by your second post you've figured this out on your own, I'm just explaining why it's like this.

2) If I understand your example correctly, you expected to see the first two bytes (four chars) in the UUID values to increment for each new value, i.e. something like

0000...
0001...
0002...
0003...
...

That's not how it's supposed to work.

Before explaining how it works, let me first briefly explain what the trouble with "random" UUIDs is. Let's say the we have a table with an index on an UUID column, and that those objects are large (tens of gigabytes or more), and insert 1M rows into the table. How many pages (8kB chunks) will be modified? For the table, it's mostly an append-only operation - we may fit e.g. 50 rows per page, so we'll modify 20000 pages, ~160MB of writes. Not bad.

For indexes, it's much worse, though, because it's not an appen-only operation. The values have to go to particular place in the index because (btree) indexes are based on ordering. With random UUID values we can assume the 1M values will touch ~1M pages, give or take, i.e. ~8GB of writes.

The goal of the sequential UUIDs is to reduce this write amplification by restricting the inserts to a small fraction of the index for a while, and then move to the next piece of the index. The function does that by dividing the UUID space into blocks (determined by a prefix of X bytes), and then moving to the next block after generating certain number of UUIDs.

With the default values, we have block_count = 65536, which means the prefix will be 2 bytes (because 256 * 256 = 65536, i.e. two bytes can represent exactly 64k values). We start with block 0000 (the rest of the UUID is random), and then increment the value to 0001 after generating block_size values (which is 64k by default too). This is why you've only seen 0000 in the data - you have not generated enough values to switch to the next block.

You might try this:

create sequence s;
select i, uuid_sequence_nextval('s', block_size := 1, block_count := 65536) from generate_series(1,20) s(i);
 i  |        uuid_sequence_nextval         
----+--------------------------------------
  1 | 0001caf6-9c39-25e4-4263-6a6f46610751
  2 | 000208be-c075-891c-7bd3-ea723f06b6a8
  3 | 0003758c-d440-8270-79a8-55bb0bbf2a51
  4 | 00042031-a229-ef62-9e79-7e194c688b8c
  5 | 00056e41-34e4-cd08-2450-789df8cd5803
  6 | 00068c82-55ad-b3f7-d6a3-5a741cd88d68
  7 | 00074118-f4af-5a28-9327-30b777a9546f
  8 | 000876ac-7303-2ec8-b0e2-bf868519faa1
  9 | 0009f287-0933-9ffe-e2f9-267621572d98
 10 | 000a0082-0876-2e7b-795d-43293f02afc4
 11 | 000b1ca9-650e-306e-41d0-6c23c99399ea
 12 | 000ceac7-83ea-498b-6077-06dad4490313
 13 | 000d4bb3-d767-5c3c-758d-abb65d17da26
 14 | 000eaa73-1194-3a94-7e83-1fdffb25b9cf
 15 | 000f6ebc-e3b9-6fba-21cc-f79659a24db6
 16 | 0010b927-dc64-9aed-f8d5-817758a05653
 17 | 0011c50f-2333-cb06-ed3b-c00e07b7a460
 18 | 001259f1-1613-18f2-77b3-e06f8861e6e0
 19 | 0013023c-34c7-4b57-fb17-5de8521df659
 20 | 0014d59a-b92e-8ccf-41a4-c1b857a128df
...

which does indeed increment the prefix for each value.

This however entirely neuters the benefits of sequential UUIDs, because it has the same issue as random UUIDs - it spreads the writes over the whole index. There's about 0% chance two UUIDs generated like this will hit the same index page.

With the default parameters, we essentially divide the index into 64k parts (by the first 2B prefix), each representing about 0.0015% of the index. With a 1TB index, that's about 16MB. And we direct 64k inserts into this block, before switching to the next one. 16MB has only about 2k pages, so this is guaranteed to reduce the write amplification (because instead of 64k pages we can only modify 2k of them).

By incrementing the prefix for every new value, the first one will hit the first 16MB block, the next one the next 16MB block, etc. That's pointless.

bwakkie commented 5 years ago

Super! Thanks for the explanation now I understand it... So what I was seeing in my second comment was actually correct it will just increment to 0001 after the next block which will take a while.

deedarb commented 3 months ago

Tried to install in macOS Sonoma (with arm chip) failed with message:

    sequential_uuids.c:90:8: error: call to undeclared function 'htobe64'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    val = htobe64(val);
tvondra commented 3 months ago

@deedarb This issue is closed, please open a new one. FWIW I don't have a macOS machine to test this on, but there's probably a header file missing. Can you check which header needs to be included for htobe64 to be defined?

deedarb commented 3 months ago

thanks for response, after some research I found that uuid v7 also solves task and probably will be available in future Postgres releases, as a temporary workaround I am planning to use this function https://github.com/Betterment/postgresql-uuid-generate-v7/blob/main/uuid_generate_v7.sql