googleapis / ruby-spanner-activerecord

Spanner<->ActiveRecord adapter for Ruby
MIT License
86 stars 28 forks source link

The primary key using Bit Reversed ID generates a sequence that significantly deviates from the start_with_counter #302

Closed ta1kt0me closed 3 months ago

ta1kt0me commented 3 months ago

When using Bit Reversed ID for the primary key in ruby-spanner-activerecord with Cloud Spanner, I encountered an issue where the generated values significantly deviate from the values specified with start_with_counter. It seems that using Bit Reversed ID for non-PK columns does not cause any issues.

Environment details

Steps to reproduce

  1. Create table with a sequence for Bit Reversed ID for PK
  2. Insert records by ActiveRecord::Persistence::ClassMethods#create
  3. Select records with BIT_REVERSE function

sample code:

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "activerecord", "7.0.8.1"
  gem "activerecord-spanner-adapter", "1.6.2"
end

PROJECT_ID  = ENV["GOOGLE_CLOUD_PROJECT"]
INSTANCE_ID = ENV["SPANNER_INSTANCE_ID"]
DATABASE_ID = ENV["SPANNER_DATABASE_ID"]

require "active_record"

puts "Ruby Version: #{RUBY_VERSION}"
puts "ActiveRecordSpannerAdapter Version: #{ActiveRecordSpannerAdapter::VERSION}"
puts "Platform: #{RUBY_PLATFORM}\n\n"

class Singer < ActiveRecord::Base
end

ActiveRecord::Base.establish_connection(
  adapter: :spanner,
  emulator_host: ENV["SPANNER_EMULATOR_HOST"],
  project: PROJECT_ID,
  instance: INSTANCE_ID,
  database: DATABASE_ID
)
ActiveRecord::Base.logger = Logger.new(STDOUT)

class CreateTables < ActiveRecord::Migration::Current
  def change
    reversible do |dir|
      dir.up do
        # sequence for PK
        connection.execute "CREATE SEQUENCE singer_id_seq OPTIONS (sequence_kind = 'bit_reversed_positive', start_with_counter = 101)"
        # sequence for non PK
        connection.execute "CREATE SEQUENCE label_id_seq OPTIONS (sequence_kind = 'bit_reversed_positive', start_with_counter = 1001)"

        create_table :singers, id: false do |t|
          t.primary_key :singer_id, :int64, default: -> { "GET_NEXT_SEQUENCE_VALUE(Sequence singer_id_seq)" }
          t.string :first_name
          t.string :last_name
          t.integer :label_id, default: -> { "GET_NEXT_SEQUENCE_VALUE(Sequence label_id_seq)" }, null: false
          t.binary :singer_info
        end
      end

      dir.down do
        connection.ddl_batch do
          drop_table :singers, if_exists: true
          connection.execute "DROP SEQUENCE IF EXISTS label_id_seq"
          connection.execute "DROP SEQUENCE IF EXISTS singer_id_seq"
        end
      end
    end
  end
end

def reset_table
  CreateTables.migrate(:down)
  CreateTables.migrate(:up)
end

def fetch_singers
  spanner_client ||= Google::Cloud::Spanner.new project: PROJECT_ID
  client = spanner_client.client INSTANCE_ID, DATABASE_ID
  query = "SELECT singer_id, BIT_REVERSE(singer_id, TRUE) AS bit_reversed_singer_id, first_name, last_name, label_id, BIT_REVERSE(label_id, TRUE) AS bit_reversed_label_id from Singers ORDER BY singer_id"
  puts "\n\n== Result =="
  puts query
  puts "singer_id,bit_reversed_singer_id,first_name,last_name,label_id,bit_reversed_label_id"
  client.execute(query).rows.each do |row|
    puts "#{row[:singer_id]},#{row[:bit_reversed_singer_id]},#{row[:first_name]},#{row[:last_name]},#{row[:label_id]},#{row[:bit_reversed_label_id]}"
  end
end

def insert_singers
  records = []

  [
    ['Melissa', 'Garcia'],
    ['Russell', 'Morales'],
    ['Jacqueline', 'Long'],
    ['Dylan', 'Shaw'],
    ['Billie', 'Eillish'],
    ['Judy', 'Garland'],
    ['Taylor', 'Swift'],
    ['Miley', 'Cyrus'],
    ['Michael', 'Jackson'],
    ['Ariana', 'Grande'],
    ['Elvis', 'Presley'],
    ['Kanye', 'West'],
    ['Lady', 'Gaga'],
    ['Nick', 'Jonas']
  ].each do |record|
    records << Singer.create!(first_name: record.first, last_name: record.last)
  end

  puts "#{records.size} records inserted."
end

reset_table
insert_singers
fetch_singers

output:

Fetching gem metadata from https://rubygems.org/...........
Resolving dependencies...
Ruby Version: 3.2.4
ActiveRecordSpannerAdapter Version: 1.6.2
Platform: arm64-darwin23

==  CreateTables: reverting ===================================================
-- drop_table(:singers, {:if_exists=>true})
   -> 0.2043s
==  CreateTables: reverted (26.3537s) =========================================

==  CreateTables: migrating ===================================================
-- create_table(:singers, {:id=>false})
   -> 24.3141s
==  CreateTables: migrated (72.9660s) =========================================

D, [2024-05-14T16:44:05.308835 #58096] DEBUG -- :   SQL (0.1ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:05.891241 #58096] DEBUG -- :   SQL (228.7ms)  COMMIT
D, [2024-05-14T16:44:05.891877 #58096] DEBUG -- :   SQL (0.1ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:05.949751 #58096] DEBUG -- :   SQL (56.2ms)  COMMIT
D, [2024-05-14T16:44:05.950114 #58096] DEBUG -- :   SQL (0.0ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.075361 #58096] DEBUG -- :   SQL (124.3ms)  COMMIT
D, [2024-05-14T16:44:06.075833 #58096] DEBUG -- :   SQL (0.1ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.133158 #58096] DEBUG -- :   SQL (56.2ms)  COMMIT
D, [2024-05-14T16:44:06.133577 #58096] DEBUG -- :   SQL (0.0ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.265820 #58096] DEBUG -- :   SQL (131.1ms)  COMMIT
D, [2024-05-14T16:44:06.266333 #58096] DEBUG -- :   SQL (0.1ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.328566 #58096] DEBUG -- :   SQL (60.6ms)  COMMIT
D, [2024-05-14T16:44:06.329524 #58096] DEBUG -- :   SQL (0.1ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.378995 #58096] DEBUG -- :   SQL (47.9ms)  COMMIT
D, [2024-05-14T16:44:06.379322 #58096] DEBUG -- :   SQL (0.0ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.426864 #58096] DEBUG -- :   SQL (46.1ms)  COMMIT
D, [2024-05-14T16:44:06.427481 #58096] DEBUG -- :   SQL (0.1ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.478707 #58096] DEBUG -- :   SQL (49.3ms)  COMMIT
D, [2024-05-14T16:44:06.479493 #58096] DEBUG -- :   SQL (0.1ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.600205 #58096] DEBUG -- :   SQL (119.3ms)  COMMIT
D, [2024-05-14T16:44:06.600798 #58096] DEBUG -- :   SQL (0.1ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.649446 #58096] DEBUG -- :   SQL (47.1ms)  COMMIT
D, [2024-05-14T16:44:06.649829 #58096] DEBUG -- :   SQL (0.0ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.698496 #58096] DEBUG -- :   SQL (47.5ms)  COMMIT
D, [2024-05-14T16:44:06.699042 #58096] DEBUG -- :   SQL (0.1ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.824879 #58096] DEBUG -- :   SQL (124.7ms)  COMMIT
D, [2024-05-14T16:44:06.825230 #58096] DEBUG -- :   SQL (0.0ms)  BEGIN buffered_mutations
D, [2024-05-14T16:44:06.878760 #58096] DEBUG -- :   SQL (52.5ms)  COMMIT
14 records inserted.

== Result ==
SELECT singer_id, BIT_REVERSE(singer_id, TRUE) AS bit_reversed_singer_id, first_name, last_name, label_id, BIT_REVERSE(label_id, TRUE) AS bit_reversed_label_id from Singers ORDER BY singer_id
singer_id,bit_reversed_singer_id,first_name,last_name,label_id,bit_reversed_label_id
51817707733832161,4883477313770229376,Taylor,Swift,8926134461448323072,1007
639616664526514515,7299157174066676616,Dylan,Shaw,3161526938414088192,1002
860377616647682790,3724318086016960488,Jacqueline,Long,7773212956841476096,1003
1315839402007150447,8898633664378282276,Lady,Gaga,4026218066869223424,1014
1707051106540560840,703346197964424948,Kanye,West,7484982580689764352,1011
1979479884802395067,7991878172753825644,Ariana,Grande,567453553048682496,1008
3401731804619207492,1254199087637681786,Judy,Garland,4314448443020935168,1006
3479716659647133508,1258388089596487942,Melissa,Garcia,5467369947627782144,1001
3564627417272452562,2724838749913222982,Nick,Jonas,5755600323779493888,1017
3732567418445073510,3681935209248233958,Billie,Eillish,6620291452234629120,1005
3922004892146713385,5359845301427493686,Elvis,Presley,1143914305352105984,1016
3967947727262358317,6515769289103869046,Russell,Morales,2008605433807241216,1004
4063774855876586471,8352496075614671630,Miley,Cyrus,2873296562262376448,1010
4229390175907733584,361344513219348142,Michael,Jackson,6332061076082917376,1013

When calculating the sequence generated by singer_id_seq , which is used for the primary key singer_id, with the BIT_REVERSE function, I expect the value to be close to the start_with_counter specified as 101, but the result outputs values such as 7299157174066676616 and 3724318086016960488, which are significantly different from 101.

olavloite commented 3 months ago

@ta1kt0me What is the definition of your singer model? The reason for asking is that the ActiveRecord provider by default uses a client-side generated identifier (which is a random long). My hunch is that this is what is happening in your sample application. One quick way to verify this is to run your test script twice on the emulator. If you see different identifiers for each run, then they are randomly generated.

Using bit-reversed sequences requires you to set a sequence name on the model definition. We had to add this requirement to avoid introducing a breaking change for existing applications that already used randomly generated identifiers. See this example for how to configure that: https://github.com/googleapis/ruby-spanner-activerecord/tree/main/examples/snippets/bit-reversed-sequence#example-models

ta1kt0me commented 3 months ago

@olavloite Thank you for your response.

I confirmed that running it twice generates random values. The schema for singers in the above sample code is as follows.

CREATE TABLE singers (
  singer_id INT64 NOT NULL DEFAULT (GET_NEXT_SEQUENCE_VALUE(Sequence singer_id_seq)),
  first_name STRING(MAX),
  last_name STRING(MAX),
  label_id INT64 NOT NULL DEFAULT (GET_NEXT_SEQUENCE_VALUE(Sequence label_id_seq)),
  singer_info BYTES(MAX),
) PRIMARY KEY(singer_id);

Following your advice, I set the sequence_name and ran the script, and the following error occurred. I am testing it on a Database created on Cloud Spanner, not on an emulator.

model code:

class Singer < ActiveRecord::Base
  self.sequence_name = :singer_id_seq
end

error message:

Mutations cannot be used to create records that use a sequence to generate the primary key. Singer uses singer_id_seq. (ActiveRecord::StatementInvalid)

The same error occurred even with a schema that removes the default value for singer_id from the definition of the singers schema.

CREATE TABLE singers (
  singer_id INT64 NOT NULL,
  first_name STRING(MAX),
  last_name STRING(MAX),
  label_id INT64 NOT NULL DEFAULT (GET_NEXT_SEQUENCE_VALUE(Sequence label_id_seq)),
  singer_info BYTES(MAX),
) PRIMARY KEY(singer_id);

In this situation, is it not possible to make a call like Singer.create ?

olavloite commented 3 months ago

@ta1kt0me Ah, yes, there is one more requirement: You must use at least version 7.1.x of ActiveRecord. The reason for this is that ActiveRecord completely refactored how primary keys work between version 7.0 and 7.1, and we only implemented support for bit-reversed sequences for the newest version. The major change that ActiveRecord introduced between 7.0 and 7.1 is support for composite primary keys.

Implementing it for older versions would mean that we would have to also support it using the third-party composite-primary-key gem that was previously required in order to use composite primary keys with ActiveRecord.

We'll update the sample for bit-reversed sequences to make these requirements clearer. If you have a look at that sample application, then you'll see that is uses Singer.create in combination with a bit-reversed sequence.

ta1kt0me commented 3 months ago

@olavloite Following the updated sample, I confirmed with a simple app using Rails 7.1.3.2 and ruby-spanner-activerecord 1.6.2 that a bit-reversed sequence is generated for PK according to the value of start_with_counter. This was very helpful. I'll update my Rails app to 7.1. Thank you very much.