ribose-jeffreylau opened this issue 7 years ago
@DmitryDrobotov would you have time to start with this? Thanks!
@ribose-jeffreylau @ronaldtse I spent yesterday thinking about how to implement such magic behavior, and here is one point I realized: `Transcryptor.schema(:user, :ssn)` won't give us any versioning of data changes.

@DmitryDrobotov … (version `n`) and the then-previous config (version `n - 1`). Thus, any running server instance would use the latest migration version (version `n`) as the current config, and also read in the earliest unmigrated version (version `n + 1`) if it exists.

A background rake task could then run migration version `n + 1`. The running server instance's DB would eventually pick up the new config via the solution in point 1 (TBD).
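A minimal sketch of the version bookkeeping described above, assuming transcryption migrations are identified by sortable timestamp IDs and that the set of already-applied IDs is known to the server. `MigrationChain`, `current` and `pending` are hypothetical names, not part of transcryptor.

```ruby
# Hypothetical sketch: which transcryption migration a running server treats
# as "current" (latest applied, version n) and which one it must also be
# able to read (earliest unapplied, version n + 1), if any.
class MigrationChain
  def initialize(versions, applied)
    @versions = versions.sort # all migration IDs, oldest first
    @applied  = applied       # IDs the background task has finished
  end

  # Version n: the newest migration that has been fully applied.
  def current
    @versions.select { |v| @applied.include?(v) }.last
  end

  # Version n + 1: the oldest migration that has not been applied yet.
  def pending
    @versions.find { |v| !@applied.include?(v) }
  end
end

chain = MigrationChain.new([20171010123456, 20171011000000, 20171012000000],
                           [20171010123456])
chain.current # => 20171010123456
chain.pending # => 20171011000000
```

When `pending` returns `nil`, there is nothing extra to read and the server only needs the current config.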
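```ruby
# db/transcryption/20171010123456_transcrypt_user_ssn.rb
class TranscryptUserSSN < Transcryption
  # Config the column is currently encrypted with (algo, salt, etc.)
  def old
    { algorithm: 'aes-256-cbc', salt: 'old-salt' } # placeholder values
  end

  # Config the column should be re-encrypted with
  def new
    { algorithm: 'aes-256-gcm', salt: 'new-salt' } # placeholder values
  end
end
```

and no more YAML files.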
Hence, the configuration is implicitly the "head" of a chain of migration files. The "head" can differ from row to row, as indicated by a separate column (one per encrypted column) holding the transcryption order ID (e.g. `20171010123456`). The labels usually differ only during a transcryption; otherwise they should all point to the same version.
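To make the per-row "head" concrete, here is a minimal sketch assuming each transcryption order ID maps to a config hash and each row carries the ID it was last written with. The `CHAIN` and `HEAD` constants, the `config_for` and `rows_behind_head` helpers, and the `encrypted_ssn_version` key are illustrative assumptions.

```ruby
# Hypothetical chain of encryption configs keyed by transcryption order ID.
CHAIN = {
  20171010123456 => { algorithm: 'aes-256-cbc' }, # placeholder configs
  20171011000000 => { algorithm: 'aes-256-gcm' }
}.freeze

# The configuration is implicitly the newest link: the "head".
HEAD = CHAIN.keys.max

# Each row carries the ID it was last transcrypted with, so during a
# transcryption two rows can legitimately resolve to different configs.
def config_for(row)
  CHAIN.fetch(row[:encrypted_ssn_version])
end

# IDs of rows whose label has not caught up with the head yet.
def rows_behind_head(rows)
  rows.reject { |row| row[:encrypted_ssn_version] == HEAD }.map { |row| row[:id] }
end

rows = [
  { id: 1,    encrypted_ssn_version: 20171011000000 },
  { id: 1000, encrypted_ssn_version: 20171010123456 }
]
config_for(rows[0])    # => { algorithm: 'aes-256-gcm' }
config_for(rows[1])    # => { algorithm: 'aes-256-cbc' }
rows_behind_head(rows) # => [1000]
```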
I was actually thinking of a less drastic solution like this: the latest configuration stays in the model, while the old one that we want to get rid of (in time) is stored in the data migration, and remains available if any server instances are still running old code.
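```ruby
# db/transcryption/20171010123456_transcrypt_user_ssn.rb
class TranscryptUserSSN < Transcryption
  # Old config: lives only here and goes away with this file
  def old
    { algorithm: 'aes-256-cbc', salt: 'old-salt' } # placeholder values
  end

  # New config: still defined in the model
  def new
    UserModel.new_ssn_configuration
  end

  # Register the old config with the model. `old` is an instance method,
  # so instantiate the transcryption to read it (a bare `&old` would fail).
  UserModel.register_old_configuration(:ssn, new.old)
end
```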
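The `register_old_configuration` call in the snippet above implies a small registry on the model side. A minimal sketch follows, assuming the model keeps its latest config itself and only looks up a registered old config when a value has not been migrated yet; `EncryptionConfigRegistry`, `new_configurations`, `old_configurations` and `configuration_for` are hypothetical names.

```ruby
# Hypothetical model-side registry: the latest config lives in the model,
# older configs are registered by transcryption files until those are removed.
module EncryptionConfigRegistry
  def new_configurations
    @new_configurations ||= {}
  end

  def old_configurations
    @old_configurations ||= Hash.new { |hash, attr| hash[attr] = [] }
  end

  def register_old_configuration(attr, config)
    old_configurations[attr] << config
  end

  # Pick the config a stored value was written with. In practice `migrated`
  # would be derived from the per-row version column.
  def configuration_for(attr, migrated:)
    return new_configurations.fetch(attr) if migrated
    old_configurations[attr].last or raise KeyError, "no old config for #{attr}"
  end
end

class UserModel
  extend EncryptionConfigRegistry
  new_configurations[:ssn] = { algorithm: 'aes-256-gcm' } # placeholder values
end

# What loading the transcryption file would do:
UserModel.register_old_configuration(:ssn, { algorithm: 'aes-256-cbc' })

UserModel.configuration_for(:ssn, migrated: true)  # => { algorithm: 'aes-256-gcm' }
UserModel.configuration_for(:ssn, migrated: false) # => { algorithm: 'aes-256-cbc' }
```

Deleting the transcryption file then removes the old config from the running code without touching the model.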
@ribose-jeffreylau @ronaldtse thanks guys! Now I see how to implement it. We will need:

- … `ssn`), then the version column will be an `encrypted_ssn_version:decimal` column accepting values like `20171010123456`
- … `db/transcryption/20171010123456_transcrypt_user_ssn.rb` (we can develop a rake task to generate such files)
- … `encrypted_ssn_version` column with the latest value.

Sounds great to me! What about you?
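Under this plan, transcrypting one row means decrypting with the old configuration, re-encrypting with the new one, and stamping `encrypted_ssn_version` in the same write. The sketch below uses Ruby's standard `openssl` library with AES-256-CBC and a per-value IV; the row is a plain Hash standing in for an ActiveRecord object, and the `encrypted_ssn_iv` field, the `KEYS` table and the `transcrypt_row` helper are assumptions, not attr_encrypted's actual storage layout.

```ruby
require 'openssl'

# AES-256-CBC helpers (keys must be 32 bytes); a fresh random IV per value.
def encrypt(plaintext, key)
  cipher = OpenSSL::Cipher.new('aes-256-cbc').encrypt
  cipher.key = key
  iv = cipher.random_iv
  [cipher.update(plaintext) + cipher.final, iv]
end

def decrypt(ciphertext, key, iv)
  cipher = OpenSSL::Cipher.new('aes-256-cbc').decrypt
  cipher.key = key
  cipher.iv = iv
  (cipher.update(ciphertext) + cipher.final).force_encoding(Encoding::UTF_8)
end

OLD_VERSION = 20171010123456
NEW_VERSION = 20171011000000
KEYS = { OLD_VERSION => 'o' * 32, NEW_VERSION => 'n' * 32 } # placeholder keys

# Re-encrypt one row from its recorded version to NEW_VERSION.
def transcrypt_row(row)
  return row if row[:encrypted_ssn_version] == NEW_VERSION # already migrated
  old_key = KEYS.fetch(row[:encrypted_ssn_version])
  plain = decrypt(row[:encrypted_ssn], old_key, row[:encrypted_ssn_iv])
  ciphertext, iv = encrypt(plain, KEYS.fetch(NEW_VERSION))
  row.merge(encrypted_ssn: ciphertext, encrypted_ssn_iv: iv,
            encrypted_ssn_version: NEW_VERSION)
end

ct, iv = encrypt('123-45-6789', KEYS[OLD_VERSION])
row = { id: 1, encrypted_ssn: ct, encrypted_ssn_iv: iv,
        encrypted_ssn_version: OLD_VERSION }
migrated = transcrypt_row(row)
decrypt(migrated[:encrypted_ssn], KEYS[NEW_VERSION], migrated[:encrypted_ssn_iv])
# => "123-45-6789"
```

Because the version check comes first, running the transcryption twice on the same row is a no-op.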
@DmitryDrobotov this is actually a great solution!
This is actually something more generic than just for encrypted attributes (which may or may not be in a separate gem). This suggests that we have a versioned schema per column cell (i.e., each column cell will have a schema version).
For each schema-ed attribute column we will have an extra column to store the attribute schema version.
The flow works like this:

1. Code is updated to use the new column schema (`20171011000000`) in the new model code, with a link to the old column schema (`20171010000000`) in the migration file.
2. Assume that all Rails instances are already running the new code. On read of the attribute, depending on its stored schema version, we load the value according to the old or new scheme. On write, the value is migrated to the new scheme.
This way we can support "lazy migrate-on-write", or run a separate background process to migrate attributes to the new column schema, with zero downtime.
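The read/write rule above can be sketched generically, without any encryption, by treating each schema version as a pair of encode/decode functions. This assumes a row that stores the raw value in `phone` and its schema version in `phone_schema_version`; the attribute name, the two example schemes (plain text versus hex-encoded) and the `VersionedAttribute` class are all hypothetical.

```ruby
# Hypothetical generic "versioned attribute": every schema version knows how
# to decode a stored value and how to encode a new one.
SCHEMES = {
  20171010000000 => { # old scheme: value stored as-is
    decode: ->(stored) { stored },
    encode: ->(value)  { value }
  },
  20171011000000 => { # new scheme: value stored hex-encoded
    decode: ->(stored) { [stored].pack('H*') },
    encode: ->(value)  { value.unpack1('H*') }
  }
}.freeze
LATEST = SCHEMES.keys.max

class VersionedAttribute
  def initialize(row, name)
    @row = row
    @name = name
    @version_key = :"#{name}_schema_version"
  end

  # Read using whichever scheme this particular cell was written with.
  def read
    SCHEMES.fetch(@row[@version_key])[:decode].call(@row[@name])
  end

  # Every write goes out in the latest scheme ("lazy migrate-on-write").
  def write(value)
    @row[@name] = SCHEMES.fetch(LATEST)[:encode].call(value)
    @row[@version_key] = LATEST
  end
end

old_row = { phone: '555-0100', phone_schema_version: 20171010000000 }
phone = VersionedAttribute.new(old_row, :phone)
phone.read # => "555-0100" (decoded with the old scheme)
phone.write(phone.read)
# old_row now holds the hex-encoded value, stamped with version 20171011000000
```

A background process would simply call `write(read)` on every cell still on an old version.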
What do you think?
@ronaldtse Sounds good, that's the same thing I meant, except done generically the way `rails-data-migrations` does it. Let's think about it a little: do we really need to implement a generic solution and add support for it in the transcryptor gem, or should we simply implement transcryption migrations inside this gem?
My concern was purely about scope: if the test suite should test only against a generic attribute schema, then it probably belongs in (or should come from) a separate library.
Do you think these gems are useful for this functionality?
Hi @DmitryDrobotov. What else do you think is needed before we can continue?
@ronaldtse I don't think these gems would be useful, because we need zero-downtime migrations of a dynamically calculated value. For example, if the row with id=1 has already been migrated, it should use the new `attr_encrypted` configuration, but for the row with id=1000, which has not been migrated yet, we should still use the previous configuration for re-encryption. So I am going to implement something similar, but with some big changes, such as per-row migration version control (rather than a single migration version, which is how ActiveRecord migrations work).
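The per-row version control described here (row id=1 already migrated, row id=1000 not yet) can be sketched as a background job that walks rows in batches and skips anything already at the target version, so it is safe to stop and rerun. The rows are plain hashes, and `migrate_in_batches`, `TARGET_VERSION` and the `reencrypt` lambda are hypothetical stand-ins for the ActiveRecord model and the actual transcryptor logic.

```ruby
TARGET_VERSION = 20171011000000

# Hypothetical background migration: re-encrypt only rows still on an older
# version, a batch at a time, stamping each row as soon as it is done.
# `reencrypt` stands in for "decrypt with old config, encrypt with new".
def migrate_in_batches(rows, batch_size:, reencrypt:)
  migrated_ids = []
  rows.each_slice(batch_size) do |batch|
    batch.each do |row|
      next if row[:encrypted_ssn_version] == TARGET_VERSION # already migrated
      row[:encrypted_ssn] = reencrypt.call(row[:encrypted_ssn])
      row[:encrypted_ssn_version] = TARGET_VERSION
      migrated_ids << row[:id]
    end
  end
  migrated_ids
end

rows = [
  { id: 1,    encrypted_ssn: 'new:a', encrypted_ssn_version: TARGET_VERSION },
  { id: 1000, encrypted_ssn: 'old:b', encrypted_ssn_version: 20171010123456 }
]
reencrypt = ->(value) { value.sub('old:', 'new:') } # toy stand-in for real crypto
migrate_in_batches(rows, batch_size: 500, reencrypt: reencrypt) # => [1000]
migrate_in_batches(rows, batch_size: 500, reencrypt: reencrypt) # => [] (nothing left)
```

With ActiveRecord, `find_each` or `find_in_batches` would take the place of `each_slice`, and the version check could move into a `where` clause.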
Hey there! Is the only difference between the zero-downtime approach suggested in the master branch and this one that we have those few extra lines of code in the migrated model, which we now want to get rid of? Do I understand it right?
And what is the difference between this issue and that one: https://github.com/riboseinc/transcryptor/issues/24 ?
CC @ribose-jeffreylau
@nattfodd
The major differences from #24:
- Zero restart is for migrating each data cell (not a table/column) individually. The table and data already exist. The migration happens during production operation of the server and does not require any table modifications.
- There is no need for post-operations on the data schema once the data migration is completed.
Hi @nattfodd !
The mechanism detailed in this issue focuses on data migration. It tries not to touch `db/migrations/` at all, as `db/migrations/` also has other concerns, such as database schema migrations.
Hope it makes things clearer :)
@ribose-jeffreylau sounds clearer, indeed. But I believe it should be an extra temporary column anyway, as @DmitryDrobotov suggested right above. It could be done in a migration script, though, something like:

```ruby
# bundle exec rake transcryptor:nodowntime:migrate
# Outside an ActiveRecord::Migration, schema statements go through the connection.
conn = ActiveRecord::Base.connection
conn.add_column :users, :ssn_migrated, :boolean
# do some migrations
conn.remove_column :users, :ssn_migrated
```

It will affect the actual SQL DB schema, but it won't affect the `db/schema.rb` and `db/migrate/*.rb` files...
@nattfodd actually the solution discussed with @DmitryDrobotov is not a temporary column, but a permanent column.
The problem with a temporary column is that the database schema will be in flux. For example, if multiple servers are reading values and the column is suddenly dropped (in many databases a column drop is implemented by altering the whole table), it will cause havoc.
A straightforward solution is to have a separate, permanent column that manages the "data cell schema": its value indicates the "data schema version" of the original cell. This is like the Rails schema version number, but more granular: instead of a single version for the entire database, we have a version per value.
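Because the version lives next to each value, migration progress can be read straight from the data instead of from a single schema version number. A minimal sketch, assuming rows expose an `encrypted_ssn_version` value; `version_histogram` and `fully_migrated?` are hypothetical helpers (in ActiveRecord, a `group(:encrypted_ssn_version).count` query would play the same role as the histogram).

```ruby
# Hypothetical progress report built from the per-value version column.
def version_histogram(rows, column)
  rows.group_by { |row| row[column] }.transform_values(&:size)
end

# Done when no value is left on any version other than the head.
def fully_migrated?(rows, column, head)
  (version_histogram(rows, column).keys - [head]).empty?
end

rows = [
  { id: 1, encrypted_ssn_version: 20171011000000 },
  { id: 2, encrypted_ssn_version: 20171011000000 },
  { id: 3, encrypted_ssn_version: 20171010123456 }
]
version_histogram(rows, :encrypted_ssn_version)
# => { 20171011000000 => 2, 20171010123456 => 1 }
fully_migrated?(rows, :encrypted_ssn_version, 20171011000000) # => false
```

Nothing has to be dropped once the histogram collapses to a single version; the column simply keeps recording it for the next transcryption.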
The benefits of a permanent column (or columns) are great:
The cost of using this is the implementation of a data layer that can switch (read/write) between data schema versions:
Goal

Implies

… `attr_encrypted` columns.

Potential tools
Instead of running rake tasks to perform transcryptions, `rails-data-migrations` seems like a good alternative. It stores its migration history in the table `data_migrations`.

DB
Currently,

- `db/schema.rb`
- `db/migration/xxx_.rb`
- `app/models/xxx.rb`

Column

Some ideas on encryption schema:

- `db/encryption_schema.yaml`
- migration: `db/transcryption/2017125_transcrypt_user_ssn.rb`
- config: `Transcryptor.schema(:user, :ssn)` in `app/models/user.rb`
How do we make this work? What else needs to be defined? What's the closest we can achieve?
cc: @DmitryDrobotov