spiffe / spire

The SPIFFE Runtime Environment
https://spiffe.io
Apache License 2.0
1.78k stars 472 forks source link

Database primary key reached maximum value of sequence #5501

Open zkonge opened 2 weeks ago

zkonge commented 2 weeks ago

What happened

Our psql raised the "error reached maximum value of sequence" by node_resolver_map_entries_id_seq. And server down.

We have a large amount of agents. and some of the agents we deploy are constantly restarting,

Why

https://github.com/spiffe/spire/blob/3d158ce5635ab92fc5869b4ecda44c6a153b2a06/pkg/server/datastore/sqlstore/models.go#L10 It is seems that the ID field in node_resolver_map_entries is not enough in such scenarios.

SPIRE, which uses gorm v1, will mapping the uint to int type in DB and uint64 to bigint (By the way, gorm v2 will mapping both uint and uint64 to bigint).

Each agent restarting (or attestation) will increase the node_resolver_map_entries_id_seq value.

What to do

Modify the type? Use uint64 in Model primary key.

I'm worried about the DDL modification in the migration, for a large table, the type modification will lock the whole table and the server will down.

azdagron commented 2 weeks ago

I don't think we have a choice other than migrating. We'll have to be careful with our use of AutoMigrate though. We may need to investigate if gorm v1 has a way to tag the ID to use a bigint instead of int (maybe declaring our own model type that defines ID as uint64 or something).

zkonge commented 2 weeks ago

spire has already use its own Model type.

The auto migrate in gorm v1 doesn't modify the field type, so I think maybe changing the field type uint -> uint64 is enough.

In such case, all new created database will use the bigint as primary key and old will still uses the int.

And current uint is 64bit (in 64bit platform), so no overflow happens even downgrading (mostly?). I have no idea how many users use 32 bit platform to host spire-server.