TylerHendrickson opened this issue 1 year ago
@TylerHendrickson after reading through the plan here, I had a couple clarification questions on the implementation path. Would you mind weighing in on these?
What's the background on using a combination of fields as the primary key instead of a more standard auto-incrementing `id` field? If we had a standard `id` integer sequence as the primary key, alongside a unique constraint index on the fields we want, it would make things like this easier to migrate. Is this an opportunity to migrate to a sequence `id` field as the primary key for the table?
If we wanted to maintain the existing primary key approach (based on the other id keys), I'd love to understand the purpose of all the steps you laid out. Maybe as a way to spur the conversation, I'll lay out what I think is a simpler approach to achieve the same outcome, and you can tell me if there are steps or angles I'm missing.
```sql
CREATE UNIQUE INDEX CONCURRENTLY idx_grants_viewed_grant_id_agency_id_user_id_unique
    ON grants_viewed (grant_id, agency_id, user_id);

ALTER TABLE grants_viewed
    DROP CONSTRAINT grants_viewed_pkey,
    ADD CONSTRAINT grants_viewed_pkey
        PRIMARY KEY USING INDEX idx_grants_viewed_grant_id_agency_id_user_id_unique;
```
This could all be run as a single knex migration: the first command builds the index concurrently and runs until complete, then the second command switches the primary key over in a single atomic statement. Fwiw, I ran this locally on my dev setup, and it resulted in the table state we're looking for.
I believe all columns that make up a primary key need to be non-null, but `user_id` is nullable right now. My understanding from the docs (and from running the above on my devbox, it checks out) is that the `ADD CONSTRAINT` automatically converts the fields to non-null. This should be the desired behavior, right? (And we'll need to double-check that the column doesn't contain any nulls.)
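For that double-check, something along these lines (a sketch, assuming the server's existing `knex` instance is in scope) should confirm it:

```js
// Sketch: count rows that would violate the NOT NULL requirement the new
// primary key imposes on user_id. We expect this to print 0.
const [{ count }] = await knex('grants_viewed')
    .whereNull('user_id')
    .count({ count: '*' });
console.log(`grants_viewed rows with NULL user_id: ${count}`);
```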
In addition to the tasks you listed (write the migration, and update the `markGrantAsViewed` function), I believe we'll need to also update query logic in places to accommodate the potential for multiple rows returned per grant/agency combo. For example, I think the logic here could return duplicate viewedByAgency records if we don't update it: https://github.com/usdigitalresponse/usdr-gost/blob/d6f0e0e5d1bd8549624467c477b776d412b44328/packages/server/src/db/index.js#L1015-L1020. I'm happy to look through all the queries that touch the `grants_viewed` table, but wanted to make sure you concur here.
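To illustrate the concern (a hypothetical sketch, not the actual query at that link): once multiple users in an agency can view the same grant, a plain read of `grants_viewed` returns one row per viewer, so queries feeding viewedByAgency would need a `DISTINCT` (or grouping) to stay at one row per agency:

```js
// Hypothetical sketch: collapse per-user view rows back down to one row
// per (grant, agency) so viewedByAgency stays free of duplicates.
const grantId = 123; // example value
const viewedByAgencies = await knex('grants_viewed')
    .where('grant_id', grantId)
    .distinct('grant_id', 'agency_id');
```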
@jeffsmohan Responses to your questions!
> What's the background on using a combination of fields as the primary key instead of a more standard auto-incrementing `id` field? If we had a standard `id` integer sequence as the primary key, alongside a unique constraint index on the fields we want, it would make things like this easier to migrate. Is this an opportunity to migrate to a sequence `id` field as the primary key for the table?
I'm not really sure about the rationale behind using the compound index on `(agency_id, grant_id)` as a PK rather than a separate unique index/constraint; my guess is that this is an arbitrary implementation detail chosen by the author who originally introduced that table.
I don't think I'm grasping what might be enabled or made easier by switching to a serial PK, but I certainly have no opposition to doing so :)
> If we wanted to maintain the existing primary key approach (based on the other id keys), I'd love to understand the purpose of all the steps you laid out. Maybe as a way to spur the conversation, I'll lay out what I think is a simpler approach to achieve the same outcome, and you can tell me if there are steps or angles I'm missing.
>
> ```sql
> CREATE UNIQUE INDEX CONCURRENTLY idx_grants_viewed_grant_id_agency_id_user_id_unique
>     ON grants_viewed (grant_id, agency_id, user_id);
>
> ALTER TABLE grants_viewed
>     DROP CONSTRAINT grants_viewed_pkey,
>     ADD CONSTRAINT grants_viewed_pkey
>         PRIMARY KEY USING INDEX idx_grants_viewed_grant_id_agency_id_user_id_unique;
> ```
>
> This could all be run as a single knex migration: the first command builds the index concurrently and runs until complete, then the second command switches the primary key over in a single atomic statement. Fwiw, I ran this locally on my dev setup, and it resulted in the table state we're looking for.
Agreed that your implementation is much more straightforward! It's been a minute since I thought about the implementation details, but iirc the series of migrations I laid out was intended to minimize the impacts of acquiring a full-table lock when the transaction is committed. The `ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY` operation will always incur a full-table lock, but doing it in a transaction that's isolated from building the index ensures that setting the new PK is as close to a config-only change as possible: it doesn't require any scans or index-building at the moment of applying the new PK, since the indexes are already fully up-to-date (and they also ensure that no illegal data was inserted along the way).
I believe that creating the new index and setting the table's PK to that new index in a single transaction will exclusively lock the table while Postgres scans it to build the new index and ensure no preexisting rows are in conflict. Granted, our traffic patterns are low enough that this probably won't cause a noticeable impact for end-users, so maybe the incremental approach isn't worth the effort (admittedly, this is probably just a force of habit in how I tend to plan out migrations that could be disruptive).
> I believe all columns that make up a primary key need to be non-null, but `user_id` is nullable right now. My understanding from the docs (and from running the above on my devbox, it checks out) is that the `ADD CONSTRAINT` automatically converts the fields to non-null. This should be the desired behavior, right? (And we'll need to double-check that the column doesn't contain any nulls.)
That's a good call-out. I just checked and can confirm that (at least right now) there are no null `user_id` values in the `grants_viewed` table in either Staging or Production.
> In addition to the tasks you listed (write the migration, and update the `markGrantAsViewed` function), I believe we'll need to also update query logic in places to accommodate the potential for multiple rows returned per grant/agency combo. For example, I think the logic here could return duplicate viewedByAgency records if we don't update it: https://github.com/usdigitalresponse/usdr-gost/blob/d6f0e0e5d1bd8549624467c477b776d412b44328/packages/server/src/db/index.js#L1015-L1020. I'm happy to look through all the queries that touch the `grants_viewed` table, but wanted to make sure you concur here.
Good catch, agreed!
> I don't think I'm grasping what might be enabled or made easier by switching to a serial PK, but I certainly have no opposition to doing so :)
Sounds good. I'm not sure it's worth migrating at this point, but I wanted to understand as best I could any context before we messed with the primary key here. As for why I've generally seen "every table gets an auto-incrementing id column for PK" as the generic advice (even when the table has a "natural" compound PK in its own data): ...
Anyway, not necessarily worth correcting in this issue. Like I said, just wanted to make sure I wasn't missing some relevant context.
> I believe that creating the new index and setting the table's PK to that new index in a single transaction will exclusively lock the table while Postgres scans it to build the new index and ensure no preexisting rows are in conflict.
I may have been unclear in my proposal. I would want to run the two psql commands (`CREATE UNIQUE INDEX CONCURRENTLY ...` and `ALTER TABLE ...`) in two separate transactions. I believe we could even still structure this as a single knex migration (by default, knex migrations are wrapped in a transaction, but there's an option to disable that).

- The first SQL statement creates the index concurrently in the background, without locking the table.
- The second SQL statement switches over to the newly created index for the PK in an atomic, locking transaction, but as you say, it should be very quick since the index is already built.
Or am I missing something about that interaction that would cause a longer-lasting full table lock? (Regardless, I think we're agreed that with the scale of data we're talking about, the table lock for the PK swap shouldn't last more than a few seconds, but it's good practice to get this right if we can.)
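For concreteness, here's a sketch of what that single migration file might look like (the index name matches the SQL above; the `down()` reversal is illustrative and would fail if duplicate `(agency_id, grant_id)` rows exist by the time it runs):

```js
// Disable knex's wrapping transaction for this migration: CREATE INDEX
// CONCURRENTLY cannot run inside a transaction block, and we want the two
// statements to commit independently anyway.
exports.config = { transaction: false };

exports.up = async (knex) => {
    // Build the new unique index in the background, without locking the table.
    await knex.raw(`
        CREATE UNIQUE INDEX CONCURRENTLY idx_grants_viewed_grant_id_agency_id_user_id_unique
        ON grants_viewed (grant_id, agency_id, user_id)`);
    // Swap the PK over to the prebuilt index in one short, atomic statement.
    await knex.raw(`
        ALTER TABLE grants_viewed
        DROP CONSTRAINT grants_viewed_pkey,
        ADD CONSTRAINT grants_viewed_pkey
        PRIMARY KEY USING INDEX idx_grants_viewed_grant_id_agency_id_user_id_unique`);
};

exports.down = async (knex) => {
    // Illustrative reversal: restore the original two-column primary key.
    await knex.raw(`
        ALTER TABLE grants_viewed
        DROP CONSTRAINT grants_viewed_pkey,
        ADD CONSTRAINT grants_viewed_pkey PRIMARY KEY (agency_id, grant_id)`);
};
```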
@TylerHendrickson what do you think?
@jeffsmohan TL;DR Consider me on-board with the proposals outlined in https://github.com/usdigitalresponse/usdr-gost/issues/2104#issuecomment-2052349607.
> Sounds good. I'm not sure it's worth migrating at this point, but I wanted to understand as best I could any context before we messed with the primary key here. As for why I've generally seen "every table gets an auto-incrementing id column for PK" as the generic advice (even when the table has a "natural" compound PK in its own data): ...
The reasons you provided seem compelling enough to me that I think any work that involves updating this table's schema definition should probably include migrating to a surrogate PK instead of continuing to rely on the compound PK.
> I may have been unclear in my proposal. I would want to run the two psql commands (`CREATE UNIQUE INDEX CONCURRENTLY ...` and `ALTER TABLE ...`) in two separate transactions. I believe we could even still structure this as a single knex migration (by default, knex migrations are wrapped in a transaction, but there's an option to disable that).
>
> - The first SQL statement creates the index concurrently in the background, without locking the table.
> - The second SQL statement switches over to the newly created index for the PK in an atomic, locking transaction, but as you say, it should be very quick since the index is already built.
>
> Or am I missing something about that interaction that would cause a longer-lasting full table lock? (Regardless, I think we're agreed that with the scale of data we're talking about, the table lock for the PK swap shouldn't last more than a few seconds, but it's good practice to get this right if we can.)
Thanks for clarifying this. As long as the two DDL statements can be run in separate transactions, I think that should be fine. Given that the second statement (`ALTER TABLE ... DROP CONSTRAINT grants_viewed_pkey, ADD CONSTRAINT grants_viewed_pkey PRIMARY KEY ...`) can be run atomically, I think that effectively guards against corruption due to data changes in between schema migrations, which is what I was looking to prevent with the "hobbled" migration strategy I outlined originally. If it can be done in one fell swoop (and it seems it can), then I agree that's the way to go!
not sure how to test this one in staging...
@ClaireValdivia Yeah, this one would be tricky to fully test on staging. (You'd have to log in as multiple users from the same organization, then view the same grant, then ensure that didn't trigger an error.) Personally, I think it's sufficient QA testing to:
### Why is this issue important?
An error (surfaced in Datadog) exists where requests to mark a grant as "viewed" fail with a 500 status code.
Impacts of this issue:

- When a user is deleted, the `grants_viewed` table entries associated with that user are deleted as well. While not necessarily problematic on its own, the fact that tracking whether a grant has been viewed is effectively limited to per-agency granularity means that all "viewed" state for an agency is lost for all grants that were first viewed by the deleted user.
- `grants_viewed` table entries for the user's agency are deleted if that user was the first in the agency to view a particular grant.

### Current State
When a grant is viewed more than once by any user in the same agency, the API request to track that the grant was viewed fails. This happens in either of the following use-cases:

- The same user views the same grant more than once.
- A different user belonging to the same agency views a grant that a colleague has already viewed.
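As a hypothetical illustration (values invented for the example), the second insert below is a legitimate view by a different user, yet it violates the current two-column primary key:

```js
// Both rows describe real views, but the current PK on (agency_id, grant_id)
// rejects the second insert with:
//   duplicate key value violates unique constraint "grants_viewed_pkey"
await knex('grants_viewed').insert({ grant_id: 1, agency_id: 10, user_id: 100 });
await knex('grants_viewed').insert({ grant_id: 1, agency_id: 10, user_id: 101 }); // throws today
```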
The cause of the issue is the composite primary key constraint on the `grants_viewed` table, which consists of the `(agency_id, grant_id)` combination, coupled with the `markGrantAsViewed()` function's behavior of indiscriminately inserting records without any fallback path when a primary key conflict exists. Furthermore, the primary key for this table fails to account for the possibility that multiple users within the same agency will view a grant, which is a legitimate use-case that will always conflict with the primary key constraint.

### Expected State
### Implementation Plan
1. Update the primary key of the `grants_viewed` table to include the `user_id` column. In order to do this, the existing `grants_viewed_pkey` index needs to first be dropped, then recreated, with a temporary unique key to preserve existing constraints until the new primary key is in place. The following example demonstrates these operations in raw SQL; these should be implemented accordingly using Knex (note: each step shown below should be executed within a separate transaction in order for the migration to run without requiring downtime):
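A sketch of these steps (index names are illustrative; the raw SQL is wrapped in a Knex migration with the wrapping transaction disabled so that each statement commits separately):

```js
exports.config = { transaction: false };

exports.up = async (knex) => {
    // Temporary unique key that preserves the existing (agency_id, grant_id)
    // guarantee while the primary key is rebuilt.
    await knex.raw(`
        CREATE UNIQUE INDEX CONCURRENTLY tmp_grants_viewed_agency_grant_unique
        ON grants_viewed (agency_id, grant_id)`);
    // Index that will back the new, expanded primary key.
    await knex.raw(`
        CREATE UNIQUE INDEX CONCURRENTLY idx_grants_viewed_agency_grant_user_unique
        ON grants_viewed (agency_id, grant_id, user_id)`);
    // Drop the old primary key and adopt the prebuilt index (brief lock only).
    await knex.raw(`
        ALTER TABLE grants_viewed
        DROP CONSTRAINT grants_viewed_pkey,
        ADD CONSTRAINT grants_viewed_pkey
        PRIMARY KEY USING INDEX idx_grants_viewed_agency_grant_user_unique`);
    // The temporary guard is no longer needed.
    await knex.raw('DROP INDEX CONCURRENTLY tmp_grants_viewed_agency_grant_unique');
};
```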
2. Update the `markGrantAsViewed()` function so that when an insert operation encounters a conflict on the composite primary key columns, the `updated_at` value for the existing row is updated instead, e.g.:
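A sketch of that upsert (the function shape is hypothetical; it assumes knex's `onConflict`/`merge` support):

```js
// Sketch: record a view, or bump updated_at when this (agency, grant, user)
// combination has already viewed the grant.
async function markGrantAsViewed(grantId, agencyId, userId) {
    await knex('grants_viewed')
        .insert({ grant_id: grantId, agency_id: agencyId, user_id: userId })
        .onConflict(['agency_id', 'grant_id', 'user_id'])
        .merge({ updated_at: knex.fn.now() });
}
```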
### Relevant Code Snippets
No response