usdigitalresponse / usdr-gost

USDR-hosted grants management tools
http://grants.usdigitalresponse.org
Apache License 2.0
31 stars 21 forks source link

[Bug]: Grant appearing twice in grant finder #2584

Open ClaireValdivia opened 9 months ago

ClaireValdivia commented 9 months ago

Action items

Why is this issue important?

we expect a grant to only appear once for each opportunity ID

Current State

there's a grant that is appearing twice in grantfinder - the id number is the same (DE-FOA-0003164), but the name and open date different. Only one grant appears for this ID in grants.gov

Expected State

we expect each grant opportunity ID to only have one grant in grant finder with the most up to date information (i.e. matching grants.gov)

Implementation Plan

The following functions need to be added...

Relevant Code Snippets

No response

as1729 commented 9 months ago

Looks like they were two individual grants from grants.gov. but they have since deleted one of them: 350952 350904 - this was created first but has since been deleted from grants.gov

Our system does not determine grant uniqueness based on the Opportunity Number (grant_number in the database). But rather uses grants.gov's unique ID as the primary identifier of uniqueness (grant_id in the database). I believe this concept is still correct and we should not be changing it as seen from this scenario.

Option 1 - keep grant_id as the primary key and delete/archive the old grant

The issue is therefore our lack of deleting grants from grants.gov when they are removed. I believe we should create a separate issue to implement that feature and go through a product exercise to understand the intended behavior in various scenarios. For example:

Option 2 - change the primary key to be grant_number and update the grant

This option may result in some loss of data, but may actually be the correct solution here. This will ensure that even if a grant was assigned/marked-interested there won't be any issue since the grant_number stays the same. The only thing that changes is the grant_id in the backend and the subsequent URLs generated to grants.gov. If the grant details have changed then the partners may not have a way to get back to what the grant previously used to look like which is the case today for other changes to the grant.

@ClaireValdivia for all the reasons above I think we can mark this issue as closed since there is not much we can do on the engineering side here without answering the questions and implementing the ability to delete.

TylerHendrickson commented 9 months ago

@ClaireValdivia re

check if we have other duplicate opportunity IDs in grant finder

I checked the grants table for duplicate grant_number (aka Opportunity ID) values...

(Note: I spot-checked the discrepancy here and am fairly confident we can just chalk it up to Staging having some junk data because it's Staging).

Attached you can find the results (from each environment) of the following query that I used to detect duplicates and also provide some additional insight that might be useful for spot-checking:

select
    grant_number,
    count(grant_number) as "occurrences",
    array_to_string(array_agg(grant_id order by grant_id::int), ',') as "grant_id values sharing grant_number",
    max(grant_id::int) as "latest grant_id with this grant_number",
    max(updated_at) as 
"latest updated_at for all duplicates",
    max(created_at) as "latest created_at for all duplicates"
from grants
group by grant_number
having count(grant_number) > 1
order by "occurrences" desc, grant_number asc;

results_production.txt results_staging.txt

TylerHendrickson commented 9 months ago

Regarding next steps, I'd like to check whether Grants.gov ever updates an older grant opportunity after a new record is published with a duplicate opportunity number. That is, my assumption is:

  1. Grant 123 is published with Opportunity Number ABC246
  2. Grant 123 is updated (maybe the description is changed or something) and Opportunity Number remains the ABC246
  3. Grant 456 is published with Opportunity Number ABC246
  4. (Grant 123 is deleted from Grants.gov and will never be updated again)

but I think we should check for sure that the assumption in the final step is actually the case. If, in the above example, Grant 123 were to receive a new update after Grant 456 was published, then we'll probably want to consider how we should deal with a "bifurcated grant opportunity" scenario.

Similarly, I also think it would be wise to validate the assumption of: "whenever a new grant opportunity is published with a duplicate Opportunity Number, the old grant opportunity record is removed entirely from Grants.gov" – out of all the duplicate grant_number/Opportunity Numbers that we found, are we certain that there is only 1 working link for each of them?

If we can confirm that both of the above assumptions hold true, then I think that would strongly support going with Option 2 - change the primary key to be grant_number and update the grant proposed in https://github.com/usdigitalresponse/usdr-gost/issues/2584#issuecomment-1934335537.

ClaireValdivia commented 9 months ago

@TylerHendrickson wondering if we start by asking the simplergrants team... do you have any open threads of questions to them where you could ask them about this?