ucd-library / aggie-experts

Publicly reported feedback and issues for Aggie Experts
https://ucd-library.github.io/aggie-experts/
MIT License
1 stars 2 forks source link

Resolve Grant Feed Anomalies #501

Closed rakunkel-ucd closed 2 months ago

rakunkel-ucd commented 3 months ago

Several types of anomalies have been observed in the Grant feed process.

  1. Some grants seem to lack user links.
  2. Some Aggie Enterprise grant titles indicate the record has been deprecated, replaced, or has other issues.
  3. At least one (and presumably more) grant appears to be replicated in Dimensions and so a UCD Grant(pre 2024) grant type.
rakunkel-ucd commented 3 months ago

Dimensions grant via the API: https://oapolicy.universityofcalifornia.edu:8002/elements-secure-api/v5.5/grants/4522184?detail=full

rakunkel-ucd commented 3 months ago

Was this grant https://oapolicy.universityofcalifornia.edu/viewobject.html?id=4542319&cid=2 previously uploaded to the UCPMS as a "ucd grant (pre-2024)" item type?

That was imported, but as a “UCD Grant” type, which is the type we use for current grant imports. The other grant apparently came from Dimensions, but shows “UCD grant (pre-2024)” as its type which is what we use for our set of “archived” grants.

Roger Kunkel UC Davis Library Online Strategy

Ticket ID: [#138662]

Hello Roger Kunkel,

Thanks for your message. I'm looping in Symplectic Support as they may need to assist with the investigation.

My understanding is that this grant https://oapolicy.universityofcalifornia.edu/viewobject.html?id=4522184&cid=2 is from Dimensions (Grants). My expectation is that Dimensions Grants would always import as the default item type "grant", and that they may group with existing "ucd grant (pre-2024)" item types if there is relevant matching metadata.

Was this grant https://oapolicy.universityofcalifornia.edu/viewobject.html?id=4542319&cid=2 previously uploaded to the UCPMS as a "ucd grant (pre-2024)" item type?

Warm regards,

Alainna Wrigley On Wed, May 22 at 3:51 PM , Roger Kunkel [rakunkel@ucdavis.edu](mailto:rakunkel@ucdavis.edu) wrote: Hi,

We have a mysterious problem with Grants appearing in our set of grants of the custom type UCD Grant (PRE-2024)

This is the Dimensions grant that masquerades as our pre-2024 grant: https://oapolicy.universityofcalifornia.edu/viewobject.html?id=4522184&cid=2.

And this is our actual record: https://oapolicy.universityofcalifornia.edu/viewobject.html?id=4542319&cid=2

We aren’t expecting to see grants sourced from Dimensions and especially not with our custom type. Can you investigate this for us?

Thanks!

Roger Kunkel UC Davis Library Online Strategy

rakunkel-ucd commented 3 months ago

Example of a grant the fails to upload because no user link is found: ark:/87287/d7c08j/grant/K385F21

rakunkel-ucd commented 3 months ago

Or perhaps fails to update.

rakunkel-ucd commented 3 months ago

A third version of this grant? https://oapolicy.universityofcalifornia.edu/viewobject.html?cid=2&id=4518593

rakunkel-ucd commented 3 months ago

A third version of this grant? https://oapolicy.universityofcalifornia.edu/viewobject.html?cid=2&id=4518593

That one is from 2016 though.

qjhart commented 2 months ago

@rakunkel-ucd reading Lucian's notes, I'm still not exactly sure what the dimensions sync tool does, whether it simply joins grants that were already in the grant Grant type, or if it fetches more from somewhere else.

However, with this query:

 with grc as (
    select gr.[Grant ID],count(*) as record_cnt 
    from [Grant Record] gr 
    group by gr.[Grant ID]
    )
select g.[Records Imported From],gt.[Identifier name],grc.record_cnt, COUNT(*) as cnt
from  [Grant] g 
join [Grant Type] gt on gt.[ID] = g.[Type ID] 
JOIN grc on grc.[Grant ID]=g.ID
WHERE gt.[Identifier name] like 'c-ucd%'
group by g.[Records Imported From],gt.[Identifier name],grc.record_cnt
order by [Identifier name],record_cnt
Records Imported From Identifier name record_cnt cnt
Dimensions Grants c-ucdavis-pre-ae 1 116
Institutional Grants System c-ucdavis-pre-ae 1 33544
Dimensions Grants, Institutional Grants System c-ucdavis-pre-ae 2 1480
Dimensions Grants, Dimensions Grants, Institutional Grants System c-ucdavis-pre-ae 3 4
Institutional Grants System c-ucd-enterprise 1 7369
Dimensions Grants, Institutional Grants System c-ucd-enterprise 2 382

We can see there are about 116 orphan grant records from dimensions in c-ucdavis-pre-ae

So my main question is whether we can simply remove these our our own delete tool, using the Dimensions Data Source Proprietary ID as in:

with grc as (
    select gr.[Grant ID],count(*) as record_cnt 
    from [Grant Record] gr 
    group by gr.[Grant ID]
    )
select gr.[Grant ID],gr.Id,gr.[Data Source Proprietary ID],gr.[Data Source]
from  [Grant] g 
join [Grant Type] gt on gt.[ID] = g.[Type ID] 
JOIN grc on grc.[Grant ID]=g.ID 
JOIN [Grant Record] gr on g.ID=gr.[Grant ID]
WHERE gt.[Identifier name] in ('c-ucdavis-pre-ae','c-ucd-enterprise')
and record_cnt=1 and gr.[Data Source] != 'Institutional Grants System'
Grant ID Id Data Source Proprietary ID Data Source
2794835 7690964 grant.2438870 Dimensions Grants
2795743 7693308 grant.2438688 Dimensions Grants
2797232 7697975 grant.7910180 Dimensions Grants
2798567 7702007 grant.2439943 Dimensions Grants
2798678 7702321 grant.6952810 Dimensions Grants
2799345 7704610 grant.6377536 Dimensions Grants
2799413 7704775 grant.5476233 Dimensions Grants
2799416 7704781 grant.4455513 Dimensions Grants
2799458 7704864 grant.3802089 Dimensions Grants
2799488 7705006 grant.5504306 Dimensions Grants
qjhart commented 2 months ago

@rakunkel-ucd However, I'll note these don't look like they synced with the new records in c-ucd-enterprise

with grc as (
    select gr.[Grant ID],count(*) as record_cnt 
    from [Grant Record] gr 
    group by gr.[Grant ID]
    ),
orphans as (
select gr.[Data Source Proprietary ID]
from  [Grant] g 
join [Grant Type] gt on gt.[ID] = g.[Type ID] 
JOIN grc on grc.[Grant ID]=g.ID 
JOIN [Grant Record] gr on g.ID=gr.[Grant ID]
WHERE gt.[Identifier name] in ('c-ucdavis-pre-ae','c-ucd-enterprise')
and record_cnt=1 and gr.[Data Source] != 'Institutional Grants System'
)
select orphans.[Data Source Proprietary ID],count(*) as cnt 
from orphans join [Grant Record] gr on orphans.[Data Source Proprietary ID]=gr.[Data Source Proprietary ID]
group by orphans.[Data Source Proprietary ID]
having count(*) > 1
order by cnt desc;
qjhart commented 2 months ago

Also we can verify no grants have wander out of our two identifiers

with ucd_records as (
    select distinct gr.[Grant ID] 
    from [Grant Record] gr 
    where gr.[Data Source Proprietary ID] like 'ark:/87287/%'
    ),
grc as (
    select gr.[Grant ID],count(*) as record_cnt 
    from [Grant Record] gr 
    group by gr.[Grant ID]
    )
select g.[Records Imported From],gt.[Identifier name],grc.record_cnt, COUNT(*) as cnt
from ucd_records join [Grant] g on ucd_records.[Grant ID] = g.ID
join [Grant Type] gt on gt.[ID] = g.[Type ID] 
JOIN grc on grc.[Grant ID]=g.ID
group by g.[Records Imported From],gt.[Identifier name],grc.record_cnt
order by [Identifier name],record_cnt
Records Imported From Identifier name record_cnt cnt
Institutional Grants System c-ucdavis-pre-ae 1 33544
Dimensions Grants, Institutional Grants System c-ucdavis-pre-ae 2 1480
Dimensions Grants, Dimensions Grants, Institutional Grants System c-ucdavis-pre-ae 3 4
Institutional Grants System c-ucd-enterprise 1 7369
Dimensions Grants, Institutional Grants System c-ucd-enterprise 2 382
rakunkel-ucd commented 2 months ago

And another question would be how do we stop them from coming back?

qjhart commented 2 months ago

@rakunkel-ucd presumably, these are all from grants that have been moved, so I'm okay with us periodically having to check, as longs as we can easily get the list as above, and we can delete them.

rakunkel-ucd commented 2 months ago

June 18 email:

Subject: Re: Re: Dimensions Grants replicating UCD Grants - Ticket ID: [#138662] [480444]

Hello Roger,

About this "I believe we have about 700 records in this state, so I assume we’ll want to design a bulk update to correct that" from your last message, how do you intent to do it? You can utilize the API to bulk update the type of the problematic objects. Alternatively, you can give us with a list of these grants and we will update them. This is chargeable work according to our SLA and we will need approximately 1 support hour. ​ Kind regards, Lucian B.

Ticket: https://support.symplectic.co.uk/support/tickets/480444 On Fri, 14 Jun at 7:51 AM , Lucian Bitiusca [support@symplectic.co.uk](mailto:support@symplectic.co.uk) wrote: Hi Roger,

When you are saying to opt-out the Dimensions synchronization are you referring to a list of users to be excluded from Dimensions grants search or do you intent to exclude all users?

Kind regards,

Ticket: https://support.symplectic.co.uk/support/tickets/480444 On Thu, 13 Jun at 5:12 PM , Roger Kunkel [rakunkel@ucdavis.edu](mailto:rakunkel@ucdavis.edu) wrote: Thank you for this analysis, Lucian. Our unusual workflow with grants moving from one source to another combined with the Dimensions synchronizer results in some unexpected problems. How can we learn more about the Dimensions synchronization process and what options are available to affect it? Can we opt-out of it? I believe we have about 700 records in this state, so I assume we’ll want to design a bulk update to correct that.

Roger Kunkel UC Davis Library Online Strategy

From: Lucian Bitiusca [support@symplectic.co.uk](mailto:support@symplectic.co.uk) Date: Thursday, June 13, 2024 at 12:56 AM To: oapolicy-help@universityofcalifornia.edu [oapolicy-help@universityofcalifornia.edu](mailto:oapolicy-help@universityofcalifornia.edu) Cc: Roger Kunkel [rakunkel@ucdavis.edu](mailto:rakunkel@ucdavis.edu), alainna.wrigley@ucop.edu [alainna.wrigley@ucop.edu](mailto:alainna.wrigley@ucop.edu), oapolicy-help@universityofcalifornia.edu [oapolicy-help@universityofcalifornia.edu](mailto:oapolicy-help@universityofcalifornia.edu) Subject: Re: Re: Dimensions Grants replicating UCD Grants - Ticket ID: [#138662] [480444] Hi Alainna, Roger,

Below you can see the steps which explains why this object with a Dimensions record has the "UCD Grant (pre-2024)" type.

  1. On 22-Mar at 14:48:40 (FTP server is 3 hours ahead) this record was imported: "ark:/87287/d7gt0q/grant/K388D41-131124" using this data found in the backup data file: ark:/87287/d7gt0q/grant/K388D41-131124,grant,c-ucdavis-pre-ae,"SC-Links - Linking the Sickle Cell Community, Providers and Systems to Improve Sickle Cell Disease Care","WUN, THEODORE","NATIONAL HEART, LUNG AND BLOOD INSTITUTE",U01HL134007,2021-08-15,2023-06-30,108997,USD,Research,http://rems.ucop.edu/sponsor/3435,,TRUE

You can see the log file on the FTP server in the "\PROD\Logs\UCDavis_Grants_Feedlogs\2024-03-22(13_04)" folder: 2024-03-22 17:48:40: Import(39bbee85-621d-4a66-971e-8dfcc503e760) :: PUT Request made to [https://oapolicy.universityofcalifornia.edu:8002/elements-secure-api/v5.5/grant/records/source-3/ark%3A%2F87287%2Fd7gt0q%2Fgrant%2FK388D41-131124?update-reporting-dates=true&update-object-type=true] to import [grant(source-source-3,pid-ark:/87287/d7gt0q/grant/K388D41-131124)] :: content size = 1018 2024-03-22 17:48:40: Import(39bbee85-621d-4a66-971e-8dfcc503e760) :: 201(Created) response received

This created the object in Elements "https://oapolicy.universityofcalifornia.edu/viewobject.html?id=4522184&cid=2" and the object type was set at this moment: "UCD Grant (pre-2024)" based on the input data from the file. Image removed by sender.

  1. On 23-Mar the Dimensions grant search returned one result and it was injected in our existing object. This actions won't change the object type. The object type is set when the object is created and it can be changed only manually after or using the API. 2024-03-23 22:05:53 [Dimensions Grants synchroniser]: Processing priority 15 object search for Grant 4522184... 2024-03-23 22:05:55 [Dimensions Grants synchroniser]: Executing query against Dimensions Grants: extract_grants (funder_name="NATIONAL HEART, LUNG AND BLOOD INSTITUTE", grant_number="U01HL134007") 2024-03-23 22:05:55 [Dimensions Grants synchroniser]: Search returned 1 result. 2024-03-23 22:05:55 [Dimensions Grants synchroniser]: Result Dimensions Grants grant.5503426 record awaiting retrieval and injection into Grant 4522184. 2024-03-23 22:05:58 [Dimensions Grants synchroniser]: Processing priority 20 fetch for a batch of 1 records... 2024-03-23 22:06:00 [Dimensions Grants synchroniser]: Executing query against Dimensions Grants: search grants where id in ["grant.5503426"] return grants[abstract+category_for_2020+category_hrcs_hc+category_hrcs_rac+category_rcdc+category_sdg+date_inserted+end_date+funding_aud+funding_cad+funding_chf+funding_currency+funding_eur+funding_gbp+funding_jpy+funding_nzd+funding_org_name+funding_usd+grant_number+id+investigators+linkout+research_orgs+start_date+title] sort by id limit 25 skip 0 2024-03-23 22:06:07 [Dimensions Grants synchroniser]: Grant record 14603321 (Dimensions Grants grant.5503426): Injected into Grant 4522184.

  2. On 04-Apr the imported record was deleted and the object remained with a single record from Dimensons. 4/4/2024 2:06:33 PM,DELETE,https://oapolicy.universityofcalifornia.edu:8002/elements-secure-api/v5.5/grant/records/source-3/ark:%2F87287%2Fd7gt0q%2Fgrant%2FK388D41-131124,204,NoContent,147

There are two scenarios that can lead to this result: the one listed above where the imported record is created first, and then the Dimensions record is merged, followed by the deletion of the imported record the Dimensions record comes first creating an object of type "grant", then a record is imported and merged into existing object changing the type to "UCD Grant (pre-2024)", followed by the deletion of the imported record You can identify similar cases where there is an object with a single Dimensions record where the type is not "grant" using this SQL query: select g.[Records Imported From], g.[ID] [Grant ID], g.[Date Created In Elements] [Grant Created In Elements], gr.[Created When] [Dimensions record Created In Elements], g.* from [Grant] g join [Grant Type] gt on gt.[ID] = g.[Type ID] join [Grant Record] gr on g.[ID] = gr.[Grant ID] where gt.[Identifier name] != 'grant' and g.[Records Imported From] = 'Dimensions Grants' I hope this will help!

Please, let us know if you need further assistance or we can close this ticket. Thank you!

Kind regards, Lucian B.

Ticket: https://support.symplectic.co.uk/support/tickets/480444 On Thu, 13 Jun at 12:44 AM , Roger Kunkel [rakunkel@ucdavis.edu](mailto:rakunkel@ucdavis.edu) wrote: I see this grant which we imported from the UCD KFS Financial system. Grants from KFS (the old accounting system) we import with UCD Grant(pre 2024) type. However, these older grants sometimes find their way to the new UCD accounting system and appear in a weekly feed from there. When that happens we delete the old version of type UCD Grant(pre 2024) and import the the new version as a UCD Grant type grant.

So, I believe this grant first came to us via KFS as: 131124,"tedwun","WUN, THEODORE","SC-Links - Linking the Sickle Cell Community, Providers and Systems to Improve Sickle Cell Disease Care",108997,108997,0,"3435","","668982","RESEARCH","U01HL134007","2021-08-15","2023-06-30","S","049227","049014","K388D41","MED:Int Med HEM/ONC","MED:Internal Medicine"

And then from Aggie Enterprise as: ark:/87287/d7c08j/grant/K388D41,grant,c-ucd-enterprise,SC Links Linking the Sickle Cell Community Providers and Systems to Improve Sickle Cell Disease Care K388D41,Theodore Wun,REGENTS OF THE UNIVERSITY OF CALIFORNIA SAN FRANCISCO,13289sc,2021-08-15,2023-06-30,108997,USD,Research,http://rems.ucop.edu/sponsor/6762,,TRUE

But the funder reference has changed.

Roger Kunkel UC Davis Library Online Strategy

480444:261055 480444:261055

rakunkel-ucd commented 2 months ago

Hello Roger, Checking if the Dimensions records were deleted based on your last "Prod_UCD_grants_to_delete.csv" file, we saw that the records were not deleted because the tool was configured to delete by default grants with source "Institutional Grants System" (source-3) and the grants to delete from the input file are with source Dimensions. We configured the tool to use from the input file two columns: a "data-source" column for the source of the grant record and the "id" column as before

Starting from now the input file needs to have two columns: a "data-source" column and an "id" column When you will need to delete grant records with "Institutional Grants System" source , the "data-source" column can be empty. The input file will look like this: data-source,id ,ark:/87287/d7gt0q/grant/K320961-1248 When you will need to delete grant records with Dimensions source, the value in the "data-source" column must be "dimensions-grants". The input file will look like this: data-source,id dimensions-grants,grant.8382802

We updated the tool to use the new configuration on both systems (QA and PROD). Please, test it also in your side and let us know your comments.

Kind regards, Lucian B.

rakunkel-ucd commented 2 months ago

Using the format above to successfully delete grants now. data-source,id dimensions-grants,grant.8382802