Closed shelleydoljack closed 7 months ago
@ahafele for the first instance HRID in the Book2.csv file shared in slack, I did this:
external_hrid = 'in00000047302'
updated_date DESC
Now I'm not sure which record version to take. There are __current
and state
columns. Interestingly, there are many source records with ACTUAL state. If I take the value of the id
column for one of them and query okapi I get some data: /source-storage/records/34790208-7c5a-49b0-8c5c-9af6e721d92d
.
If I adjust the metadb query to external_hrid = 'in00000047302' and "__current"
, all of them are state = OLD
. Changing the query again to external_hrid = 'in00000047302' and state = 'ACTUAL'
and take id of the latest updated date and query okapi: /source-storage/records/0ba61f07-33ff-4315-b10e-d5857c439459
I get a record with state = OLD. I cannot seem to query that endpoint by externalIdsHolder.InstanceId==
but I can do it by snapshotId (which we can get from metadb). snapshotId 3d586f6f-ad86-462c-8037-dbc8f257487d
(the latest updated date in metadb table) returns 2 OLD SRS records. So anyways, I'm just back to "not sure which SRS records is the latest". And this is just for the first in that spreadsheet.
I could try to just use what metadb says is the latest updated record and somehow connect it to the instance record and see if that works for EAL. If so, we could just use that as a general rule to fix these. :shrug:
I think that approach makes sense. Based on what you said these records don't fall into the records described here - https://folio-org.atlassian.net/wiki/x/uj8V Wondering if it is possible there are other SRS out there that have the instance hrid in the 999$i and not the external_hrid ?
I'm not sure. At least for this first one, instance Id b0dafd3b-5e31-47bc-8d29-591520331587
is not found when doing a GET to /source-storage/source-records?instanceId=b0dafd3b-5e31-47bc-8d29-591520331587
or a GET to /source-storage/records/b0dafd3b-5e31-47bc-8d29-591520331587/formatted?idType=INSTANCE
but it is in the field:
"externalIdsHolder":{
"instanceId": "b0dafd3b-5e31-47bc-8d29-591520331587",
"instanceHrid": "in00000047302"
}
when you do a GET to /source-storage/records/{sourceRecordId gotten from records_lb table in metadb}
.
I logged into the db and did a query on that table: select * from sul_mod_source_record_storage.records_lb where external_id = 'b0dafd3b-5e31-47bc-8d29-591520331587' order by updated_date DESC;
I get 7 rows as a result.
id | snapshot_id | matched_id | generation | record_type | external_id | state | leader_record_status | order | suppress_discovery | created_by_user_id | created_date | updated_by_user_id | updated_date | external_hrid
--------------------------------------+--------------------------------------+--------------------------------------+------------+-------------+--------------------------------------+-------+----------------------+-------+--------------------+--------------------------------------+----------------------------+--------------------------------------+-------------------------------+---------------
34790208-7c5a-49b0-8c5c-9af6e721d92d | 3d586f6f-ad86-462c-8037-dbc8f257487d | 02b500bc-8395-42df-967a-3a870a1b7215 | 2 | MARC_BIB | b0dafd3b-5e31-47bc-8d29-591520331587 | OLD | c | 0 | f | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-16 18:29:21.124-08 | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-21 06:40:30.009146-08 | in00000047302
5b554e74-9db9-4cbe-90e3-e4811c5f253a | ad50bbe6-86ff-48d9-b2bc-323da9aeadd4 | 02b500bc-8395-42df-967a-3a870a1b7215 | 3 | MARC_BIB | b0dafd3b-5e31-47bc-8d29-591520331587 | OLD | c | 0 | f | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-16 18:31:04.931-08 | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-21 06:40:30.00888-08 | in00000047302
0ba61f07-33ff-4315-b10e-d5857c439459 | 3d586f6f-ad86-462c-8037-dbc8f257487d | 02b500bc-8395-42df-967a-3a870a1b7215 | 6 | MARC_BIB | b0dafd3b-5e31-47bc-8d29-591520331587 | OLD | c | 0 | f | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-21 06:25:34.806-08 | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-21 06:40:30.007377-08 | in00000047302
70c1bdb8-93df-40f0-81df-2246fc32afa4 | 691b5adc-6196-4aa6-bdfb-024e9bca16c4 | 02b500bc-8395-42df-967a-3a870a1b7215 | 1 | MARC_BIB | b0dafd3b-5e31-47bc-8d29-591520331587 | OLD | c | 0 | f | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-16 18:28:58.241-08 | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-21 06:40:30.006096-08 | in00000047302
36f40463-2cbb-4603-9c0a-ba37225cc926 | 96a0d6f4-77e4-40d1-9ad8-74e7034379a7 | 02b500bc-8395-42df-967a-3a870a1b7215 | 4 | MARC_BIB | b0dafd3b-5e31-47bc-8d29-591520331587 | OLD | c | 0 | f | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-17 11:24:12.515-08 | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-21 06:40:30.00609-08 | in00000047302
9f5e65f0-3770-428a-878b-bca2a435af4e | 860012f1-40d8-4726-b9b3-6fe1e033eb4c | 02b500bc-8395-42df-967a-3a870a1b7215 | 5 | MARC_BIB | b0dafd3b-5e31-47bc-8d29-591520331587 | OLD | c | 0 | f | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-17 12:38:32.597-08 | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-21 06:40:30.005938-08 | in00000047302
02b500bc-8395-42df-967a-3a870a1b7215 | 8c2f3757-48cf-470a-b236-3160cea2aeaf | 02b500bc-8395-42df-967a-3a870a1b7215 | 0 | MARC_BIB | b0dafd3b-5e31-47bc-8d29-591520331587 | OLD | c | 0 | f | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-16 18:25:21.527-08 | 3dc624cc-3bf1-4495-9964-41b0d59d5c4b | 2024-01-21 06:40:12.006189-08 | in00000047302
(I'm not sure how to read the timestamps for the order by; it doesn't look right.)
Also, I'm not sure the state field means anything. For instance, just looking at some source records from raw_records_lb table and then seeing what the endpoint /source-storage/records
shows for them, a lot have state "OLD" but AFAICT it is the current source record for a record (it's in the 999 $f for it in inventory view). :shrug:
I'm trying to POST the supposed latest SRS for hrid in00000047302
and I keep getting an error about bad request. @jermnelson is also looking into it and we haven't figured it out yet. @ahafele the json I'm posting is in this file in00000047302_json.txt
Is it okay for our records to have a 100 with subfield 1? It is a supposedly valid field but I wonder in the folio context...?
100 $1 should be fine in FOLIO and I'm not seeing any discussion of there being a current problem with this subfield. I'm adding relevant slack convos to https://docs.google.com/document/d/1_O1_Hx44q7o9Qmm_HvF7DNugAcDzUMZ8GYIEuT2v0zI/edit. I haven't found anything concrete but some discussion of similar issues. Should we ask the community or Index Data about this?
I'm going to see if I can use the change-manager endpoint to update the record.
I tried a PUT to /change-manager/parsedRecords/02b500bc-8395-42df-967a-3a870a1b7215
and it returned 202 accepted but looking at the logs for mod-srs:
2024-02-09T11:42:27-07:00 18:42:27.457 [vert.x-worker-thread-13] [] ERROR uickMarcKafkaHandler Failed to handle QM_RECORD_UPDATED event
2024-02-09T11:42:27-07:00 javax.ws.rs.NotFoundException: Record with id '34790208-7c5a-49b0-8c5c-9af6e721d92d' was not found
This is the json I tried PUTting: in00000047302_parsedRecord_json.txt
But if I do select * from sul_mod_source_record_storage.marc_records_lb where id = '34790208-7c5a-49b0-8c5c-9af6e721d92d';
I get the record. Also, if I do a select * from sul_mod_source_record_storage.raw_records_lb where id = '34790208-7c5a-49b0-8c5c-9af6e721d92d';
I get the record. I guess I'll just start a thread about it in folio slack about what to do next.
What is also kind of curious is that records_lb has numbers 0-6 in the generation column for this record but the instance shows "_version" : "8"
. Should the records_lb have a row with generation number 7?
Ahh, ☝️ it's probably unrelated. I looked at a "normal" record and the instance's version is 4 but the "latest" generation in the source record is 1.
I tried running this query (from the wiki page) and got 0 rows, so it's not the case (at least for this one example) that there is an external_id
column data missing.
SELECT records_lb.id AS record_id, records_lb.external_id, record_instance_ids.instance_id
FROM sul_mod_source_record_storage.records_lb
JOIN (
SELECT id, ids_subfields ->> 'i' AS instance_id
FROM (
SELECT id, fields
FROM sul_mod_source_record_storage.marc_records_lb, jsonb_array_elements(content -> 'fields') as fields
WHERE fields ? '999'
) AS ids_fields, jsonb_array_elements(fields -> '999' -> 'subfields') AS ids_subfields
WHERE ids_subfields ? 'i'
) AS record_instance_ids ON records_lb.id = record_instance_ids.id
WHERE external_id IS NULL;
I started a Slack thread in FOLIO slack. https://folio-project.slack.com/archives/C2BHMHCFJ/p1707505894186479
I modified the query from the wiki page and it looks like there are 7 SRS with instance ID's that match external_id from the records_lb table:
SELECT records_lb.id AS record_id, records_lb.external_id, record_instance_ids.instance_id
FROM sul_mod_source_record_storage.records_lb
JOIN (
SELECT id, ids_subfields ->> 'i' AS instance_id
FROM (
SELECT id, fields
FROM sul_mod_source_record_storage.marc_records_lb, jsonb_array_elements(content -> 'fields') as fields
WHERE fields ? '999'
) AS ids_fields, jsonb_array_elements(fields -> '999' -> 'subfields') AS ids_subfields
WHERE ids_subfields ? 'i'
) AS record_instance_ids ON records_lb.id = record_instance_ids.id
WHERE external_id = 'b0dafd3b-5e31-47bc-8d29-591520331587';
record_id | external_id | instance_id
--------------------------------------+--------------------------------------+--------------------------------------
36f40463-2cbb-4603-9c0a-ba37225cc926 | b0dafd3b-5e31-47bc-8d29-591520331587 | b0dafd3b-5e31-47bc-8d29-591520331587
02b500bc-8395-42df-967a-3a870a1b7215 | b0dafd3b-5e31-47bc-8d29-591520331587 | b0dafd3b-5e31-47bc-8d29-591520331587
70c1bdb8-93df-40f0-81df-2246fc32afa4 | b0dafd3b-5e31-47bc-8d29-591520331587 | b0dafd3b-5e31-47bc-8d29-591520331587
34790208-7c5a-49b0-8c5c-9af6e721d92d | b0dafd3b-5e31-47bc-8d29-591520331587 | b0dafd3b-5e31-47bc-8d29-591520331587
5b554e74-9db9-4cbe-90e3-e4811c5f253a | b0dafd3b-5e31-47bc-8d29-591520331587 | b0dafd3b-5e31-47bc-8d29-591520331587
9f5e65f0-3770-428a-878b-bca2a435af4e | b0dafd3b-5e31-47bc-8d29-591520331587 | b0dafd3b-5e31-47bc-8d29-591520331587
0ba61f07-33ff-4315-b10e-d5857c439459 | b0dafd3b-5e31-47bc-8d29-591520331587 | b0dafd3b-5e31-47bc-8d29-591520331587
It seems the problem is that none of the rows in records_lb has a state=ACTUAL.
Updating state to "ACTUAL" for the highest generation number in records_lb seemed to fix the problem. I will try updating the rest from the Book2.csv file shared in Slack. What other records need fixing?
update sul_mod_source_record_storage.records_lb
set state = 'ACTUAL'
where id = '0ba61f07-33ff-4315-b10e-d5857c439459';
hrid's in00000047503 and in00000047623 already have state = ACTUAL and it seems they're not broken in FOLIO.
select id, generation, state, external_hrid
from sul_mod_source_record_storage.records_lb
where external_hrid in ('in00000047658', 'in00000047663', 'in00000047666', 'in00000047872', 'in00000047952', 'in00000048015', 'in00000048022', 'in00000048030', 'in00000048039', 'in00000048159', 'in00000048167', 'in00000048595', 'in00000048442', 'in00000048422', 'in00000048623', 'in00000048635', 'in00000048641', 'in00000048737', 'in00000048751', 'in00000048755', 'in00000048756', 'in00000048763', 'in00000048767');
Fixed the 23 records from Book2.csv:
update sul_mod_source_record_storage.records_lb
set state = 'ACTUAL'
where id in ('5ff8f78f-a2fe-4b69-a35a-8d8715fc7dd0', 'd64c8dd8-78af-40ac-a0ec-e9771594a260', '059fe2fb-e987-4975-a34d-782bdfb9cd46', '7c8ef0ba-4d6a-4e76-900f-de76036f5302', 'edfe4c36-dcf2-42f5-995a-8a3b69a5248c', '398c9d9e-37f2-49df-90d2-9ccf356568f5', '74f85a19-b213-4c12-9e46-8438031f8313', '04b14171-0aa7-4f23-9377-4b8d6cf73921', '4a61681b-becc-41b8-940c-93f1d126a9dc', 'b3047c83-fb44-47ed-99e3-950f257d132a', '5966b64b-4951-4e5d-bff7-e6d798b49b0a', '2f1dd48b-9004-43e6-9f55-bff39a6753e5', '059d9d8b-4117-409b-8d71-11d052143ed0', '464f8372-23bb-4b70-a39c-9c3cc73f2858', '59348611-209f-4d20-8954-86cdf35e65be', '5e423f61-2fd0-414d-9fd0-94420c95e06a', 'f5246c6f-0b46-420b-8cae-40ebc8fe8b2e', 'ad883531-0876-40ea-be44-895e32b3c71c', '221a6d0e-b7c4-4840-b2d4-d6d9faece19d', '0830ceb6-7786-47e7-b451-ec7ae1bfa849', '1e4d9c25-ecb9-4ec9-a02a-2bdf88eafa7d', '4bf7845b-b60f-4827-b20d-ff35625710d1', '5fd1d974-e2c2-4076-8877-7bc44ad0fe2e');
Putting couple additional records that need fixing here. Can create a new ticket if preferred. in00000048016 in00000047295 in00000047284
Actually I'm wondering if we should use the metadb query from Ryan to get a list of others that should be fixed in this way? If that makes sense, I can run that query for most recent list of instance hrids
I'm not sure which query from Ryan you are referring to.
I fixed these 3 you listed above.
New examples will be ticketed. May add to list of jobs that run periodically. #611
see slack thread: https://stanfordlib.slack.com/archives/C03MB1KHR89/p1705960705553329