Open shelleydoljack opened 9 months ago
Below is a MetaDB query to pull instance hrids lacking SRS links. Current count: 448
select
it.hrid
from
folio_inventory.instance__t__ it
where
it."__current"
and it."source" = 'MARC'
except
select
rl.external_hrid as hrid
from
folio_source_record.records_lb__ rl
where
rl."__current"
and rl.record_type = 'MARC_BIB'
and rl.state = 'ACTUAL'
order by
hrid asc;
The current issue we are finding is not that the SRS in marc_records_lb or raw_records_lb is missing the 999$i
but rather that the records_lb table does not have state=ACTUAL given a distinct external_id or external_hrid (they all are state=OLD and this seems to be the root of the problem with "View Source" greyed out or DI to overlay not working).
select count(distinct external_hrid) from sul_mod_source_record_storage.records_lb where state = 'ACTUAL';
count
----------
11311539
select count(distinct external_id) from sul_mod_source_record_storage.records_lb where state = 'ACTUAL';
count
----------
11311539
select count(distinct external_hrid) from sul_mod_source_record_storage.records_lb where state = 'OLD';
count
--------
123600
FOLIO inventory facet source=MARC shows 10,349,398. Not sure why we have 1 million more with state = ACTUAL. Maybe the elasticsearch index is off?
select count(*) from sul_mod_inventory_storage.instance where jsonb ->> 'source' = 'MARC';
count
----------
10349812
So we are potentially missing 414 records?
Trying a query that would give us instance records with source = MARC but there is no SRS yields 62 results. @ahafele should these have SRS (I see some are on-order but then some look wrong maybe, like 1e5926c8-a110-417f-8c59-c5c1cb393880
):
instance_id | instance_source | srs_state
--------------------------------------+-----------------+-----------
ea09b1e1-b549-43a0-8f65-975d2eec6c74 | MARC |
92090026-18e8-4d04-872f-d97b7b425cd3 | MARC |
b0ab8998-90bc-4353-835b-4376507e0500 | MARC |
ccc7da59-17ac-4934-8b63-4ca8917c5906 | MARC |
992857ff-3776-4fd6-a08e-3fe48ab770bf | MARC |
a274bb3a-62e9-4ee4-9562-1e27943012d8 | MARC |
4bf0cb67-ff8b-4c4b-94d9-f2db96181ae5 | MARC |
01ee2d3d-564e-45a7-a725-ae9e7c49c493 | MARC |
bb9d9610-829b-4cdb-b3fb-f8c252dd3c19 | MARC |
751ca890-5fac-416f-9465-41067f6ed9fb | MARC |
76b7eebd-2b8c-4192-aa6e-1ff9a084ed20 | MARC |
ff8720de-3e30-4871-a6eb-64bab5da24be | MARC |
5e7aad1c-0438-4819-aee1-a053ee66fb7f | MARC |
678316b9-ccb6-40f1-a09f-7235589d4dc6 | MARC |
81ec3137-2a07-4d49-9e06-cb26d0a5f712 | MARC |
8d132390-0b26-4760-955d-6d3865cf2c35 | MARC |
1e5926c8-a110-417f-8c59-c5c1cb393880 | MARC |
c678a358-417f-4914-8ca4-500429778e08 | MARC |
3d32fe56-69bb-4feb-9608-11972bea0961 | MARC |
63996a33-ab64-4357-acb7-8d0a421bcb30 | MARC |
da983b6e-464c-4320-a295-c8133384ba14 | MARC |
f6ee75af-924f-4b4a-b844-8bfdcdd2fd7b | MARC |
8a6b3dbf-8b5a-43ba-8ca3-dc89b42c0796 | MARC |
2e467c2c-860d-4975-812b-b41661c3667b | MARC |
bfeb3491-d969-4f1e-aad9-2ff5e8ca6cbb | MARC |
7a650ce0-79f4-4121-9167-3439852c67eb | MARC |
5cb35b5f-bc4d-45e8-8378-c47840126ac8 | MARC |
22a7e3a9-24a2-43cb-8a9b-32849b6348f2 | MARC |
5924fda2-8340-4d36-b10f-db4e7082f4d9 | MARC |
ba16ca75-01c7-4538-bc64-6f655a34b4e0 | MARC |
ac862dd6-bd2c-4e78-969f-64daace1cfa0 | MARC |
9988eb9d-cf0a-4142-b049-a50bbe4d3b5a | MARC |
550cc36b-d4fd-4b58-8f42-4649b8822f85 | MARC |
5e672256-8553-467d-9905-62fc0a66128f | MARC |
5093c672-d681-4215-9240-04f23ae6285d | MARC |
80baf139-2d33-444a-b954-07e23f75223d | MARC |
ec30ef0f-e75f-423d-a6d6-b1e9f5be70ad | MARC |
410eae34-13e0-4545-98fa-810faf83dffa | MARC |
5723a170-b33e-4944-b483-c74d14e04664 | MARC |
4d46f1bc-0bc5-4600-b3d5-ec8809ddb2ff | MARC |
2dce8a79-7f55-40e9-962c-eaa7bf56cb06 | MARC |
a3e044c5-3508-4a40-8f4b-c08c5595547a | MARC |
f779e02d-01d5-4bc4-b273-cd6c0089f1e2 | MARC |
703bcaff-e8a4-48f9-8cf0-269bdc1c6a4c | MARC |
7cf3a7df-9f48-4979-b7dc-2a628aa4a822 | MARC |
06eccb3d-e7c8-46ae-9abf-ebb0af350881 | MARC |
3dbe71c4-a842-47a4-b0f8-da6d54aad418 | MARC |
29a46e65-98f4-4934-bb8f-863f24a3251d | MARC |
2903e1ef-117c-43b6-9537-0aa2fbdbd822 | MARC |
3f3668fa-cf06-4e68-95c2-6981f1e19db4 | MARC |
9c23ec76-9ee2-4165-bea8-a36ce47885ce | MARC |
ded0b8b3-3ccb-4eda-841f-2dd7f2a07ae0 | MARC |
979a6690-69cd-4a1e-908d-47a31fe25337 | MARC |
1166da9c-77a9-4500-badf-d1cadafb64e1 | MARC |
409ec1b5-f1c0-4d83-a66b-aded85dd3353 | MARC |
11bf23c4-82a8-4466-96cf-993a81310a3c | MARC |
993bd7c3-ac46-49da-bfed-0413ec18f09e | MARC |
4dfcc767-5006-4ca4-9805-c0f46beacfce | MARC |
d437ff7e-f9e9-4bd2-be5d-c23feaedbc66 | MARC |
510fa7ff-24a0-4e0a-b16b-04d65b3f376a | MARC |
7f7cf99b-26b9-45d6-a160-9647d6ab4f42 | MARC |
6cc416ea-1d4a-4bf0-b504-ab9afd34de68 | MARC |
This is the query:
select instance.id as instance_id,
jsonb_extract_path_text(instance.jsonb, 'source') as instance_source,
srs.state as srs_state
from sul_mod_inventory_storage.instance as instance
left join sul_mod_source_record_storage.records_lb as srs on srs.external_id = instance.id
where jsonb_extract_path_text(instance.jsonb, 'source') = 'MARC'
and srs.state is null;
I spot checked some of these and many are Springer vendor records e.g. the example you give above (1e5926c8-a110-417f-8c59-c5c1cb393880) - see this ticket for related information. We could have someone check all of these but my guess is that they should all have SRS records.
instances_missing_SRS.json I grabbed the instance json for each of these in case it is helpful.
Probably add to the monthly "run these tasks" DAG (to be coded).
On single record imports, something happens and the instance is source=MARC but the SRS is missing or not connected to the instance anymore. Current solution is to change source=FOLIO for instance and have user overlay source bib to get it right. Document of missing SRS instances: https://docs.google.com/document/d/1_O1_Hx44q7o9Qmm_HvF7DNugAcDzUMZ8GYIEuT2v0zI/edit?usp=sharing
Scripts 2. Scripts for identifying invalid InstanceIds in SRS and 10. Scripts to remove SRS MARC Bib records with no InstanceIds from the wiki Scripts for Inventory, Source Record Storage, and Data Import Cleanup have some solutions that we might want to replicate in a DAG.