Closed ahafele closed 1 month ago
The DAG is failing on the transform step - https://sul-libsys-airflow-dev.stanford.edu/dags/select_pod_records/grid?run_id=manual__2024-05-10T15%3A34%3A09%2B00%3A00&execution_date=2024-05-10+15%3A34%3A09%2B00%3A00&tab=graph&dag_run_id=manual__2024-05-10T15%3A34%3A09%2B00%3A00&task_id=transform_folio_marc_record
I noticed that the log for this error says:
INFO - Writing 20 modified MARC records to /opt/airflow/data-export-files/pod/marc-files/updates/202405081947.mrc
This filename is for 0508, but I ran the DAG today, 0510.
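One quick way to confirm this kind of mismatch is to parse the timestamp out of the file name and compare it with the run date. This is a minimal sketch, assuming the file stem follows a `YYYYMMDDHHMM` pattern (inferred only from the names in the log); `filename_date` is a hypothetical helper, not something in libsys-airflow.

```python
from datetime import date, datetime
from pathlib import PurePosixPath


def filename_date(path: str) -> date:
    """Return the date encoded in a file stem such as 202405081947.mrc.

    Assumes a YYYYMMDDHHMM stem; raises ValueError otherwise.
    """
    stem = PurePosixPath(path).stem
    return datetime.strptime(stem, "%Y%m%d%H%M").date()


# Path copied from the log line above.
log_path = "/opt/airflow/data-export-files/pod/marc-files/updates/202405081947.mrc"
run_date = date(2024, 5, 10)  # date of the manual run reported above

file_date = filename_date(log_path)
is_stale = file_date != run_date  # True means the file name predates this run
print(file_date, run_date, is_stale)
```

This only shows that the name is older than the run; it does not say why the older file is being reused.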
@jgreben @jermnelson I am seeing 999 data now, but the records all seem to be written to the same file. When I run the select_pod DAG, no new file is generated for download, but this file grows each time: 202405071925.mrc. The log for transform marc references this filename.
999 - All subfields are not present, but the 999s seem to be duplicated in each record, possibly because the files aren't being written correctly.
Example: this record should have 3 999s (1 folio and 2 with holdings/item data) but instead has:
=999 ff$i5881758a-0a68-57f0-8a63-f591de7159dc$sb0b6df22-0574-5519-9c3d-1f60f59706a9
=999 \$aG6071 .P2 1951 .J6 NORTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 SOUTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 NORTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 SOUTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 NORTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 SOUTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 NORTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 SOUTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 NORTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 SOUTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 NORTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
=999 \$aG6071 .P2 1951 .J6 SOUTHERN SHEET$hMap$lEAR-MAP-CASES$wLibrary of Congress classification$tbook$eEAR-MAP-CASES$j1
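Duplicates like the ones in this record can be spotted mechanically by counting identical field lines in the mnemonic dump. This is a rough sketch, assuming the record is available as a list of text lines; `duplicated_fields` is a hypothetical helper, and the sample is a shortened stand-in for the record above with most subfields trimmed.

```python
from collections import Counter


def duplicated_fields(mnemonic_lines: list[str], tag: str = "999") -> dict[str, int]:
    """Map each repeated field line for `tag` to the number of times it occurs."""
    prefix = f"={tag}"
    stripped = (line.strip() for line in mnemonic_lines)
    counts = Counter(line for line in stripped if line.startswith(prefix))
    return {line: n for line, n in counts.items() if n > 1}


# Shortened stand-in for the record above (subfields after $a trimmed).
sample = [
    r"=999 ff$i5881758a-0a68-57f0-8a63-f591de7159dc",
    r"=999 \$aG6071 .P2 1951 .J6 NORTHERN SHEET",
    r"=999 \$aG6071 .P2 1951 .J6 SOUTHERN SHEET",
    r"=999 \$aG6071 .P2 1951 .J6 NORTHERN SHEET",
    r"=999 \$aG6071 .P2 1951 .J6 SOUTHERN SHEET",
]

dupes = duplicated_fields(sample)
for line, n in dupes.items():
    print(n, line)
```

An empty result means no 999 line is repeated verbatim. Two 999s that differ in any subfield are treated as distinct.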
@ahafele I think what is going on is that all of the MARC files in the various vendor folders are getting re-transformed for each new DAG run, because the transmission tasks are not running and the files are not being archived from the vendor/marc-files directories. I just looked at a couple of files from DAGs that I triggered just now, west/updates/202405131852.mrc and pod/updates/202405131910.mrc, and it looks like there are no duplicated 999s. Only the previous files have duplicates. Can you verify?
@jgreben I just triggered a POD dag run and had the same result (records written to the 0507 file). Looking at the file you reference - pod/updates/202405131910.mrc - the 999s are looking better but the records are duplicated x3 in the file.
Also, 999s for records with just a holdings record and no item record are not being generated correctly. The requirement is:
- 1 new 999 for each holdings/item combo
- If no item records, 999 with holdings data only
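The two rules above can be sketched as a small function. This is only an illustration under stated assumptions: holdings and items are modeled as plain dicts with hypothetical keys (`call_number`, `location`, `material_type`, `items`), and the output is a list of dicts rather than real MARC fields. It is not the transform code in libsys-airflow.

```python
def build_999_rows(holdings: list[dict]) -> list[dict]:
    """Return one row per holdings/item combo, or a holdings-only row if no items."""
    rows = []
    for holding in holdings:
        # Data that comes from the holdings record alone.
        base = {
            "call_number": holding["call_number"],
            "location": holding["location"],
        }
        items = holding.get("items") or []
        if not items:
            # No item records: emit a 999 with holdings data only.
            rows.append(base)
            continue
        for item in items:
            # One 999 for each holdings/item combination.
            rows.append({**base, "material_type": item["material_type"]})
    return rows


# Placeholder data, not taken from any real record.
holdings = [
    {
        "call_number": "CALL-1",
        "location": "LOC-A",
        "items": [{"material_type": "book"}, {"material_type": "map"}],
    },
    {"call_number": "CALL-2", "location": "LOC-B", "items": []},
]

rows = build_999_rows(holdings)
print(len(rows))
```

With this placeholder data, the first holdings record yields two rows and the second yields a single holdings-only row.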
Ran a test for POD selection and I would have expected to find the following in the update file -
a3929970 - updated date for today - nothing suppressed
a4798645 - updated date for today - has print holdings and no item
a6743952 - holdings suppressed but not item
All three include Symphony 999s
We figured out that the SQL was still not correct after the last change, which looked for the updatedDate from the instance record. We needed to stop looking for a "marc generation of greater than 0", because in these cases the MARC record was not updated at all.
Now fixed by #1006
Testing results:
- a3929970 - updated date for today - nothing suppressed: now found in file
- a4798645 - updated date for today - has print holdings and no item: still not in file. Electronic records with holdings and no items are included, so I'm not sure why this one is not.
- a6743952 - holdings suppressed but not item: still not in file, but I think that is good
I did another test of a record with print holdings and no item, but this time with a bound-with relationship, and it works (a6829669), presumably because of the associated principal item record. I'd still like to understand how this is working, but I think it might be OK.
I am still seeing the updatedDate for those HRIDs as 5/20/2024, and the parameters for the latest POD run were {'from_date': '2024-05-22', 'to_date': '2024-05-23'}
Maybe what you did to update the record somehow didn't take, or did not affect the updatedDate for the instance record?
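To make the date comparison concrete, here is a small sketch of the window check using the parameters quoted above. It assumes both bounds are inclusive, which may not match the actual SQL; `in_window` is a hypothetical helper. An updatedDate of 5/20 falls outside the window either way.

```python
from datetime import date


def in_window(updated: date, from_date: str, to_date: str) -> bool:
    """True if `updated` falls within [from_date, to_date] (ISO dates, inclusive)."""
    return date.fromisoformat(from_date) <= updated <= date.fromisoformat(to_date)


# Parameters copied from the latest POD run.
params = {"from_date": "2024-05-22", "to_date": "2024-05-23"}

seen_updated = date(2024, 5, 20)  # updatedDate still shown for those HRIDs
selected = in_window(seen_updated, **params)
print(selected)
```

A record whose instance updatedDate is still 5/20 would not be selected by a 5/22 to 5/23 run.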
I changed the instance updateDate for a4798645 and a6743952 to today 5/23 and reran the POD selection dag and the generated update file is empty.
This should be fixed now by https://github.com/sul-dlss/libsys-airflow/pull/1016
Fixes: "I changed the instance updateDate for a4798645 and a6743952 to today 5/23 and reran the POD selection dag and the generated update file is empty."
New problem: when an item is suppressed, the 999 with item info is still included. When both an item and its holding are suppressed, the 999 is still included. This was previously working as expected.
Example a9615278
includes
=999 \$aHC285 .C27 2011 test$hBook$lGRE-STACKS$wLibrary of Congress classification$tportable device 1$eGRE-STACKS
=999 \$aHC285 .C27 2011$hBook$lGRE-STACKS$wLibrary of Congress classification$tbook$eGRE-STACKS$j1
even though the first one is suppressed from discovery.
Example a7946988
includes
=999 \$aHV6773 .C475 2009$hBook$lGRE-STACKS$wLibrary of Congress classification$tbook$eGRE-STACKS$j1
Even though both item and holding are suppressed from discovery.
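The expected behavior in the two examples above can be sketched as a filter applied before any 999 is built. This assumes holdings and items are plain dicts carrying a boolean `discoverySuppress` flag (the name is an assumption here), and `visible_pairs` is a hypothetical helper. The thread does not say whether a holdings-only 999 should remain when every item is suppressed, so this sketch simply drops suppressed pairs and leaves that case open.

```python
from typing import Iterator


def visible_pairs(holdings: list[dict]) -> Iterator[tuple[dict, dict]]:
    """Yield (holding, item) pairs where neither record is suppressed."""
    for holding in holdings:
        if holding.get("discoverySuppress"):
            # Suppressed holdings contribute no 999s at all.
            continue
        for item in holding.get("items", []):
            if not item.get("discoverySuppress"):
                yield holding, item


# Placeholder data shaped like the two cases described above.
one_item_suppressed = [
    {
        "discoverySuppress": False,
        "items": [
            {"id": "item-1", "discoverySuppress": True},
            {"id": "item-2", "discoverySuppress": False},
        ],
    }
]
both_suppressed = [
    {"discoverySuppress": True, "items": [{"id": "item-3", "discoverySuppress": True}]}
]

kept_first = [item["id"] for _, item in visible_pairs(one_item_suppressed)]
kept_second = [item["id"] for _, item in visible_pairs(both_suppressed)]
print(kept_first, kept_second)
```

In the first case only the unsuppressed item survives; in the second case nothing does.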
@jgreben most 999s are now gone from the generated files. I can show after standup this morning.
I ran the select_pod_records DAG with the following results
[ ] no holdings/item data in the 999 (see here for 999 details needed)
[x] all records in the file are suppressed from discovery at the instance level. When I edit the record in FOLIO to uncheck the suppress from discovery box (i.e. unsuppress the instance) and rerun the DAG, that record is no longer in the file. UUID - 21fa6875-3d81-5e57-ae09-058b05a20a6e
Update: Josh fixed the above selection issue, and suppressed records will be pulled out of the file and put in a separate delete file/list, per ticket https://github.com/sul-dlss/libsys-airflow/issues/955
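The split described here, with unsuppressed instances going to the update file and suppressed ones going to a delete list, could look roughly like the sketch below. It assumes each instance is a dict with an `id` and a boolean `discoverySuppress` flag; `split_by_suppression` and the sample IDs are hypothetical, and the real change is tracked in the ticket linked above.

```python
def split_by_suppression(instances: list[dict]) -> tuple[list[str], list[str]]:
    """Return (update_ids, delete_ids) based on instance-level suppression."""
    updates: list[str] = []
    deletes: list[str] = []
    for instance in instances:
        target = deletes if instance.get("discoverySuppress") else updates
        target.append(instance["id"])
    return updates, deletes


# Placeholder instances, not real UUIDs.
instances = [
    {"id": "instance-a", "discoverySuppress": False},
    {"id": "instance-b", "discoverySuppress": True},
    {"id": "instance-c"},  # flag missing: treated as not suppressed
]

update_ids, delete_ids = split_by_suppression(instances)
print(update_ids, delete_ids)
```

Treating a missing flag as "not suppressed" is a choice made for this sketch only.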