ycba-cia / blacklight-collections2


Deletion process #184

Open edgartdata opened 4 years ago

edgartdata commented 4 years ago

We agreed to a quarterly purge of records and report.

Records need to be deleted from back end systems first, such as TMS and Voyager, and then in the deletion app.

Note the distinction between records that are suppressed vs. deleted in back end systems.

For TMS suppressed records, @robl and @edgartdata will check with David Parsell that COBOAT does not vend the full XML description to BlackLight.

yulgit1 commented 4 years ago

The indexing with the new Currently On View/Access field ran over the weekend (Issue #179).

There are some discrepancies between the total number of records per collection and the sum of the counts in the new Currently On View/Access field:

collection_ss:"Reference Library" && -detailed_onview_ss:["" TO *] 37774-37569=205

collection_ss:"Prints and Drawings" && -detailed_onview_ss:["" TO *] 55259-(55249+4)=6

collection_ss:"Rare Books and Manuscripts" && -detailed_onview_ss:["" TO *] 20314-20273=41

On sampling it looks like the ones that are missing have been removed, and this can be fixed with the proposed quarterly purge.

Does this make sense? @KraigBinkowski @flapka If not I can send a list of the differences to check.
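The arithmetic in this comment can be reproduced with a small sketch. It assumes the counts have already been read from Solr (for example from `numFound`). The function names are hypothetical, and treating 55249 and 4 as two separate value counts for the field is an assumption read from the expression above, not something the thread confirms.

```python
from typing import Iterable


def missing_field_query(collection: str, field: str = "detailed_onview_ss") -> str:
    """Build a Solr query for records in a collection with no value in `field`."""
    return f'collection_ss:"{collection}" && -{field}:["" TO *]'


def discrepancy(total: int, field_counts: Iterable[int]) -> int:
    """Records in the collection total not accounted for by the field's counts."""
    return total - sum(field_counts)


# Figures quoted in this thread: (collection total, counts for the new field).
# The split of Prints and Drawings into [55249, 4] is an assumption.
reported = {
    "Reference Library": (37774, [37569]),
    "Prints and Drawings": (55259, [55249, 4]),
    "Rare Books and Manuscripts": (20314, [20273]),
}

for name, (total, counts) in reported.items():
    print(missing_field_query(name), "->", discrepancy(total, counts))
```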

flapka commented 4 years ago

@yulgit1 I think that makes sense, yes.

yulgit1 commented 4 years ago

Thinking about it more, a purge is not the best idea as that would mean downtime for several hours during the repopulation. Perhaps instead a quarterly delete of "stale" objects (stale meaning objects with timestamps that weren't updated in the past week)?
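A minimal sketch of what such a stale delete could look like as a Solr delete-by-query, which removes only the matching documents and so avoids a full repopulation. The field name `timestamp` is an assumption (the index may use a different name for its last-indexed date), and the function name is hypothetical. The block only builds the request body and does not contact Solr.

```python
import json


def stale_delete_payload(days: int = 7, field: str = "timestamp") -> str:
    """Return a Solr JSON update body that deletes documents whose `field`
    value is older than `days` days.

    `field` is an assumed name for the index-time timestamp field.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    # Solr date math: everything from the beginning of time up to N days ago.
    query = f"{field}:[* TO NOW-{days}DAYS]"
    return json.dumps({"delete": {"query": query}})


print(stale_delete_payload())
```

The body would be POSTed to the core's `/update` handler followed by a commit. Running the same range as a normal select query first gives a count and a list of IDs to review before anything is removed.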

flapka commented 4 years ago

If this leads to the same end, it's fine with me.

edgartdata commented 4 years ago

@yulgit1 Can I please get the list of differences for P&D? I assume there was no problem coming from P&S?

yulgit1 commented 4 years ago

Yes, P&S is up to date. See attached: all 6 have gone unindexed since last May.

https://git.yale.edu/gist/ermadmix/a2f1196845d92233a88e1888a80c40e5

yulgit1 commented 4 years ago

@flapka see: https://git.yale.edu/gist/ermadmix/5185d9b219091cf7b5c6327acac2aa38

@KraigBinkowski see: https://git.yale.edu/gist/ermadmix/64dfa0e9d1e39db7048bcf20ef0ff518

The objects listed in these would be deleted because the most recent weekly indexing, which is based on polling the harvester, did not include them. For example, these are in Blacklight but no longer in the harvester, so they would be deleted from Blacklight:

https://libapp.library.yale.edu/OAI_BAC/src/OAIOrbisTool.jsp?verb=GetRecord&identifier=oai:orbis.library.yale.edu:582943&metadataPrefix=marc21

https://libapp.library.yale.edu/OAI_BAC/src/OAIOrbisTool.jsp?verb=GetRecord&identifier=oai:orbis.library.yale.edu:9792086&metadataPrefix=marc21

Let me know if this is OK and I will create the process.
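The per-record check described in this comment could be sketched as follows. The URL pattern is taken from the two links above. How the harvester reports a removed record is an assumption: the sketch looks for the standard OAI-PMH `idDoesNotExist` error code in a namespaced response, and the sample XML is hand-written for illustration, not actual output from libapp. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "https://libapp.library.yale.edu/OAI_BAC/src/OAIOrbisTool.jsp"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def get_record_url(bib_id: int) -> str:
    """Build the GetRecord URL for an Orbis record, following the links above."""
    params = {
        "verb": "GetRecord",
        "identifier": f"oai:orbis.library.yale.edu:{bib_id}",
        "metadataPrefix": "marc21",
    }
    # Keep ':' unescaped so the identifier reads as in the original URLs.
    return f"{BASE}?{urlencode(params, safe=':')}"


def is_missing(oai_xml: str) -> bool:
    """True if the response carries an idDoesNotExist error (assumed behaviour)."""
    root = ET.fromstring(oai_xml)
    return any(
        err.get("code") == "idDoesNotExist"
        for err in root.iter(f"{OAI_NS}error")
    )


# Hand-written sample response, for illustration only.
sample = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<error code="idDoesNotExist">No matching identifier</error>'
    "</OAI-PMH>"
)

print(get_record_url(582943))
print(is_missing(sample))
```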

flapka commented 4 years ago

@yulgit1 Yes, that list for RB items looks correct. Some on the list are items that we've temporarily suppressed -- to be un-suppressed before long. When we un-suppress them, do you expect any complications in terms of harvesting etc.?

yulgit1 commented 4 years ago

@flapka No, it should be OK. When the items are unsuppressed, they will appear in libapp and get ingested.

yulgit1 commented 4 years ago

Periodic use of the delete_stale.rb script, and of a script with the cia-solr app, to batch remove items.

The cia-solr app for one-off deletes.

edgartdata commented 3 years ago

@yulgit1 @flapka @KraigBinkowski Let's set a schedule for deletions from BL in our calendars. For this short list attached here, let's aim for after September 6th.

to_delete_aug19_2021.txt