hbz / lobid-gnd

UI and API to the Integrated Authority File (Gemeinsame Normdatei, GND)
http://lobid.org/gnd
Eclipse Public License 2.0

Remove deleted entries from index #159

Open acka47 opened 6 years ago

acka47 commented 6 years ago

Several entries are deleted from the GND without a redirect to an existing entry. Here are the numbers for the last few months, provided by S. Hartmann in http://jira.dnb.de/browse/GND-63:

09.2018: 123 GND records
08.2018: 89 GND records
07.2018: 41 GND records
06.2018: 83 GND records
05.2018: 104 GND records
04.2018: 80 GND records

These are only removed when we build a whole new index from a new GND dump. The day-to-day updates do not remove deleted entries. We should find out how to get the information on deleted entries via OAI-PMH and remove deleted entries with each update.

acka47 commented 5 years ago

OAI-PMH & deletions: https://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords

fsteeg commented 5 years ago

The DNB repository does not seem to provide that information. It declares its level of support for deletions as transient, which means "the repository does not guarantee that a list of deletions is maintained persistently or consistently" (see https://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords).
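For reference, a repository declares this level in the `deletedRecord` element of its `Identify` response (one of `no`, `transient`, `persistent`). A minimal sketch of reading it, assuming the response XML has already been fetched; the sample document and the function name `deleted_record_policy` are illustrative and not copied from the DNB repository:

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by all OAI-PMH 2.0 response elements.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative Identify response fragment (an assumption, not real DNB output).
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <deletedRecord>transient</deletedRecord>
  </Identify>
</OAI-PMH>"""


def deleted_record_policy(identify_xml: str) -> Optional[str]:
    """Return the declared deletedRecord level, or None if it is missing."""
    root = ET.fromstring(identify_xml)
    node = root.find("./oai:Identify/oai:deletedRecord", OAI_NS)
    if node is None or not node.text:
        return None
    return node.text.strip()


print(deleted_record_policy(SAMPLE_IDENTIFY))
```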

The response headers do not contain the optional status attribute (see https://www.openarchives.org/OAI/openarchivesprotocol.html#header); they all look like this:

<header>
  <identifier>oai:dnb.de/authorities/000460265</identifier>
  <datestamp>2018-12-20T04:38:17Z</datestamp>
  <setSpec>authorities</setSpec>
</header>
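If the repository did set the attribute, a harvester could detect deletions from the header alone. A rough sketch of that check, assuming each header is available as a standalone XML string; `is_deleted_header` and the second sample (with a made-up identifier and `status="deleted"`) are hypothetical and only show what the OAI-PMH spec describes:

```python
import xml.etree.ElementTree as ET


def is_deleted_header(header_xml: str) -> bool:
    """True if an OAI-PMH record header carries status="deleted"."""
    header = ET.fromstring(header_xml)
    # The status attribute is optional and un-namespaced in OAI-PMH.
    return header.get("status") == "deleted"


# Header as currently returned (taken from the example above).
LIVE_HEADER = """<header>
  <identifier>oai:dnb.de/authorities/000460265</identifier>
  <datestamp>2018-12-20T04:38:17Z</datestamp>
  <setSpec>authorities</setSpec>
</header>"""

# Hypothetical header of a deleted record, as the spec would allow it.
DELETED_HEADER = """<header status="deleted">
  <identifier>oai:example.org/record/1</identifier>
  <datestamp>2018-12-20T04:38:17Z</datestamp>
</header>"""

print(is_deleted_header(LIVE_HEADER), is_deleted_header(DELETED_HEADER))
```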

Maybe the info is available in other formats like MARC21-xml or PicaPlus-xml? But we can't get these: only RDFxml works, the others give 403 (Forbidden). @acka47 maybe this is something to bring up in the GND dev expert group?

acka47 commented 5 years ago

It is probably the best approach to open an issue in the DNB Jira where we ask for support of deletions via OAI-PMH. I will do this.

acka47 commented 5 years ago

The Jira issue is at https://jira.dnb.de/browse/GND-77 (login required).

acka47 commented 3 years ago

There is an update on the Jira issue which reads:

Will be implemented with the next release, 2021.03. An advance announcement with the necessary information will follow on 28 June 2021.

acka47 commented 3 years ago

From "Metadatendienste: Änderungen im Format RDF ab 28. September 2021 (Export-Release 2021.03)" (Metadata services: changes to the RDF format from 28 September 2021):

With release 2021_03 it becomes possible to obtain statements about deleted records in the GND via the interfaces (OAI and SRU). For this purpose the new class "dnbt:DeletedResource" was introduced. Example:

<rdf:Description rdf:about="https://d-nb.info/gnd/1109770197">
  <rdf:type rdf:resource="https://d-nb.info/standards/elementset/dnb#DeletedResource"/>
</rdf:Description>

The release will drop on 2021-09-28. Already assigning @fsteeg but leaving the issue in backlog.

acka47 commented 3 years ago

The Jira issue at https://jira.dnb.de/browse/GND-77 (login required) was just closed with this comment:

Deletions are now communicated via OAI.

Example: GET https://services.dnb.de/oai/repository?verb=GetRecord&metadataPrefix=RDFxml&identifier=oai:dnb.de/authorities/1231757663

<rdf:Description rdf:about="https://d-nb.info/gnd/1231757663">
    <rdf:type rdf:resource="https://d-nb.info/standards/elementset/dnb#DeletedResource"/>
</rdf:Description>
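
On our side, an update run could scan the harvested RDF/XML for this type and collect the affected GND IDs. A minimal sketch, assuming the description is wrapped in an `rdf:RDF` root and uses `rdf:Description` with an explicit `rdf:type` as in the example above; the function name `deleted_gnd_uris` is hypothetical, and the actual removal from the index is not shown:

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DELETED_CLASS = "https://d-nb.info/standards/elementset/dnb#DeletedResource"

# Sample built from the snippet above; the rdf:RDF wrapper is an assumption.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="https://d-nb.info/gnd/1231757663">
    <rdf:type rdf:resource="https://d-nb.info/standards/elementset/dnb#DeletedResource"/>
  </rdf:Description>
</rdf:RDF>"""


def deleted_gnd_uris(rdf_xml: str) -> List[str]:
    """Return the rdf:about URIs of all descriptions typed as DeletedResource."""
    root = ET.fromstring(rdf_xml)
    deleted = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for rdf_type in desc.findall(f"{{{RDF_NS}}}type"):
            if rdf_type.get(f"{{{RDF_NS}}}resource") == DELETED_CLASS:
                about = desc.get(f"{{{RDF_NS}}}about")
                if about is not None:
                    deleted.append(about)
                break  # one matching type is enough for this description
    return deleted


uris = deleted_gnd_uris(SAMPLE_RDF)
# The GND ID is the last path segment of the URI.
ids = [uri.rsplit("/", 1)[-1] for uri in uris]
print(uris, ids)
```

The resulting IDs could then be passed to whatever delete call the indexing code uses.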