surfedushare / pol-harvester

A repository that harvests different sources for content

Migrate references to the new document references #95

Closed: jelmerderonde closed this issue 4 years ago

jelmerderonde commented 4 years ago
fako commented 4 years ago

Created: data/pol.2019-11-05.postgres.sql

fako commented 4 years ago

There are 361 Annotations, and 193 of them do not have Documents of their own in the beta freeze. Only 79 of those Documents can be recovered from Arrangements, which leaves 114 Annotations without any Documents attached.
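To make the counts above easier to re-check, here is a minimal sketch of the partitioning. It works on plain sets of reference strings; the function name and arguments are hypothetical and are not the actual pol-harvester models or queries.

```python
def split_annotations(annotation_refs, document_refs, arrangement_refs):
    """Partition annotation references by where their material can still be found.

    All three arguments are plain collections of reference strings (an assumption;
    in the real project these would come from database queries).
    """
    document_refs = set(document_refs)
    arrangement_refs = set(arrangement_refs)
    # Annotations whose material has no Document of its own in the freeze
    missing = [ref for ref in annotation_refs if ref not in document_refs]
    # Of those, the ones that can still be recovered through an Arrangement
    recoverable = [ref for ref in missing if ref in arrangement_refs]
    # The remainder has no document attached at all
    lost = [ref for ref in missing if ref not in arrangement_refs]
    return missing, recoverable, lost


# Toy data, only to show the shape of the result
missing, recoverable, lost = split_annotations(
    annotation_refs=["a", "b", "c", "d"],
    document_refs=["a"],
    arrangement_refs=["b"],
)
print(len(missing), len(recoverable), len(lost))

# The figures reported above are consistent with each other
assert 193 - 79 == 114
```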

fako commented 4 years ago

Researching this further, it turns out that, for instance, the material with reference 153673415b8cbbd48b7c5f8e4f1850b616e1dce0 is missing from the beta freeze. The title of that material is "Integreren met de Monte Carlo-methode". However, if we search Edurep for that material:

http://wszoeken.edurep.kennisnet.nl:8000/edurep/sruns?version=1.2&operation=searchRetrieve&query=lom.general.title=%22Integreren%20met%20de%20Monte%20Carlo-methode%22

we get exactly one material, and its repository has not been set. As a result it fails to show up in our API harvest, regardless of the freeze we're using. If a material is no longer in a repository that interests us, the harvester is not going to pick it up.
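For checking other missing titles the same way, the search URL can be built from its parts. This is a minimal sketch using only the Python standard library; the helper name `edurep_title_search_url` is hypothetical, and the endpoint and parameters are copied from the URL in this comment rather than from any Edurep documentation.

```python
from urllib.parse import quote, urlencode

# Endpoint taken verbatim from the URL used in this comment
EDUREP_SRU_ENDPOINT = "http://wszoeken.edurep.kennisnet.nl:8000/edurep/sruns"


def edurep_title_search_url(title):
    """Build an SRU searchRetrieve URL that looks up a material by exact title."""
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": 'lom.general.title="{}"'.format(title),
    }
    # quote_via=quote encodes spaces as %20; safe="=" keeps the "=" inside the query readable
    return EDUREP_SRU_ENDPOINT + "?" + urlencode(params, safe="=", quote_via=quote)


url = edurep_title_search_url("Integreren met de Monte Carlo-methode")
print(url)
```

The sketch only builds the URL. Fetching it and inspecting the repository field of the returned record is left out, since the response format is not shown in this thread.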

jelmerderonde commented 4 years ago

Throw away annotations for materials that no longer exist in the beta freeze. For rankings: throw away the instances for expected documents that are no longer available, but keep the overall queries.
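A rough sketch of that clean-up rule, assuming annotations are dicts with a `reference` key and rankings map each query to a dict of expected document references with a rating. These shapes and function names are hypothetical and only illustrate the rule; the real migration would operate on the project's own models.

```python
def prune_annotations(annotations, freeze_refs):
    """Keep only annotations whose material still exists in the freeze."""
    freeze_refs = set(freeze_refs)
    return [a for a in annotations if a["reference"] in freeze_refs]


def prune_rankings(rankings, freeze_refs):
    """Drop expected documents that are gone, but keep every query.

    A query stays in the result even when all of its expected documents are removed.
    """
    freeze_refs = set(freeze_refs)
    return {
        query: {ref: rating for ref, rating in expected.items() if ref in freeze_refs}
        for query, expected in rankings.items()
    }


# Toy data: "gone" stands for a material that is no longer in the freeze
freeze = {"doc1", "doc2"}
annotations = [{"reference": "doc1"}, {"reference": "gone"}]
rankings = {"monte carlo": {"doc2": 5, "gone": 3}, "integreren": {"gone": 4}}

print(prune_annotations(annotations, freeze))
print(prune_rankings(rankings, freeze))
```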