Open dannymandel opened 10 months ago
Generally we should use whatever is exposed by the repository when requesting a record by its identifier.
I suspect the mix of formats may be an oversight on OC (perhaps a cache issue?), but I think if there's agreement on a new format then that should be the one we use. Any records not conforming to the new format should be treated as invalid by iSamples (if they have not already been harvested).
So basically:
    for pid in get_pids_from_oc:
        if pid in isamples:
            if record has not changed:
                continue
        get record
        if record is valid:
            insert or update record in isamples
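As a rough illustration, the loop above could be sketched in Python like this. Everything here is hypothetical: `sync_records`, the `stored` dict standing in for the iSamples store, and the `has_changed`, `fetch`, and `is_valid` callables are placeholders, not existing iSamples or OpenContext functions.

```python
from typing import Callable, Dict, Iterable


def sync_records(
    pids: Iterable[str],
    stored: Dict[str, dict],             # hypothetical stand-in for the iSamples store
    has_changed: Callable[[str], bool],  # hypothetical: has the source record changed?
    fetch: Callable[[str], dict],        # hypothetical: fetch one record by identifier
    is_valid: Callable[[dict], bool],    # hypothetical: does it conform to the new format?
) -> Dict[str, int]:
    """Refetch records by identifier, skipping unchanged ones and rejecting invalid ones."""
    counts = {"skipped": 0, "saved": 0, "invalid": 0}
    for pid in pids:
        # Already harvested and unchanged: nothing to do.
        if pid in stored and not has_changed(pid):
            counts["skipped"] += 1
            continue
        record = fetch(pid)
        # Records that do not conform to the new format are not inserted.
        if not is_valid(record):
            counts["invalid"] += 1
            continue
        # Insert or update.
        stored[pid] = record
        counts["saved"] += 1
    return counts
```

Passing the fetch and validation steps in as callables keeps the sketch independent of whatever API OC ends up exposing. Note that an invalid record that was harvested earlier is left in place rather than deleted.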
We should refetch all the records, preferably after Eric makes a new iSamples-specific API.
Keeping this open as a task: refetch all the OpenContext records and reindex them on the iSamples side.
The OpenContext JSON format has changed, but not all of the records we have are in the new format. For example, here's one in the old format:
and one in the new:
Notice the difference between the `Creator` values. There is this method which should probably help. However, there are things like `keywords` where we are using the new Getty metadata in the OC API that just isn't present for the older records. If we refetch everything, will it be available? Should we just no-op on the older records? Should we keep two copies of the Transformer around? @ekansa @datadavev need some guidance here.