thisismattmiller closed this 8 years ago
https://github.com/nypl-registry/registry-ingest/blob/5e62117bddd756ea272f49d99c608c5b6d4563f0/lib/serialize_utils.js#L279

This job looks for records that share the same normalized name and then does its best to decide whether they should be merged. When records are merged, all of their viaf and normalizedNames values are added to the best (most complete) record, and the second pass of resource serialization then uses only the new merged record.
clusterByName | totalAgents: 4225784 totalDeleted: 11810
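For illustration, the group-then-merge step described above might look roughly like the sketch below. It is not the code at the linked line. The record shape (`normalizedName` as the grouping key, `viaf` and `normalizedNames` as arrays), the `completeness` score, the `shouldMerge` predicate and the function name `mergeByNormalizedName` are all hypothetical stand-ins for whatever heuristics the real job uses.

```javascript
// Hypothetical completeness score: the number of fields holding a non-empty value.
// The real "most complete" heuristic in serialize_utils.js may differ.
function completeness(record) {
  return Object.values(record).filter((v) =>
    Array.isArray(v) ? v.length > 0 : v !== null && v !== undefined && v !== ""
  ).length;
}

// Hypothetical sketch: group records by normalized name, keep the most complete
// record of each group, and fold the viaf and normalizedNames values of the
// other records into it. shouldMerge stands in for the real merge decision.
function mergeByNormalizedName(records, shouldMerge = () => true) {
  const groups = new Map();
  for (const record of records) {
    const key = record.normalizedName;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(record);
  }

  const agents = [];
  let totalDeleted = 0;
  for (const group of groups.values()) {
    // Sort a copy so the most complete record comes first.
    const [best, ...rest] = [...group].sort((a, b) => completeness(b) - completeness(a));
    const merged = {
      ...best,
      viaf: [...(best.viaf || [])],
      normalizedNames: [...(best.normalizedNames || [])],
    };
    for (const other of rest) {
      if (!shouldMerge(merged, other)) {
        agents.push(other); // not merged: keep it as a separate record
        continue;
      }
      for (const id of other.viaf || []) {
        if (!merged.viaf.includes(id)) merged.viaf.push(id);
      }
      for (const name of other.normalizedNames || []) {
        if (!merged.normalizedNames.includes(name)) merged.normalizedNames.push(name);
      }
      totalDeleted += 1; // the merged-away record is dropped
    }
    agents.push(merged);
  }
  return { agents, totalAgents: records.length, totalDeleted };
}
```

The returned `totalAgents` and `totalDeleted` mirror the two counters in the log line above; a second serialization pass would then read only from `agents`.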
Post-serialization, it should be possible to cluster terms together based on their shared normalized values.
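One way to do that clustering, sketched under assumptions: each term is a hypothetical `{ id, normalizedValues }` object, and two terms belong to the same cluster if they share any normalized value, directly or through a chain of other terms. The union-find used here is a swapped-in technique chosen for illustration, not something this thread specifies.

```javascript
// Hypothetical sketch: cluster terms that share at least one normalized value
// (transitively) using a small union-find keyed by term index.
function clusterBySharedValues(terms) {
  const parent = terms.map((_, i) => i);
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving keeps trees shallow
      i = parent[i];
    }
    return i;
  };
  const union = (a, b) => {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent[rb] = ra;
  };

  // Remember the first term seen with each normalized value and union every
  // later term carrying the same value with it.
  const firstOwner = new Map();
  terms.forEach((term, i) => {
    for (const value of term.normalizedValues) {
      if (firstOwner.has(value)) union(firstOwner.get(value), i);
      else firstOwner.set(value, i);
    }
  });

  // Collect term ids by cluster root, in first-seen order.
  const clusters = new Map();
  terms.forEach((term, i) => {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(term.id);
  });
  return [...clusters.values()];
}
```

Because clustering is transitive, a term sharing "b" with one term and "c" with another pulls all three into one cluster even if the first and last share nothing directly; whether that is desirable for these terms is a judgment call the merge rules would still have to make.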