lobid / lodmill

This repo has been replaced by, among others, https://github.com/hbz/lobid-resources/

Organisation data entries with identical id #627

Closed SBRitter closed 9 years ago

SBRitter commented 9 years ago

Some organisations have both an inr and an isil in their DBS entry but only an isil in their Sigel entry. Because entries are merged on the basis of the inr, these entries are not merged, so the organisation exists twice in the data, as many organisations do. In this particular case, however, both entries receive their isil (which exists in both data sources, Sigel and DBS) as their id for Elasticsearch indexing. As a consequence, there are two entries with identical ids before indexing, and during indexing the second entry with this id overwrites the first. The order of the entries is determined by the internal id in Metafacture, i.e. the inr or the isil. At the moment there are around 150 duplicate entries of this kind (about 0.5% of the data).
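
To make the collision concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual Metafacture or lodmill code: the sample records, the field names, the helper functions `internal_id`, `merge_by_inr` and `index_id`, and the fallback to the inr when an entry has no isil are all assumptions made for illustration, and a plain dict stands in for the Elasticsearch index.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
dbs_entries = [
    {"source": "DBS", "inr": "AA001", "isil": "DE-1", "name": "Library A"},
    {"source": "DBS", "inr": "AA002", "isil": "DE-2", "name": "Library B"},
]
sigel_entries = [
    # Has an inr, so it merges with the matching DBS entry.
    {"source": "Sigel", "inr": "AA001", "isil": "DE-1", "name": "Library A"},
    # Has only an isil, so it cannot be merged on the inr.
    {"source": "Sigel", "inr": None, "isil": "DE-2", "name": "Library B"},
]


def internal_id(entry):
    """Internal id used for ordering: the inr if present, else the isil."""
    return entry["inr"] or entry["isil"]


def merge_by_inr(entries):
    """Merge entries that share an inr; entries without an inr stay separate."""
    merged, unmerged = {}, []
    for entry in entries:
        if entry["inr"] is None:
            unmerged.append(dict(entry))
        else:
            merged.setdefault(entry["inr"], {}).update(entry)
    return list(merged.values()) + unmerged


def index_id(entry):
    """Id used for indexing: the isil (assumed fallback: the inr)."""
    return entry["isil"] or entry["inr"]


records = sorted(merge_by_inr(dbs_entries + sigel_entries), key=internal_id)

# Ids that occur more than once before indexing.
id_counts = Counter(index_id(record) for record in records)
colliding_ids = sorted(i for i, count in id_counts.items() if count > 1)

# A dict stands in for the index: a later record with the same id
# silently overwrites the earlier one.
index = {}
for record in records:
    index[index_id(record)] = record

print(len(records), colliding_ids, len(index))
```

In this toy data, Library B survives the merge as two records that share one index id, so only the record that sorts last by internal id remains in the index. A count of ids before indexing, as done with `Counter` here, is one way to list the affected organisations.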

dr0i commented 9 years ago

Moved to https://github.com/hbz/lobid-organisations/issues/24.