Open dr0i opened 9 years ago
With https://github.com/hbz/lobid-resources/commit/26e4a0656ada154845a45f4706c6264f0b48ec6c#diff-070376a28f971f006644814c8b3860ec (enrichment of wikidata geo data) comes the enrich method in ElasticsearchIndexer.java: a lookup in another index is done and the result is merged into the ETL result of the hbz01 data (which thus becomes the new lobid-resource). The idea is not to make many lookups for enrichments (three are mentioned in this issue alone) but to have ONE parallel enrichment index: every lobid-resource would make ONE lookup and merge the result. The enrichment index would be updated independently of the indexing of lobid-resources and could thus take all the time it needs to be built. Also, most of the time there will be only a few updates to the enrichment index. So, when doing a full-dump reindexing, there is only one lookup per resource on a preprocessed enrichment index, while updating that enrichment index a) happens independently of lobid-resources and b) is expected to involve few changes, even though it is an aggregated index.
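The "one lookup, then merge" idea can be sketched roughly as follows. This is a minimal illustration and not the actual enrich method from ElasticsearchIndexer.java: an in-memory map stands in for the separate enrichment index, and the class name, method names, field names and the rule that existing ETL fields are never overwritten are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: an in-memory map stands in for the separate enrichment index.
// Class, method and field names are hypothetical, not taken from lobid-resources.
class EnrichmentSketch {

    // Resource ID -> precomputed enrichment fields (built independently of indexing).
    private final Map<String, Map<String, Object>> enrichmentIndex = new LinkedHashMap<>();

    // Adds or replaces the enrichment entry for one resource ID.
    void putEnrichment(String id, Map<String, Object> fields) {
        enrichmentIndex.put(id, new LinkedHashMap<>(fields));
    }

    // Performs ONE lookup for the resource and merges the hit into a copy of the ETL result.
    // Assumption: fields already present in the ETL result are not overwritten.
    Map<String, Object> enrich(String id, Map<String, Object> etlResult) {
        Map<String, Object> merged = new LinkedHashMap<>(etlResult);
        Map<String, Object> enrichment = enrichmentIndex.get(id);
        if (enrichment != null) {
            enrichment.forEach(merged::putIfAbsent);
        }
        return merged;
    }

    public static void main(String[] args) {
        EnrichmentSketch sketch = new EnrichmentSketch();
        sketch.putEnrichment("example-id", Map.of("geo", "example coordinates"));
        System.out.println(sketch.enrich("example-id", Map.of("title", "Example title")));
    }
}
```

In a real setup the map lookup would presumably be a single get-by-ID request against the enrichment index, so the cost per resource stays at one request no matter how many enrichment sources were aggregated into that index beforehand.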
Re last comment, in short: the mentioned enrichment-index would be the entityfacts for hbz01-lobid-resources.
> the mentioned enrichment-index would be the entityfacts for hbz01-lobid-resources.
Why? I think we just need EntityFacts for lobid-gnd. For NWBib and probably also lobid-resources we will load EntityFacts data on the fly, see https://github.com/hbz/nwbib/issues/427.
EntityFacts just enriches the GND, but not lobid-resources (e.g. books). With "entityfacts for hbz01-lobid-resources" I don't mean indexing EntityFacts into lobid-resources, but building "something like it" for our catalog entries (lobid-resources).
With the new way of transforming the data (without using Hadoop, see hbz/lobid#139) we lost our enrichment with:
- [x] dewey labels (with hbz/lobid-resources#581)
This must now be done in another way.