lobid / lodmill

This repo is replaced by i.a. https://github.com/hbz/lobid-resources/

Enrichment OpenLibrary, Gutenberg, dbpedia (and maybe dewey labels) #667

Open dr0i opened 9 years ago

dr0i commented 9 years ago

With the new way of transforming the data (without using Hadoop, see hbz/lobid#139) we lost our enrichment to

dr0i commented 6 years ago

With https://github.com/hbz/lobid-resources/commit/26e4a0656ada154845a45f4706c6264f0b48ec6c#diff-070376a28f971f006644814c8b3860ec (enrichment of wikidata geo data) comes the enrich method in ElasticsearchIndexer.java: a lookup in another index is done and the result is merged into the ETL result of the hbz01 data (which thus becomes the new lobid-resource). The idea is not to make many lookups for enrichments (three are mentioned in this issue alone) but to have ONE parallel enrichment index: every lobid-resource would make ONE lookup and merge the result.

The enrichment index would be updated independently of the indexing of lobid-resources and thus could take all the time it needs to be built. Also, most of the time there will be only a few updates to the enrichment index. So, when doing a full-dump reindexing, there is only one lookup per resource against a preprocessed enrichment index, while the update of that enrichment index a) happens independently of lobid-resources and b) is expected to involve few changes, even though it is an aggregated index.
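To make the "one lookup, then merge" idea concrete, here is a minimal sketch in Python. It is not the enrich method from ElasticsearchIndexer.java. The enrichment index is modelled as a plain in-memory dict, and the IDs, field names, and the rule that ETL values win on conflict are all assumptions made for illustration.

```python
from typing import Any, Dict

Record = Dict[str, Any]

# Hypothetical stand-in for the preprocessed enrichment index:
# one aggregated record per resource ID, built independently of
# the lobid-resources indexing. IDs and field names are made up.
ENRICHMENT_INDEX: Dict[str, Record] = {
    "example-id-1": {
        "openLibrary": "placeholder-openlibrary-link",
        "gutenberg": "placeholder-gutenberg-link",
        "dbpedia": "placeholder-dbpedia-link",
    },
}


def enrich(resource: Record, enrichment_index: Dict[str, Record]) -> Record:
    """Do ONE lookup by resource ID and merge the result into the ETL record.

    Assumption: fields already present in the ETL result are kept,
    enrichment data only fills in fields that are missing.
    """
    merged = dict(resource)  # copy, so the ETL result itself is not mutated
    extra = enrichment_index.get(resource["id"])  # the single lookup
    if extra is None:
        return merged  # nothing known about this resource
    for key, value in extra.items():
        merged.setdefault(key, value)
    return merged


etl_result = {"id": "example-id-1", "title": "Some title"}
enriched = enrich(etl_result, ENRICHMENT_INDEX)
print(sorted(enriched))
```

In this sketch a resource without an entry in the enrichment index is passed through unchanged, so a full-dump reindexing would not depend on the enrichment index being complete.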

dr0i commented 6 years ago

Re last comment, in short: the mentioned enrichment-index would be the entityfacts for hbz01-lobid-resources.

acka47 commented 6 years ago

the mentioned enrichment-index would be the entityfacts for hbz01-lobid-resources.

Why? I think we just need EntityFacts for lobid-gnd. For NWBib and probably also lobid-resources we will load EntityFacts data on the fly, see https://github.com/hbz/nwbib/issues/427.

dr0i commented 6 years ago

Entityfacts just enriches the GND, but not lobid-resources (e.g. books). With "entityfacts for hbz01-lobid-resources" I don't mean to index entityfacts into lobid-resources but to build "something-like-it" for our catalog entries (lobid-resources).