hbz / lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD
http://lobid.org/resources
Eclipse Public License 2.0

Enrich with RVK based on Culturegraph #1058

Closed dr0i closed 2 months ago

dr0i commented 4 years ago

From our meeting on 13 February 2020. @hagbeck's workflow (which we shall implement at lobid):

See https://katalog.ub.tu-dortmund.de/taxonomy/tree for Dortmund's enrichment.

Example in culturegraph, see field 084.

Download the CG data at https://data.dnb.de/culturegraph/ (currently aggregate_20240507.marcxml.gz).
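A minimal sketch of pulling RVK notations out of field 084 of a MARCXML record, using only the Python standard library. It assumes that RVK entries carry the source code `rvk` in subfield `$2` and the notation in subfield `$a`; the sample record is made up for illustration and is not real Culturegraph data.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, not taken from the Culturegraph dump.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="084" ind1=" " ind2=" ">
    <subfield code="a">SK 110</subfield>
    <subfield code="2">rvk</subfield>
  </datafield>
  <datafield tag="084" ind1=" " ind2=" ">
    <subfield code="a">510</subfield>
    <subfield code="2">sdnb</subfield>
  </datafield>
</record>"""


def rvk_notations(record_xml):
    """Return all 084 $a values whose $2 (source) is 'rvk' (assumed convention)."""
    record = ET.fromstring(record_xml)
    notations = []
    for field in record.findall("marc:datafield[@tag='084']", MARC_NS):
        source = field.find("marc:subfield[@code='2']", MARC_NS)
        if source is None or (source.text or "").strip().lower() != "rvk":
            continue  # skip classifications from other sources
        for sub in field.findall("marc:subfield[@code='a']", MARC_NS):
            if sub.text:
                notations.append(sub.text.strip())
    return notations


print(rvk_notations(SAMPLE_RECORD))
```

For the full dump, a streaming parser such as `ET.iterparse` would probably be needed, since the aggregate file is large.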

TobiasNx commented 2 years ago

@dr0i should we implement this for ALMA too?

dr0i commented 2 years ago

Yeah, I definitely have this on my mind! :) (And we will do it only with ALMA and Fix.)

TobiasNx commented 1 year ago

@dr0i I added a shortened approach for this in Fix: https://github.com/metafacture/metafacture-examples/pull/8

dr0i commented 7 months ago

Pinging @hagbeck and @blackwinter: we are considering continuing with this issue. We are also considering providing labels (#1835). Would you actually use these labels?

blackwinter commented 7 months ago

Yes, we would certainly like to use the label. We're doing it already in IntrOX (by converting the CSV to an LMDB).

hagbeck commented 7 months ago

Sorry, I overlooked your ping. Yes, we would like to use the labels, too.

dr0i commented 5 months ago

So, we can now extract an RVK-almaMmsId concordance as CSV. The resulting file is ~300 MB and has 6,658,601 entries, which sounds amazing :) Now we want to enrich our data with this concordance, i.e. look up each record in the concordance and add the data to a field. (Btw, what should we name that field? Analogous to Sachgruppen:

{  "subject" : [ {
    "notation" : "333.7",
    "type" : [ "Concept" ],
    "source" : {
      "label" : "DDC-Sachgruppen der ZDB"
    },
    "label" : "Natürliche Ressourcen, Energie und Umwelt"
  }
]
}

?)
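The lookup-and-enrich step described above could be sketched as follows. This is an illustration in Python, not the Fix-based implementation used at lobid. The two-column layout of the concordance (ID, tab, one notation per row), the record key `almaMmsId`, and the sample IDs are all assumptions; the `source` label and ID are the ones that appear for RVK subjects elsewhere in this thread.

```python
import csv
import io

RVK_SOURCE = {
    "label": "RVK (Regensburger Verbundklassifikation)",
    "id": "https://d-nb.info/gnd/4449787-8",
}

# Hypothetical concordance: "<almaMmsId>\t<RVK notation>", one pair per row.
CONCORDANCE_TSV = "example-id-1\tSK 110\nexample-id-2\tSK 110\n"


def load_concordance(tsv_file):
    """Read the concordance into a dict: ID -> list of RVK notations."""
    table = {}
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:
            continue  # ignore malformed rows
        mms_id, notation = row[0].strip(), row[1].strip()
        if mms_id and notation:
            table.setdefault(mms_id, []).append(notation)
    return table


def enrich(record, table):
    """Add RVK subjects from the concordance that the record does not have yet."""
    notations = table.get(record.get("almaMmsId"), [])
    if not notations:
        return record
    subjects = record.setdefault("subject", [])
    existing = {
        s.get("notation")
        for s in subjects
        if s.get("source", {}).get("id") == RVK_SOURCE["id"]
    }
    for notation in notations:
        if notation not in existing:
            subjects.append(
                {"notation": notation, "type": ["Concept"], "source": dict(RVK_SOURCE)}
            )
            existing.add(notation)
    return record


table = load_concordance(io.StringIO(CONCORDANCE_TSV))
print(enrich({"almaMmsId": "example-id-1"}, table))
```

Skipping notations that are already present keeps the enrichment idempotent, so running it twice on the same record does not duplicate subjects.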

TobiasNx commented 5 months ago

There are already RVK subjects, but not for all resources.

Have a look here: https://github.com/hbz/lobid-resources/blob/c7cecef1e36953bf5b8a5c5604228ba31b2e8e08/src/test/resources/alma-fix/990050000600206441.json#L94-L129

dr0i commented 5 months ago

Check next Monday. Also, at some point we need to update the data automatically. Should be sufficient to get the data once a month from https://data.dnb.de/culturegraph/ .

dr0i commented 4 months ago

Seems good, check e.g. https://lobid.org/resources/990062574560206441. See all resources having an RVK notation (6.8 M): https://lobid.org/resources/search?q=subject.source.id%3A%22https%3A%2F%2Fd-nb.info%2Fgnd%2F4449787-8%22
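For reference, the search URL above is just a percent-encoded field query. A small sketch of how it can be built with the Python standard library (the function name is made up for illustration):

```python
from urllib.parse import quote


def rvk_search_url():
    """Build the lobid search URL for all resources with an RVK subject."""
    # Field query: subject.source.id must equal the GND URI of the RVK.
    query = 'subject.source.id:"https://d-nb.info/gnd/4449787-8"'
    # safe="" also encodes ":" and "/" inside the query value.
    return "https://lobid.org/resources/search?q=" + quote(query, safe="")


print(rvk_search_url())
```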

acka47 commented 3 months ago

> update lookup table once a month based on new data from https://data.dnb.de/culturegraph/

This to-do is still open, @dr0i.

TobiasNx commented 2 months ago

I thought about this today. Does it make sense to mark the enriched RVK elements with a version property, even if it is not really correct?

e.g.

"subject":[
   {
      "notation":"SK 110",
      "type":[
         "Concept"
      ],
      "version":"enrichment",
      "source":{
         "label":"RVK (Regensburger Verbundklassifikation)",
         "id":"https://d-nb.info/gnd/4449787-8"
      }
   },

acka47 commented 2 months ago

> I thought about this today. Does it make sense to mark the enriched rvk elements with a version property, even if it is not really correct?

We have already discussed this and addressed the question in the blog post: https://blog.lobid.org/2024/07/04/rvk-enrichment.html#verzicht-auf-provenienzangaben So far, nobody has asked for a marker, so I'd say we don't need it. We can re-evaluate if things change.

dr0i commented 2 months ago

To be checked after 11th September (the second Wednesday of the month).
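Assuming the monthly update is tied to the second Wednesday of each month, the date to check after can be computed like this (a small sketch; the function name is made up):

```python
from datetime import date, timedelta


def second_wednesday(year, month):
    """Return the date of the second Wednesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday == 0, Wednesday == 2.
    days_to_first_wednesday = (2 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_wednesday + 7)


print(second_wednesday(2024, 9))  # 2024-09-11
```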

dr0i commented 2 months ago

The monthly update looks good:

~/git/lobid-resources-alma$ ls -hal lookup-tables/data/rvk*
-rw-rw-r-- 1 sol sol 258M Sep 11 13:34 lookup-tables/data/rvk.tsv
-rw-rw-r-- 1 sol sol 253M Jul 1 12:46 lookup-tables/data/rvk.tsv.20240507

Note that rvk.tsv.20240507 is not an automatically generated backup; no backup is generated at all. It is just convenient to have something to compare against (and possibly a quick fallback to an earlier state if really needed).
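One possible way to compare the new table against the older copy is to count the rows that were added or removed. This is only a sketch: it treats every non-empty line as one entry and holds both files in memory as sets, which needs a fair amount of RAM for files of this size. The function name is made up.

```python
def compare_tables(old_lines, new_lines):
    """Count rows added, removed, and unchanged between two lookup tables."""
    old = {line.rstrip("\n") for line in old_lines if line.strip()}
    new = {line.rstrip("\n") for line in new_lines if line.strip()}
    return {
        "added": len(new - old),
        "removed": len(old - new),
        "unchanged": len(old & new),
    }


# Usage with the two files listed above (paths relative to the local checkout):
# with open("lookup-tables/data/rvk.tsv.20240507", encoding="utf-8") as old_f, \
#         open("lookup-tables/data/rvk.tsv", encoding="utf-8") as new_f:
#     print(compare_tables(old_f, new_f))
```

A large "removed" count after an update would be a hint to fall back to the older file.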

Closing.

blackwinter commented 2 months ago

Do I understand correctly that this is in a local directory named lookup-tables, not in the lookup-tables repository?

dr0i commented 2 months ago

Yep, it's local.