Closed dr0i closed 2 months ago
@dr0i should we implement this for ALMA too?
Yeah, I definitely have this on my mind! :) (and we do it only with ALMA and Fix)
@dr0i I added a shortened approach for this in Fix: https://github.com/metafacture/metafacture-examples/pull/8
Pinging @hagbeck and @blackwinter: we are considering going ahead with this issue. We are also considering providing labels (#1835). Would you actually use these labels?
Yes, we would certainly like to use the label. We're doing it already in IntrOX (by converting the CSV to an LMDB).
Sorry, I overlooked your ping. Yes, we would like to use the labels, too.
So, we can now extract an RVK-almaMmsId concordance as CSV. The resulting file is ~300 MB and has 6,658,601 entries, which sounds amazing :)
Now we want to enrich our data with this concordance, i.e. look up the concordance and add the data to a field. (By the way, what should we name that field? Analogous to Sachgruppen:
{
  "subject" : [ {
    "notation" : "333.7",
    "type" : [ "Concept" ],
    "source" : {
      "label" : "DDC-Sachgruppen der ZDB"
    },
    "label" : "Natürliche Ressourcen, Energie und Umwelt"
  } ]
}
?)
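The lookup-and-add step could be sketched roughly like this in Python. It is an illustration only, not the Fix workflow used in lobid-resources. The two-column TSV layout (almaMmsId, RVK notation), the `almaMmsId` record key and the function names are assumptions. The subject entry mirrors the Sachgruppen example, and the RVK source label and GND URI are taken from the example further down in this thread.

```python
import csv
from collections import defaultdict

# Source description for RVK subjects (constant name is hypothetical).
RVK_SOURCE = {
    "label": "RVK (Regensburger Verbundklassifikation)",
    "id": "https://d-nb.info/gnd/4449787-8",
}


def load_concordance(tsv_file):
    """Read an assumed two-column TSV (almaMmsId, RVK notation) into a dict of lists."""
    concordance = defaultdict(list)
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        mms_id, notation = row[0], row[1]
        if notation not in concordance[mms_id]:
            concordance[mms_id].append(notation)
    return concordance


def enrich(record, concordance):
    """Append RVK subjects from the concordance, skipping notations already present."""
    notations = concordance.get(record.get("almaMmsId"), [])
    if not notations:
        return record  # nothing to add, leave the record untouched
    subjects = record.setdefault("subject", [])
    present = {
        s.get("notation")
        for s in subjects
        if s.get("source", {}).get("id") == RVK_SOURCE["id"]
    }
    for notation in notations:
        if notation not in present:
            subjects.append({
                "notation": notation,
                "type": ["Concept"],
                "source": dict(RVK_SOURCE),
            })
            present.add(notation)
    return record
```

Existing RVK subjects are left alone, so records that already carry a notation do not get a duplicate from the concordance.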
There are already RVK subjects, but not for all resources.
Have a look here: https://github.com/hbz/lobid-resources/blob/c7cecef1e36953bf5b8a5c5604228ba31b2e8e08/src/test/resources/alma-fix/990050000600206441.json#L94-L129
Check next Monday. Also, at some point we need to update the data automatically. Should be sufficient to get the data once a month from https://data.dnb.de/culturegraph/ .
Seems good, check e.g. https://lobid.org/resources/990062574560206441. See all resources having an RVK notation (6.8 M): https://lobid.org/resources/search?q=subject.source.id%3A%22https%3A%2F%2Fd-nb.info%2Fgnd%2F4449787-8%22
update lookup table once a month based on new data from https://data.dnb.de/culturegraph/
This to-do is still open, @dr0i.
I thought about this today. Does it make sense to mark the enriched RVK elements with a version property, even if it is not really correct?
e.g.
"subject":[
  {
    "notation":"SK 110",
    "type":[
      "Concept"
    ],
    "version":"enrichment",
    "source":{
      "label":"RVK (Regensburger Verbundklassifikation)",
      "id":"https://d-nb.info/gnd/4449787-8"
    }
  },
We have already discussed this and addressed the question in the blog post: https://blog.lobid.org/2024/07/04/rvk-enrichment.html#verzicht-auf-provenienzangaben So far, nobody has asked for a marker, so I'd say we don't need it. We can re-evaluate if things change.
To be checked after 11th September (second Wednesday in a month).
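For reference, the "second Wednesday" date can be computed rather than looked up. This is a small Python sketch using only the standard library; the function name is made up for illustration and is not part of any lobid tooling.

```python
import calendar
from datetime import date


def second_wednesday(year, month):
    """Return the date of the second Wednesday of the given month."""
    first_weekday = date(year, month, 1).weekday()  # Monday == 0
    # Days from the 1st to the first Wednesday, then one more week.
    offset = (calendar.WEDNESDAY - first_weekday) % 7
    return date(year, month, 1 + offset + 7)


print(second_wednesday(2024, 9))  # 2024-09-11
```

Assuming the year is 2024, this matches the 11th September mentioned above.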
The monthly update looks good:
~/git/lobid-resources-alma$ ls -hal lookup-tables/data/rvk*
-rw-rw-r-- 1 sol sol 258M Sep 11 13:34 lookup-tables/data/rvk.tsv
-rw-rw-r-- 1 sol sol 253M Jul 1 12:46 lookup-tables/data/rvk.tsv.20240507
Note that rvk.tsv.20240507 is not an automatically generated backup; no backup is generated at all. It's just convenient to have something to compare against (and possibly a quick fallback to an earlier state if really needed).
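If the comparison against the older file were ever automated, a rough plausibility check might look like the Python sketch below. Nothing like this exists in the thread: the function names and the 5 % shrink threshold are assumptions chosen for illustration.

```python
def count_lines(path):
    """Count lines without loading the whole file into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def looks_plausible(new_path, reference_path, max_shrink=0.05):
    """Return True unless the new table is more than max_shrink smaller than the reference."""
    new_count = count_lines(new_path)
    reference_count = count_lines(reference_path)
    if reference_count == 0:
        return new_count > 0
    return new_count >= reference_count * (1 - max_shrink)
```

A call such as `looks_plausible("lookup-tables/data/rvk.tsv", "lookup-tables/data/rvk.tsv.20240507")` would then flag a monthly update that unexpectedly lost a large share of its entries.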
Closing.
Do I understand correctly that this is in a local directory named lookup-tables, not in the lookup-tables repository?
Yep, it's local.
From our appointment on 13th February 2020. @hagbeck's workflow (which we shall implement at lobid):
See https://katalog.ub.tu-dortmund.de/taxonomy/tree for Dortmund's enrichment.
Example in culturegraph: see field 084.
Download CG data at https://data.dnb.de/culturegraph/, currently aggregate_20240507.marcxml.gz
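Reading RVK notations from field 084 of a MARCXML dump could be sketched in Python as follows. This is not the workflow used in this issue. It assumes that the dump uses the standard MARC21 slim namespace, that RVK fields carry `rvk` in subfield 2 and the notation in subfield a, and that controlfield 001 can serve as the record key. How records are mapped to almaMmsIds is not covered here.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace in Clark notation (assumed to be used by the dump).
MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def iter_rvk(source):
    """Yield (record id, [RVK notations]) for each MARCXML record that has any."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MARC_NS + "record":
            continue
        record_id = elem.findtext(f"{MARC_NS}controlfield[@tag='001']")
        notations = []
        for field in elem.iterfind(f"{MARC_NS}datafield[@tag='084']"):
            scheme = field.findtext(f"{MARC_NS}subfield[@code='2']")
            if scheme != "rvk":
                continue  # 084 also holds other classification schemes
            for sub in field.iterfind(f"{MARC_NS}subfield[@code='a']"):
                if sub.text and sub.text not in notations:
                    notations.append(sub.text)
        if notations:
            yield record_id, notations
        elem.clear()  # drop the record's children to limit memory use
```

The `source` argument can be any binary file object, so a gzipped dump can be passed in via `gzip.open(path, "rb")` without unpacking it first.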