MansMeg closed this issue 6 months ago.
Please let me know if we can test it and give feedback.
Feedback from the Wikidata Telegram group about CC-0 licenses (Egon):
Yes, DrugBank does this: the ID mappings are CC0, while the rest has a different license: https://go.drugbank.com/data_packages. The CC0 part is here: https://go.drugbank.com/releases/latest#open-data
Also see #384
Should mpdb rather be called politicians-db or similar, since it also includes ministers?
That's a good point. I'm not sure about the exact names. "Politicians" sounds too generic to me, since it only covers members of parliament, guests speaking in parliament, and ministers. Maybe we could check with political scientists?
How about riksdag_politicians?
That still sounds kind of generic to me. Maybe send a quick email to Jan, Josefina, and Cecilia to ask if they have a good suggestion?
Being more specific would make it less clear for end users. Even mpdb was kind of obscure.
@BobBorges, is this done?
@MansMeg
Before releasing version 1.0, we need to polish the corpus API so that the data is more intuitive and easier to use. Currently, the repo contains a lot of legacy content. Below is the structure we arrived at after discussion.
First, we split the corpus into separate components. The components will be the document types:
riksdag_records
riksdag_records-alto
riksdag_records-pdf
riksdag_motions
riksdag_motions-alto
riksdag_motions-pdf ...
riksdag_mpdb (MP database)
rpackage
pylib
and more internal (but still public) repos
Private repos:
In the data repositories, the folder structure will be the same:
/data/... -> the data
/test/... -> data integrity tests of the specific data
/test/data/... -> data used by the data integrity tests
/quality_estimation/... -> scripts used for quality estimation
/quality_estimation/data/... -> data used for quality estimation
README.md
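Since every data repository is meant to share this layout, a small check could catch repos that drift from it. Below is a minimal sketch, assuming every listed path is required; the names `EXPECTED_LAYOUT` and `missing_layout_paths` are hypothetical and not part of any existing package.

```python
from pathlib import Path

# Assumption: every data repository must contain all of these paths,
# mirroring the proposed layout. Entries ending in "/" are directories.
EXPECTED_LAYOUT = [
    "data/",
    "test/",
    "test/data/",
    "quality_estimation/",
    "quality_estimation/data/",
    "README.md",
]


def missing_layout_paths(repo_root):
    """Return the expected layout entries that are absent from repo_root (hypothetical helper)."""
    root = Path(repo_root)
    missing = []
    for entry in EXPECTED_LAYOUT:
        path = root / entry.rstrip("/")
        # Directories must exist as directories, README.md as a regular file.
        ok = path.is_dir() if entry.endswith("/") else path.is_file()
        if not ok:
            missing.append(entry)
    return missing
```

Calling `missing_layout_paths("path/to/riksdag_records")` returns an empty list for a conforming repo, so it could run as one of the integrity tests under /test/.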
Reasoning
The R package and the Python library are separate repositories, so any use of the corpus can point to these repos for in-depth details on how to use the data (e.g., the R package uses vignettes, the Python library uses examples).
Many users might be interested in just one or two of the repos, and the repos might be very large. So the split should make it simple to use only the components one needs.
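To illustrate how the per-document-type naming keeps specialized use simple, here is a minimal sketch that picks only the repos a user needs. It assumes the convention `riksdag_<document type>[-<variant>]` seen in the component list; `COMPONENTS`, `parse_component`, and `components_for` are hypothetical names, not an existing API.

```python
import re

# Hypothetical list of data components, taken from the proposed split.
COMPONENTS = [
    "riksdag_records",
    "riksdag_records-alto",
    "riksdag_records-pdf",
    "riksdag_motions",
    "riksdag_motions-alto",
    "riksdag_motions-pdf",
    "riksdag_mpdb",
]

# Assumed naming convention: riksdag_<document type>[-<variant>]
NAME_PATTERN = re.compile(r"^riksdag_(?P<doctype>[a-z]+)(?:-(?P<variant>[a-z]+))?$")


def parse_component(name):
    """Split a component repo name into (document type, variant or None)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a riksdag data component: {name!r}")
    return match.group("doctype"), match.group("variant")


def components_for(doctype, variants=()):
    """Return the repos needed for one document type.

    The main repo is always included; variants such as 'alto' or 'pdf'
    are only included when explicitly requested.
    """
    selected = []
    for name in COMPONENTS:
        kind, variant = parse_component(name)
        if kind == doctype and (variant is None or variant in variants):
            selected.append(name)
    return selected
```

For example, a user who only needs motion text would get `["riksdag_motions"]` from `components_for("motions")`, without pulling the large ALTO or PDF repos.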
Supplementary material should only be used temporarily and should not be part of the API. We should use separate repos for different types of training data rather than putting everything in one repo.
Some additional thoughts: