welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Decision: How to store quality assessment data and unit test data #384

Closed MansMeg closed 7 months ago

MansMeg commented 11 months ago

@BobBorges and I had a discussion on how best to structure the corpus and how to clarify the API. As I see it, we have four types of data:

  1. The corpus data (i.e. the data end users will use)
  2. The quality assessment/error estimation data. That's the gold standard data we only use to estimate the quality dimensions.
  3. The data integrity test data, i.e. data we only use for unit tests.
  4. Various other data (training sets etc.)

The question is how to store this in the corpus in a good way, since that also shows how we think about the API.

My gut feeling is that neither 2 nor 3 is of general interest, so neither should be part of the API.

The question is then where to put this data.

Potential solutions:

  1. is put in /quality/error_estimation
  2. is put in test/data
  3. is put in a separate repo
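
To make the split concrete, here is a minimal sketch of how tooling could tell API data from internal data if the quality assessment data lived under quality/error_estimation and the unit test data under test/data. The function name `is_public_corpus_path` and the example paths are hypothetical and only for illustration; nothing here reflects existing code in the repo.

```python
from pathlib import PurePosixPath

# Assumption: these two directories hold data that is NOT part of the API.
# The directory names come from the options listed above; everything else
# in this sketch is hypothetical.
INTERNAL_PREFIXES = (
    PurePosixPath("quality/error_estimation"),
    PurePosixPath("test/data"),
)


def is_public_corpus_path(path: str) -> bool:
    """Return True if a repo-relative path is outside the internal data dirs."""
    parts = PurePosixPath(path.lstrip("/")).parts
    for prefix in INTERNAL_PREFIXES:
        # Compare whole path components, so "test/database" does not match "test/data".
        if parts[: len(prefix.parts)] == prefix.parts:
            return False
    return True
```

With a separate repo instead, no such filtering would be needed, since everything left in the corpus repo would be API data.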

What do you think?