@BobBorges and I discussed how best to structure the corpus and how to clarify the API. As I see it, we have four types of data:
1. The corpus data (i.e., the data end users will use).
2. The quality assessment/error estimation data. This is the gold-standard data we use only to estimate the quality dimensions.
3. The data integrity test data. This is data we use only for unit tests.
4. Various other data (training sets, etc.).
The question is how to store this in the corpus in a good way, since that also reflects how we think about the API.
My gut feeling is that neither 2 nor 3 is of general interest, so neither should be part of the API.
The question is then where to put this data.
Potential solutions:
What do you think?