Closed BobBorges closed 11 months ago
This looks good. But we should not keep this in the metadata since it should be used for testing. I.e. put it in the quality assessment folder instead.
long term we should maybe move this type of unit test data to the test folder instead.
we should not keep this in the metadata
fair enough
long term
Let's do it now if it's how it should be. How about test/input/ for this type of file? @MansMeg
The long-term solution is something we can discuss tomorrow. The main things to consider:
1) It should not be part of the API of the corpus (i.e. not part of the corpus folder), since it is not intended for ordinary users.
2) It should be intuitive, i.e. I should be able to find it easily if I know what I'm looking for. That rules out generic folder names such as "input". It should probably be in a folder structure like "data_integrity_tests_data" or similar (but a better name =) ).
3) It should be part of the corpus general repository to simplify continuous integration. At least for now.
The only thing I think is important for now is not to put it in the corpus folder (i.e. point 1).
@ninpnin , any thoughts?
I think it is better if you add the inferred number in the file. I think you are right, it is correct to infer as you do. But it might be better to have a full dataset for this? Otherwise, we need to do this inference every time someone uses the file. Or? What do you think?
coding with a complete file will definitely be easier -- I just didn't want to add information that we can't tie to a source.
Ok. I have no strong opinion here. But when you have the source as a column, you could just set "inferred" there to indicate that it is not from a source. Do as you like.
LGTM
I'll use this file in several planned unit tests.
Columns:
There isn't a source for every year -- years with no source currently have no number listed. I have been inferring missing numbers: if no authoritative source mentions a change, a missing number is taken to be the same as the previous year's.
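For reference, the inference rule could be sketched roughly as below, with filled-in rows marked "inferred" in the source column as suggested in the review. This is only a sketch under assumptions: the column names `year`, `number` and `source`, the function name `fill_inferred`, and the sample values are placeholders for illustration, not the actual contents of the file.

```python
def fill_inferred(rows):
    """Carry the previous year's number forward into years that have none.

    Rows are dicts with the hypothetical keys "year", "number" and "source".
    Filled-in rows get source "inferred" so they stay distinguishable from
    numbers that are tied to an actual source.
    """
    filled = []
    previous = None
    for row in sorted(rows, key=lambda r: r["year"]):
        row = dict(row)  # copy, so the caller's rows are left untouched
        if row["number"] is None:
            if previous is None:
                # Nothing to carry forward for the earliest year(s).
                raise ValueError(f"no earlier number to infer from for {row['year']}")
            row["number"] = previous
            row["source"] = "inferred"
        previous = row["number"]
        filled.append(row)
    return filled


# Made-up placeholder values, only to show the shape of the data.
rows = [
    {"year": 1900, "number": 230, "source": "source A"},
    {"year": 1901, "number": None, "source": None},
    {"year": 1902, "number": 231, "source": "source B"},
]
completed = fill_inferred(rows)
```

Writing the result back to the file once would mean users and unit tests don't have to repeat the inference each time, while the "inferred" marker keeps it clear which numbers have no source behind them.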