odissei-lifecourse / life-sequencing-dutch

MIT License

Unifying the data architecture #90

Open: f-hafner opened this issue 1 month ago

f-hafner commented 1 month ago

We need a single source of truth for the data. For instance, the (intermediate) output produced by Lucas should be available to query for the evaluation. This will prevent the kind of bugs we're currently trying to track down, and will make it easier to add more experiments to the evaluation.

What I think we need is a relational database with several tables.
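A minimal sketch of what such a database could look like, assuming SQLite through Python's standard `sqlite3` module. The tables `persons`, `experiments` and `model_outputs` and their columns are hypothetical placeholders, not an agreed design. The point is that the pipeline writes to the same tables the evaluation reads from, and foreign keys reject rows for unknown persons or experiments.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE persons (
    person_id TEXT PRIMARY KEY          -- type still open (integer vs string)
);
CREATE TABLE experiments (
    experiment_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE model_outputs (
    experiment_id INTEGER NOT NULL REFERENCES experiments(experiment_id),
    person_id TEXT NOT NULL REFERENCES persons(person_id),
    output_name TEXT NOT NULL,
    value REAL,
    PRIMARY KEY (experiment_id, person_id, output_name)
);
"""


def create_database(path=":memory:"):
    """Create the tables and turn on foreign-key checks (off by default in SQLite)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
conn.execute("INSERT INTO persons VALUES ('1'), ('2')")
conn.execute("INSERT INTO experiments (name) VALUES ('baseline')")
conn.executemany(
    "INSERT INTO model_outputs VALUES (1, ?, 'score', ?)",
    [("1", 0.25), ("2", 0.75)],
)

# The evaluation queries the same tables the pipeline wrote to.
rows = conn.execute(
    """SELECT e.name, m.person_id, m.value
       FROM model_outputs m JOIN experiments e USING (experiment_id)
       ORDER BY m.person_id"""
).fetchall()
print(rows)
```

Adding an experiment then means adding one row to `experiments` plus its outputs, rather than a new file format for the evaluation to parse.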

I think we also need tables for the following, but I'm not sure what the best structure for them is.

Later, we can extend this with more tables if necessary. We could also consider adding variables such as the JSON sequences and the embeddings; see #71.
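If the embeddings end up in the database, one option is a separate table keyed by person and model, with each vector packed into a BLOB. This is a sketch under assumptions: the table name `embeddings`, the helpers `store_embedding`/`load_embedding`, and the float32 encoding in native byte order are all hypothetical choices. The JSON sequences could similarly go into a TEXT column.

```python
import sqlite3
from array import array

# Hypothetical extension table; names and the float32 encoding are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE embeddings (
           person_id TEXT NOT NULL,
           model_name TEXT NOT NULL,
           vector BLOB NOT NULL,          -- float32 values, native byte order
           PRIMARY KEY (person_id, model_name)
       )"""
)


def store_embedding(conn, person_id, model_name, values):
    """Pack a list of floats as float32 bytes and store it as one row."""
    blob = array("f", values).tobytes()
    conn.execute(
        "INSERT INTO embeddings VALUES (?, ?, ?)", (person_id, model_name, blob)
    )


def load_embedding(conn, person_id, model_name):
    """Return the stored vector as a list of floats, or None if it is absent."""
    row = conn.execute(
        "SELECT vector FROM embeddings WHERE person_id = ? AND model_name = ?",
        (person_id, model_name),
    ).fetchone()
    if row is None:
        return None
    vec = array("f")
    vec.frombytes(row[0])
    return vec.tolist()


store_embedding(conn, "1", "model_a", [0.5, -1.25, 2.0])
print(load_embedding(conn, "1", "model_a"))  # [0.5, -1.25, 2.0]
```

Keying on `(person_id, model_name)` lets several embedding models coexist for the same person without schema changes.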

I suggest Lucas/Ana prepare the data as a CSV file. I would then upload the file to the OSSC and load it into a database. I think most of this is already done in https://github.com/odissei-lifecourse/life-sequencing-dutch/tree/main/pop2vec/evaluation/domain; is that right?
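A rough sketch of the CSV-to-database step, assuming the file has a header row and that SQLite is the target (the issue does not say which database would run on the OSSC). `load_csv` is a hypothetical helper meant for trusted files, since column names are taken from the header. The columns are left untyped, so every value is stored as text exactly as it appears in the CSV; for example, leading zeros in IDs survive the import.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header and insert all rows in one transaction."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # commits on success, rolls back on error
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Usage with an in-memory stand-in for the prepared file.
sample = io.StringIO("person_id,outcome\n001,1\n002,0\n")
conn = sqlite3.connect(":memory:")
n = load_csv(conn, "outcomes", sample)
print(n)  # 2
print(conn.execute("SELECT person_id, outcome FROM outcomes ORDER BY person_id").fetchall())
```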

Am I missing anything here? Is any of this not feasible?

f-hafner commented 1 month ago

Any comments, @dakota0064 @tanzir5?

f-hafner commented 1 month ago

To do: figure out whether we use integers or strings for person IDs.
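A quick check that could inform this decision, sketched under the assumption that the raw person IDs arrive as strings (e.g. read from CSV). `ids_safe_as_int` is a hypothetical helper: it reports whether casting to integers would be lossless. Whichever type is chosen, it has to be the same everywhere, because mixed types silently break lookups and joins.

```python
def ids_safe_as_int(raw_ids):
    """True if every ID string survives a round trip through int() unchanged.

    IDs with leading zeros, signs, whitespace or non-digit characters would
    be altered (or rejected) by int(), so they should stay strings.
    """
    return all(
        s.isascii() and s.isdigit() and str(int(s)) == s for s in raw_ids
    )


print(ids_safe_as_int(["123", "4567"]))     # True
print(ids_safe_as_int(["000123", "4567"]))  # False: int("000123") == 123
print(ids_safe_as_int(["12a"]))             # False

# Mixing types silently breaks lookups in Python (and joins in pandas/SQL):
outputs = {123: 0.5}         # IDs stored as int
print(outputs.get("123"))    # None: the str key does not match the int key
```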