odissei-lifecourse / life-sequencing-dutch

MIT License

Get Richer data #16

Open tanzir5 opened 2 months ago

tanzir5 commented 2 months ago

We need richer data for the language model. This spreadsheet lists the data we are currently using: https://docs.google.com/spreadsheets/d/1JzdWpDeB5gWeXaw99akgy8virZq7ERiS6WcZKXKNf0k/edit#gid=0

We need to figure out which files in this catalogue are not too expensive (a question for Tom) and might be good for the LM to train on: https://www.cbs.nl/en-gb/our-services/customised-services-microdata/microdata-conducting-your-own-research/microdata-catalogue

tanzir5 commented 2 months ago

@f-hafner This is more of a Lucas task, but you can take a stab at it too if you find it interesting.

f-hafner commented 2 months ago

Yeah, I think it would be great if we could keep all input data in one place after Lucas' processing. It seems to me it's scattered around at the moment.

tanzir5 commented 2 months ago

Discussion with Lucas yielded the following options:

  1. geographic mobility
  2. housing
  3. marriage data
  4. fine-grained educational data