speed up processing of tabular files

this concerns the scripts for extracting summary stats, and possibly the conversion of tables to datasets for LLM training.

for the summary stats, we're currently using the pandas python engine to read in the files, because not all csv files use the same column separator. the faster c engine would require us to know and specify the separator for each file in advance.
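one possible way to keep the c engine without knowing the separators up front: sniff the separator from the first chunk of each file with the standard library's `csv.Sniffer`, then pass it to the c engine explicitly. this is a minimal sketch, not the current script. `sniff_separator` and `read_table` are hypothetical helpers, and the candidate set `,;\t|`, the utf-8 default and the comma fallback for single-column files are assumptions.

```python
import csv

import pandas as pd


def sniff_separator(path, candidates=",;\t|", sample_chars=64 * 1024, encoding="utf-8"):
    """Guess the column separator from the first chunk of a file (hypothetical helper)."""
    with open(path, newline="", encoding=encoding) as fh:
        sample = fh.read(sample_chars)
    try:
        # only characters listed in `candidates` are considered as delimiters
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # assumption: no candidate delimiter found (e.g. a single-column file),
        # so fall back to a comma
        return ","


def read_table(path, encoding="utf-8"):
    """Read a file with the fast c engine, passing the sniffed separator explicitly."""
    sep = sniff_separator(path, encoding=encoding)
    return pd.read_csv(path, sep=sep, engine="c", encoding=encoding)
```

sniffing only reads a small sample, so its cost is negligible next to parsing. `csv.Sniffer` can still guess wrong on unusual samples, so it would be worth checking the sniffed separators against a few files whose format we already know.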

moreover, we should consider tools that exploit multiple cores when handling dataframes. dask.dataframe may be a good option, but we should compare it against alternatives before committing to it.
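as a simple baseline to compare dask.dataframe against, here is a minimal sketch that parallelises across files (one file per worker) with the standard library's `concurrent.futures`. this is a swapped-in technique, not dask: it helps when there are many files, whereas dask.dataframe can also split a single large csv into partitions. `summarize_file` and `summarize_files` are hypothetical, and the per-file job assumes the current approach of `sep=None` with the python engine.

```python
import sys
from concurrent.futures import Executor, ProcessPoolExecutor

import pandas as pd


def summarize_file(path):
    """Hypothetical per-file job: read one table and return its summary stats."""
    # sep=None lets the python engine sniff the separator (assumed current approach)
    df = pd.read_csv(path, sep=None, engine="python")
    return path, df.describe()


def summarize_files(paths, executor: Executor):
    """Run summarize_file on every path; returns {path: describe() output}."""
    # with a ProcessPoolExecutor each file is parsed in a separate process,
    # so several cores are used even though the python engine itself is single-threaded
    return dict(executor.map(summarize_file, paths))


if __name__ == "__main__":
    # the __main__ guard is required for ProcessPoolExecutor on spawn-based platforms
    with ProcessPoolExecutor() as pool:
        stats = summarize_files(sys.argv[1:], pool)
    for path, desc in stats.items():
        print(path)
        print(desc)
```

taking the executor as a parameter keeps the per-file logic independent of how it is scheduled: a `ProcessPoolExecutor` for real runs, a `ThreadPoolExecutor` for quick tests, and the same `summarize_file` could later be wrapped in dask if that turns out to be the better option.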

odissei-lifecourse / life-sequencing-dutch

speed up processing of tabular files #25