What has been done
Further work on classify pipeline:
- Pipelines for TF-IDF, XGBoost on all principal components (PCs), and XGBoost on the top n (3) PCs
- The metrics extraction pipeline has been updated so that it can compute perplexity and entropy manually (not yet run, see #62)
- Summary tables have been created with preliminary precision, recall scores etc. (e.g., clf_results/clf_results/dailydialog_temp1/all_results.html)

Note that everything has been run on the datasets with temperature 1, but it has been set up so that we can easily run it on temperature 1.5 as well.

Decisions to be made
Whether we need to make any changes to PCA (following what is described in #65), and which model to use for computing perplexity/entropy outside of textdescriptives (#62)
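
For the perplexity/entropy question (#62), here is a rough, model-agnostic sketch of what computing the two metrics manually could look like. It assumes that whichever model is chosen can give us, for each token, the full next-token distribution and the probability of the token actually produced. The function names and the toy numbers are hypothetical and not taken from the repository.

```python
import math
from typing import Sequence


def token_entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    # Zero-probability entries contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def mean_entropy(dists: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over a text."""
    return sum(token_entropy(d) for d in dists) / len(dists)


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Toy input (hypothetical): three tokens over a 4-token vocabulary.
dists = [
    [0.25, 0.25, 0.25, 0.25],  # model is maximally uncertain
    [0.5, 0.5, 0.0, 0.0],      # two equally likely candidates
    [1.0, 0.0, 0.0, 0.0],      # model is certain
]
# Probability the model assigned to each token that was actually produced.
observed = [0.25, 0.5, 1.0]

print("mean entropy (nats):", mean_entropy(dists))
print("perplexity:", perplexity(observed))
```

On this toy input the mean entropy is about ln 2 nats and the perplexity is about 2. Which model supplies the probabilities is exactly the open decision above, so that part is deliberately left out of the sketch.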