rbroc / echo

A Scalable and Explainable Approach to Discriminating Between Human and Artificially Generated Text
https://cc.au.dk/en/clai/current-projects/a-scalable-and-explainable-approach-to-discriminating-between-human-and-artificially-generated-text

Classify pipeline #66

MinaAlmasi closed this 2 months ago

MinaAlmasi commented 2 months ago

What has been done

Further work on the classify pipeline:

  1. Pipelines for TF-IDF, XGBoost on all principal components (PCs), and XGBoost on the top n (3) PCs
  2. The metrics extraction pipeline has been updated to compute perplexity and entropy manually (implemented but not yet run; see #62)
  3. Summary tables with preliminary precision, recall, and related scores have been created (e.g., clf_results/clf_results/dailydialog_temp1/all_results.html)
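The "top n PCs" step in item 1 can be sketched as follows. This is a minimal sketch, not the pipeline's code. It assumes the features are a numeric matrix of per-text metrics and standardises them first (an assumption; the pipeline may not). It also uses a NumPy SVD in place of whatever PCA implementation the pipeline actually uses. The function name `top_n_pcs` is hypothetical, and its output would be the feature matrix passed on to XGBoost.

```python
import numpy as np


def top_n_pcs(X: np.ndarray, n: int = 3) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: project X onto its first n principal components.

    Returns the projected data (n_samples x n) and the fraction of the
    total variance explained by each of those n components.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: standardise each feature so metrics on different scales
    # contribute equally to the components
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X - X.mean(axis=0)) / std
    # SVD of the centred data: rows of Vt are the principal axes,
    # ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Keep only the first n axes and project the data onto them
    return Z @ Vt[:n].T, explained[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))  # hypothetical: 100 texts x 10 metrics
    pcs, var = top_n_pcs(X, n=3)
    print(pcs.shape)  # (100, 3)
```

Because the SVD returns components in order of decreasing variance, "top n" simply means slicing the first n rows of `Vt`. The "all PCs" variant corresponds to not slicing at all.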

Note that everything has so far been run only on the datasets generated with temperature 1, but the pipeline is set up so that it can easily be run on the temperature 1.5 datasets as well.

Decisions to be made

Whether we need to make any changes to the PCA (following what is described in #65), and which model to use for computing perplexity/entropy outside of textdescriptives (see #62).
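For reference, the standard definitions behind a manual perplexity/entropy computation can be sketched as below. It is a minimal sketch that stays independent of which model is chosen. It assumes the model (still to be decided) has already produced the natural-log probability of each observed token and, for entropy, the full next-token distribution at each position. The function names `perplexity` and `mean_entropy` are hypothetical and are not taken from the pipeline or from textdescriptives.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Hypothetical helper: PPL = exp(-(1/N) * sum_i log p(x_i | x_<i)).

    Assumes natural-log probabilities of each observed token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def mean_entropy(distributions: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: average Shannon entropy (in nats) of the
    model's next-token distributions, one distribution per position."""
    if not distributions:
        raise ValueError("need at least one distribution")
    total = 0.0
    for probs in distributions:
        # Terms with p == 0 contribute nothing (p * log p -> 0 as p -> 0)
        total += -sum(p * math.log(p) for p in probs if p > 0)
    return total / len(distributions)


if __name__ == "__main__":
    # A model that gives every observed token probability 0.25
    print(perplexity([math.log(0.25)] * 3))  # 4.0, up to floating-point rounding
    # A uniform distribution over 2 tokens has entropy log(2) nats
    print(mean_entropy([[0.5, 0.5]]))
```

The two quantities answer different questions. Perplexity only looks at the probability of the tokens that actually occurred, while entropy looks at how spread out the model's whole predictive distribution is, so it needs access to full distributions rather than just per-token scores.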