rbroc / echo

A Scalable and Explainable Approach to Discriminating Between Human and Artificially Generated Text
https://cc.au.dk/en/clai/current-projects/a-scalable-and-explainable-approach-to-discriminating-between-human-and-artificially-generated-text

PROMPT SELECTION: pca, euclidean distances #30

Closed: MinaAlmasi closed this issue 10 months ago

MinaAlmasi commented 10 months ago

Scripts have been added to aid our prompt selection. They do the following:

  1. Run PCA on the low-level features (["doc_length", "n_tokens", "n_characters", "n_sentences"]) to obtain principal components.
  2. Compute the Euclidean distances in PCA space between each human completion and the corresponding model generations (human-beluga and human-llama2_chat for every human completion).
  3. Plot the distances as static and interactive plots.
  4. Compute the median of the distances to identify the best-performing prompts.
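A minimal sketch of steps 1, 2 and 4, for illustration only. It assumes each completion is a dict with hypothetical keys `id`, `model` and `prompt` plus the four feature columns, that human completions are labelled `model == "human"`, and that features are standardized before PCA (the issue does not say whether they were). PCA is done here with a plain NumPy SVD rather than whatever library the actual scripts use, `n_components=2` is an arbitrary default, and plotting (step 3) is left out.

```python
from collections import defaultdict

import numpy as np

# Low-level features named in the issue
FEATURES = ["doc_length", "n_tokens", "n_characters", "n_sentences"]


def pca_fit_transform(X, n_components=2):
    """Standardize X (an assumption), then project it onto its top principal components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes, ordered by decreasing explained variance
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def median_distance_per_prompt(rows, n_components=2):
    """Median Euclidean distance (in PC space) from each model completion to its human completion.

    rows: list of dicts with hypothetical keys 'id', 'model', 'prompt' plus FEATURES;
    human completions are assumed to have model == "human".
    Returns {(model, prompt): median distance}.
    """
    X = [[row[f] for f in FEATURES] for row in rows]
    pcs = pca_fit_transform(X, n_components)

    # PC coordinates of each human completion, keyed by completion id
    human = {row["id"]: pcs[i] for i, row in enumerate(rows) if row["model"] == "human"}

    distances = defaultdict(list)
    for i, row in enumerate(rows):
        if row["model"] == "human" or row["id"] not in human:
            continue  # skip human rows and model rows without a human counterpart
        d = float(np.linalg.norm(pcs[i] - human[row["id"]]))
        distances[(row["model"], row["prompt"])].append(d)

    return {key: float(np.median(ds)) for key, ds in distances.items()}
```

Sorting the returned dict by value, e.g. `sorted(res.items(), key=lambda kv: kv[1])`, lists the (model, prompt) pairs whose completions sit closest to the human completions first. Fitting the PCA on human and model rows together keeps both in the same component space, so the distances are comparable.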

The results are in README.md in the results folder.