rbroc / echo

A Scalable and Explainable Approach to Discriminating Between Human and Artificially Generated Text

https://cc.au.dk/en/clai/current-projects/a-scalable-and-explainable-approach-to-discriminating-between-human-and-artificially-generated-text

PROMPT SELECTION: pca, euclidean distances #30

Closed: MinaAlmasi closed this issue 10 months ago

MinaAlmasi commented 10 months ago

Scripts have been added to aid our prompt selection. They do the following:

1. Run PCA on the low-level features (["doc_length", "n_tokens", "n_characters", "n_sentences"]) to obtain new components.
2. Compute Euclidean distances in the PCA space between human and model completions (human-beluga and human-llama2_chat for each human completion).
3. Plot the distances in static and interactive plots.
4. Compute the median distance per prompt to identify the best-performing prompts, i.e. those whose model completions lie closest to the human completions.
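The non-plotting steps can be sketched roughly as below. This is a minimal illustration, not the repository's actual scripts, and it rests on several assumptions: NumPy is used instead of whatever library the scripts rely on; the record keys `id`, `model` and `prompt` are hypothetical; features are z-scored before PCA (the four features live on very different scales); two components are kept; and each model completion is paired with the human completion that shares its `id`.

```python
from collections import defaultdict
from statistics import median

import numpy as np

FEATURES = ["doc_length", "n_tokens", "n_characters", "n_sentences"]


def pca_transform(X, n_components=2):
    """Project rows of X onto the top principal components via SVD.

    Assumption: features are z-scored first so no single scale dominates.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def median_distances(rows, n_components=2):
    """Median Euclidean distance from model completions to their human pair.

    rows: list of dicts with the (hypothetical) keys 'id', 'model', 'prompt'
    plus FEATURES. Returns {(prompt, model): median distance}.
    """
    comps = pca_transform([[r[f] for f in FEATURES] for r in rows], n_components)

    # Human completions are keyed by id only; model ones are kept with their record.
    human, machine = {}, []
    for r, c in zip(rows, comps):
        if r["model"] == "human":
            human[r["id"]] = c
        else:
            machine.append((r, c))

    # Collect one distance per (prompt, model) for every paired completion.
    dists = defaultdict(list)
    for r, c in machine:
        h = human.get(r["id"])
        if h is not None:
            dists[(r["prompt"], r["model"])].append(float(np.linalg.norm(c - h)))
    return {key: median(vals) for key, vals in dists.items()}


def best_prompt_per_model(medians):
    """Pick, for each model, the prompt with the lowest median distance."""
    best = {}
    for (prompt, model), d in medians.items():
        if model not in best or d < medians[(best[model], model)]:
            best[model] = prompt
    return best
```

Because the distances are computed after projection, flipping the sign of a principal axis (which SVD may do arbitrarily) leaves them unchanged. Taking the median rather than the mean keeps a few unusually long or short completions from dominating the ranking of prompts.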

The results can be found in README.md in the results folder.