rbroc / echo

A Scalable and Explainable Approach to Discriminating Between Human and Artificially Generated Text
https://cc.au.dk/en/clai/current-projects/a-scalable-and-explainable-approach-to-discriminating-between-human-and-artificially-generated-text
2 stars 1 forks source link

PROMPTING: more complete prompt pipeline, implementing models, some data cleaning #20

Closed MinaAlmasi closed 1 year ago

MinaAlmasi commented 1 year ago

Progress Update

PROMPTING

Prompting is still in progress, but a more complete pipeline has been developed with functions in modules/prompt_fns.py and modules/pipeline_fns.py. These functions are applied in the script src/gen_pipeline.py.

Models llama2-7b, falcon-7b, T5 and StableBeluga-7b have been tested.

CLEANING

Stories has been cleaned further as tokens (e.g., [ wp ]) interfered with some model completions. Dailydialog likely needs similar cleaning in the future (removing [EOT] tokens).

FURTHER DEVELOPMENT

  1. Task prompting (e.g., "summarise this") may still need tweaks
  2. Further implementation is needed to get the chat versions of llama to run (wrapping task prompts in a chat interface)
  3. The bigger models need to be tested with a GPU.