rbroc / echo

A Scalable and Explainable Approach to Discriminating Between Human and Artificially Generated Text
https://cc.au.dk/en/clai/current-projects/a-scalable-and-explainable-approach-to-discriminating-between-human-and-artificially-generated-text

Final Prompts + Generation of Data #48

Closed MinaAlmasi closed 6 months ago

MinaAlmasi commented 6 months ago

Updates

  1. The final prompts have been chosen, as described in #47
  2. Data is almost ready. Some stories break down when the temperature is 2, so too many generations fall below min-tokens and have to be re-generated, which results in a CUDA error
  3. Loading generated data has been streamlined (see utils/process_generations), so we can easily load data by temperature and prompt number (or load all of it)
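To illustrate the idea of loading by temperature and prompt number, here is a minimal sketch. It is not the code in utils/process_generations. The function name `load_generations`, the `.ndjson` format, and the file naming scheme `<dataset>_prompt_<n>_temp<t>.ndjson` are all hypothetical assumptions made for the example.

```python
import json
import re
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical naming scheme (an assumption, not the repo's actual layout):
# <dataset>_prompt_<n>_temp<t>.ndjson, e.g. "stories_prompt_1_temp2.0.ndjson"
FILENAME_PATTERN = re.compile(
    r"^(?P<dataset>\w+?)_prompt_(?P<prompt>\d+)_temp(?P<temp>\d+(?:\.\d+)?)\.ndjson$"
)


def load_generations(
    data_dir: Path,
    temps: Optional[Iterable[float]] = None,
    prompt_numbers: Optional[Iterable[int]] = None,
) -> list[dict]:
    """Load generated rows, optionally filtered by temperature and prompt number.

    A filter left as None keeps every value, so calling with no filters loads all data.
    """
    wanted_temps = None if temps is None else {float(t) for t in temps}
    wanted_prompts = None if prompt_numbers is None else {int(p) for p in prompt_numbers}

    rows = []
    for path in sorted(data_dir.glob("*.ndjson")):
        match = FILENAME_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        temp = float(match["temp"])
        prompt = int(match["prompt"])
        if wanted_temps is not None and temp not in wanted_temps:
            continue
        if wanted_prompts is not None and prompt not in wanted_prompts:
            continue
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    row = json.loads(line)
                    # Tag each row with its origin so mixed loads stay distinguishable
                    row.update(dataset=match["dataset"], prompt_number=prompt, temperature=temp)
                    rows.append(row)
    return rows
```

For example, `load_generations(Path("data"), temps=[1, 2])` would load every prompt at temperatures 1 and 2, and `load_generations(Path("data"))` would load everything. Encoding the metadata in the file name keeps filtering cheap, because files that are not requested are never opened.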

Some small considerations

I will not spend much time on these (as we are otherwise very close to having all data generated), but I will briefly look into:

  1. Updating vllm, which brings performance improvements, new models, and a seed per request. This may also fix the CUDA errors that sometimes occur
  2. Whether it would make sense to also generate with llama7b

Some future, non-urgent tasks

  1. Consider whether the code in misc should be moved elsewhere, or whether we need it at all
  2. Add a README to src with an overview of its folders