Some minor restructuring/refactoring of the repo before going into exam season in December; more to come in January 2024. For now, the focus is on experimenting with Petals and quantized models. Note that preliminary but complete datasets, generated with the 7B Beluga model on the stories data, have been added to datasets_ai/ALL_DATA.
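As a rough illustration of the Petals experiments, here is a minimal sketch of distributed generation with a 7B Beluga model. The Hugging Face model id, prompt, and generation settings are assumptions for illustration only, not the repo's actual configuration (the real pipeline lives in src/generate):

```python
# Minimal sketch (not the repo's pipeline): generating from a 7B Beluga model
# over the Petals distributed swarm. The model id below is an assumption;
# whether a public swarm currently serves it is not guaranteed.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "stabilityai/StableBeluga-7B"  # assumed HF id for "the 7B beluga model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

prompt = "Write a short story about a lighthouse keeper."  # placeholder stories-style prompt
inputs = tokenizer(prompt, return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```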
Highlights from restructuring:
Renamed the folder holding all files on the analysis of prompts (for proper prompt selection) to "prompt-select"
Moved all files concerned with text generation to src/generate and all files on metric extraction to src/metrics
Slight updates to the main README; added src/generate/README.md with instructions on how to run a custom text generation pipeline
Future plans:
Update the text generation pipeline to support top-k/top-p sampling and investigate whether we can get less repetitive results (see the sketch after this list)
Clean up/refactor prompt_fns.py and pipeline_fns.py (make model selection smoother)
Simplify data folders (no need for separate out, datasets, datasets_ai, and out_ai folders)
DOCUMENTATION: Write an overview of the files in the main README and keep working on individual READMEs for the folders that need them.
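A minimal sketch of what the top-k/top-p sampling change might look like, assuming generation goes through the standard Hugging Face generate() interface; the parameter values and model id are placeholders rather than the pipeline's actual settings:

```python
# Sketch only: switching a greedy generate() call to top-k / nucleus (top-p)
# sampling to probe for less repetitive output. Values below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stabilityai/StableBeluga-7B"  # assumed model id, as in the sketch above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,   # sample instead of greedy decoding
    top_k=50,         # keep only the 50 most likely next tokens at each step
    top_p=0.95,       # nucleus sampling: keep the smallest token set covering 95% probability
    max_new_tokens=200,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```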