Further development on creating a more stable, modular prompting pipeline. Models AND their chat-versions have been implemented (except Alpaca). We are getting quite good results from Beluga, Llama2 (esp. chat) and Falcon.
Follow-up on #20
Task prompting is still in progress. We are also currently testing one-shot performance with T5.
[EOT] tokens have been removed from dailydialog
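A minimal sketch of the kind of cleanup step described above (the helper name and the assumption that `[EOT]` appears as a literal substring in the dailydialog text are ours, not necessarily how it is implemented in the pipeline):

```python
# Hypothetical sketch: stripping literal [EOT] separator tokens from
# dailydialog utterances and collapsing the leftover whitespace.
def strip_eot(text: str, eot_token: str = "[EOT]") -> str:
    """Remove every occurrence of the EOT token and tidy whitespace."""
    cleaned = text.replace(eot_token, " ")
    return " ".join(cleaned.split())

example = "Hi there! [EOT] How are you? [EOT]"
print(strip_eot(example))  # -> "Hi there! How are you?"
```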
Llama (7b) and its chat-version have been implemented!
Further Development
[1] GPU performance: StableBeluga has been tested on a small(er) GPU (one A40, i.e., uc-a40-1-h (x1)), and performance is slower than expected when generating longer sequences (for stories, for instance).
Might just require a bigger machine (e.g., two A40's which we have access to). Will be tested shortly!
Worth investigating a possible bottleneck in how the generations are created. Currently, generations are not batched but created sequentially. Although HF does not recommend batching as a default, it should be investigated as it may yield a speedup.
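The batching idea above could look roughly like the sketch below: chunk the prompts and generate per chunk instead of per prompt. The variable names and the commented HF calls are assumptions for illustration, not the pipeline's actual code:

```python
# Hypothetical sketch: grouping prompts into fixed-size batches so that
# model.generate can be called once per batch rather than once per prompt.
from typing import Iterator, List

def batched(prompts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size prompts."""
    for i in range(0, len(prompts), batch_size):
        yield prompts[i : i + batch_size]

# With a HF model/tokenizer (names assumed), each batch would then be run as:
#   inputs = tokenizer(batch, return_tensors="pt", padding=True).to(device)
#   outputs = model.generate(**inputs, max_new_tokens=max_len)
prompts = [f"prompt {i}" for i in range(5)]
print([len(b) for b in batched(prompts, 2)])  # -> [2, 2, 1]
```

Note that batching decoder-only models requires left padding so the prompt sits directly before the generated tokens, which is one reason HF does not enable it by default.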
We may need to reconsider how long the generated texts should be. Target lengths are currently based on the 25% and 75% quantiles of completion lengths within each dataset. Models struggle with stories, where generations are supposed to be long.
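The quantile-based length heuristic can be sketched with the stdlib as below (the toy length list and the `min_new_tokens`/`max_new_tokens` naming are illustrative assumptions, not values from the datasets):

```python
# Sketch of deriving generation-length bounds from the 25%/75% quantiles
# of completion lengths in a dataset; the lengths below are made up.
import statistics

completion_lengths = [12, 30, 45, 60, 75, 90, 120, 150]
q1, _median, q3 = statistics.quantiles(completion_lengths, n=4)
min_new_tokens, max_new_tokens = int(q1), int(q3)
print(min_new_tokens, max_new_tokens)  # -> 33 112
```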