Further development on creating a more stable, modular prompting pipeline. Models AND their chat-versions have been implemented (except Alpaca). We are getting quite good results from Beluga, Llama2 (esp. chat) and Falcon.
Follow-up on #20
Task prompting is still in progress. We are also currently testing one-shot performance with T5.
[EOT] tokens have been removed from dailydialog
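A minimal sketch of the kind of cleanup step described above (the helper name and the assumption that `[EOT]` appears as a literal substring in the dailydialog text are ours, not necessarily how it is implemented in the pipeline):

```python
# Hypothetical sketch: stripping literal [EOT] separator tokens from
# dailydialog utterances and collapsing the leftover whitespace.
def strip_eot(text: str, eot_token: str = "[EOT]") -> str:
    """Remove every occurrence of the EOT token and tidy whitespace."""
    cleaned = text.replace(eot_token, " ")
    return " ".join(cleaned.split())

example = "Hi there! [EOT] How are you? [EOT]"
print(strip_eot(example))  # -> "Hi there! How are you?"
```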
Llama (7b) and its chat-version have been implemented!
Further Development
[1] GPU performance: StableBeluga has been tested on a small(er) GPU (one A40, i.e., uc-a40-1-h (x1)), and performance is slower than expected when generating longer sequences (for stories, for instance).
Might just require a bigger machine (e.g., two A40's which we have access to). Will be tested shortly!
Worth investigating a possible bottleneck in how the generations are created. Currently, generations are not batched but created sequentially. Although HF does not recommend batching as a default, it should be investigated as it may yield a speedup.
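The batching idea above could look roughly like the sketch below: chunk the prompts and generate per chunk instead of per prompt. The variable names and the commented HF calls are assumptions for illustration, not the pipeline's actual code:

```python
# Hypothetical sketch: grouping prompts into fixed-size batches so that
# model.generate can be called once per batch rather than once per prompt.
from typing import Iterator, List

def batched(prompts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size prompts."""
    for i in range(0, len(prompts), batch_size):
        yield prompts[i : i + batch_size]

# With a HF model/tokenizer (names assumed), each batch would then be run as:
#   inputs = tokenizer(batch, return_tensors="pt", padding=True).to(device)
#   outputs = model.generate(**inputs, max_new_tokens=max_len)
prompts = [f"prompt {i}" for i in range(5)]
print([len(b) for b in batched(prompts, 2)])  # -> [2, 2, 1]
```

Note that batching decoder-only models requires left padding so the prompt sits directly before the generated tokens, which is one reason HF does not enable it by default.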
We may need to reconsider how long the generated texts should be. Target lengths are currently based on the 25% and 75% quantiles of completion lengths within each dataset. Models struggle with stories, where generations are supposed to be long.
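The quantile-based length heuristic can be sketched with the stdlib as below (the toy length list and the `min_new_tokens`/`max_new_tokens` naming are illustrative assumptions, not values from the datasets):

```python
# Sketch of deriving generation-length bounds from the 25%/75% quantiles
# of completion lengths in a dataset; the lengths below are made up.
import statistics

completion_lengths = [12, 30, 45, 60, 75, 90, 120, 150]
q1, _median, q3 = statistics.quantiles(completion_lengths, n=4)
min_new_tokens, max_new_tokens = int(q1), int(q3)
print(min_new_tokens, max_new_tokens)  # -> 33 112
```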