teknium1 / GPTeacher

A collection of modular datasets generated by GPT-4: General-Instruct, Roleplay-Instruct, Code-Instruct, and Toolformer
MIT License
1.62k stars 170 forks

Create roleplay-simple-deduped-roleplay-instruct.yaml #6

d3287t328 closed 1 year ago

d3287t328 commented 1 year ago

Basic conversion of the JSON version to YAML. The YAML file uses 2711 fewer tokens according to my token-size test script (next commit in my personal repo).

teknium1 commented 1 year ago

Can you explain why it uses fewer tokens?

d3287t328 commented 1 year ago

> Can you explain why it uses fewer tokens?

YAML is designed to be human-readable and carries less boilerplate than JSON, such as the quotes around keys, braces, and commas. However, just today I learned that Markdown can also be used for prompt input, which may yield another substantial token savings: https://twitter.com/yupiop12/status/1654478098270658561
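
To make the boilerplate argument concrete, here is a minimal sketch that serializes the same records as JSON and as block-style YAML and compares a rough token count. This is not the token-size script mentioned above: the sample records, the field names, `to_block_yaml`, and `rough_token_count` are all hypothetical, and the regex counter is only a crude proxy for a real model tokenizer, so actual savings will differ.

```python
import json
import re

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"instruction": "Greet a traveler as an innkeeper.",
     "response": "Welcome, friend! Rest your feet by the fire."},
    {"instruction": "Reply as a ship captain.",
     "response": "Aye, we sail at dawn."},
]


def to_block_yaml(items):
    """Emit a list of flat, string-valued dicts as block-style YAML.

    Not a general YAML serializer: values are written as double-quoted
    scalars via json.dumps, a form YAML parsers generally accept.
    """
    lines = []
    for item in items:
        prefix = "- "  # first key of each record starts a new list item
        for key, value in item.items():
            lines.append(f"{prefix}{key}: {json.dumps(value)}")
            prefix = "  "  # remaining keys are indented under the item
    return "\n".join(lines) + "\n"


def rough_token_count(text):
    """Crude proxy: every word run and every punctuation mark is one token."""
    return len(re.findall(r"\w+|[^\w\s]", text))


json_text = json.dumps(records, indent=2)
yaml_text = to_block_yaml(records)

json_count = rough_token_count(json_text)
yaml_count = rough_token_count(yaml_text)
print(f"JSON: {json_count} rough tokens, YAML: {yaml_count} rough tokens")
```

In this sketch the difference comes entirely from structural characters: JSON needs brackets, braces, quoted keys, and separating commas, while the YAML form needs only a dash per record. To measure real savings, swap `rough_token_count` for the tokenizer of the target model.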