jondurbin / airoboros

Customizable implementation of the self-instruct paper.
Apache License 2.0

Primary dataset unreachable #5

Closed teknium1 closed 1 year ago

teknium1 commented 1 year ago

When I click the 3.5-turbo dataset link it does this:


```
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Details>No such object: airoboros-dump/gpt-3.5-turbo-100k/instructions.jsonl</Details>
</Error>
```

jondurbin commented 1 year ago

So, I was cleaning up the bucket and accidentally deleted it 🤦

I updated the README to link only to the valid Hugging Face datasets, including a new one that is all GPT-4 (plus a few thousand simple math prompts I was using to test whether they help prevent the model from degrading its math abilities during fine-tuning).

Feel free to use those for now if you wish; I'll see if I can recover the original.

The model fine-tuned on the smaller GPT-4 subset (not published yet) seems to perform better anyway.

teknium1 commented 1 year ago

Beautiful, appreciated!

jondurbin commented 1 year ago

https://storage.googleapis.com/airoboros-dump/instructions-gpt3.5-turbo-100k.jsonl
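
For anyone who wants to read the file at the link above without saving it first, here is a minimal sketch. It assumes only what the `.jsonl` extension suggests: one JSON object per line. The helper names `parse_jsonl` and `fetch_jsonl` are made up for illustration, nothing is assumed about the fields inside each record, and the URL may not stay reachable.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# URL copied from the comment above; it may no longer resolve.
DATASET_URL = (
    "https://storage.googleapis.com/airoboros-dump/"
    "instructions-gpt3.5-turbo-100k.jsonl"
)


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        yield json.loads(line)


def fetch_jsonl(url: str = DATASET_URL) -> Iterator[dict]:
    """Stream records from a remote JSONL file (hypothetical helper)."""
    with urllib.request.urlopen(url) as resp:
        # The HTTP response iterates line by line as bytes; decode each one.
        yield from parse_jsonl(raw.decode("utf-8") for raw in resp)


if __name__ == "__main__":
    # Print the first record only, to check the link and see the record shape.
    for record in fetch_jsonl():
        print(record)
        break
```

If the bucket object is gone again, `urlopen` raises `urllib.error.HTTPError` (a 404 for the `NoSuchKey` case shown earlier), so wrapping the call in a `try`/`except` is a reasonable addition.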