gururise / AlpacaDataCleaned

Alpaca dataset from Stanford, cleaned and curated
Apache License 2.0

Diffs as data #42

KenAKAFrosty closed this issue 1 year ago

KenAKAFrosty commented 1 year ago

What if we used the diffs from all this cleaning effort to train a model to do the cleaning?

gururise commented 1 year ago

That's an interesting idea. No idea whether it would work or not. Additionally, the diffs would have to be broken up into many smaller chunks to fit within the model's context length.
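Here is a minimal Python sketch of the chunking step. It assumes the original and cleaned versions of a data file are available as lists of lines. It also uses a crude whitespace word count as a stand-in for a real tokenizer. The names `diff_hunks`, `approx_tokens`, `chunk_hunks`, and the `max_tokens` budget are hypothetical illustrations, not part of the AlpacaDataCleaned repo. The code splits the unified diff on hunk boundaries and greedily packs whole hunks into chunks that stay under the budget.

```python
import difflib
from typing import List


def diff_hunks(before: List[str], after: List[str], context: int = 1) -> List[List[str]]:
    """Split a unified diff of two line lists into hunks (the ---/+++ header is dropped)."""
    hunks: List[List[str]] = []
    for line in difflib.unified_diff(before, after, lineterm="", n=context):
        if line.startswith("@@"):
            hunks.append([line])  # each "@@ ... @@" line starts a new hunk
        elif hunks:
            hunks[-1].append(line)  # context/removed/added lines belong to the current hunk
    return hunks


def approx_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens; swap in a real tokenizer.
    return len(text.split())


def chunk_hunks(hunks: List[List[str]], max_tokens: int) -> List[str]:
    """Greedily pack whole hunks into chunks of at most max_tokens (approximate)."""
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for hunk in hunks:
        text = "\n".join(hunk)
        cost = approx_tokens(text)
        # Start a new chunk if adding this hunk would exceed the budget.
        if current and current_tokens + cost > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += cost
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Splitting on hunk boundaries keeps each edit together with its surrounding context lines, which is likely what a model would need to learn the cleaning. A single hunk larger than `max_tokens` still comes out as its own oversized chunk and would need further splitting.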