r-three / common-pile

Repo to hold code and track issues for the collection of permissively licensed data
MIT License

Stackv2 #72

Closed: Muennighoff closed this 4 months ago

Muennighoff commented 4 months ago

Still waiting for the dataset to be released (hopefully soon), but does this look more or less reasonable, @blester125? Is there anything else I can do while the dataset is not yet released?

Muennighoff commented 4 months ago

Should we merge this provisionally, @blester125? I will then create a new PR once the data is released to make the necessary changes.