shaigue / pmi_masking

This repository contains code that takes a text corpus and creates a PMI masking vocabulary for it.
MIT License

Add support for RedPajama #33

Open shaigue opened 1 year ago

shaigue commented 1 year ago

https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T

I don't have direct access to enough disk space for a dataset of this size, so this is hard to work on right now. On hold.