togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Apache License 2.0
4.57k stars, 350 forks

I got an issue when I use fasttext doing arxiv cleaning. #65

Open tangtianyi1998 opened 1 year ago

tangtianyi1998 commented 1 year ago

[image]

What versions of Python and fasttext are used?

mauriceweber commented 1 year ago

Hi @tangtianyi1998

You get this error when fasttext cannot find the model binary. Did you run the following steps?

mkdir -p models
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -P models
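
If the download succeeded but the error persists, it may help to check the path before loading. Below is a minimal sketch, assuming the model was saved to models/lid.176.bin relative to the directory you run the script from. `load_lid_model` is a hypothetical helper for illustration, not part of the RedPajama-Data code; only `fasttext.load_model` is the library's own call.

```python
from pathlib import Path


def load_lid_model(model_path="models/lid.176.bin"):
    """Load the fasttext language-ID model, failing early with a clear message.

    Hypothetical helper; the default path is an assumption based on the
    wget command above.
    """
    path = Path(model_path)
    if not path.is_file():
        # Report the absolute path so a wrong working directory is easy to spot.
        raise FileNotFoundError(
            f"fasttext model not found at {path.resolve()}; "
            "download lid.176.bin into the models directory first"
        )
    # Imported lazily so the path check works even if fasttext is not installed.
    import fasttext

    return fasttext.load_model(str(path))
```

Calling `load_lid_model()` from the directory that contains `models/` should then either return the loaded model or tell you exactly which path it looked at.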