unitaryai / detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.
https://www.unitary.ai/
Apache License 2.0

Error during training #84

Closed: grecosalvatore closed this issue 1 year ago

grecosalvatore commented 1 year ago

I tried to start training for the Toxic Comment Classification Challenge using the commands provided in the documentation:

# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-toxic-comment-classification-challenge/test.csv --update_test

python train.py --config configs/Toxic_comment_classification_BERT.json

However, it returns the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'jigsaw_data/jigsaw-toxic-comment-classification-challenge/val.csv'

I see that the data contains only the training and test sets. Should I change the configuration file to use the test set for validation? (I downloaded the datasets from the following link: https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data?select=train.csv.zip)

Thanks in advance

laurahanu commented 1 year ago

Hello,

For the final model, we trained on the whole train set and validated/tested on the test set. However, for normal model training and experimentation, you should split the train set to create a new val set that is independent from the test set.

Hope this helps!
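
The split suggested above could look roughly like the sketch below. It is not code from the repository: the function name `split_train_val`, the 10% validation fraction, the seed, and the output file name `train_split.csv` are all assumptions chosen for illustration. It only assumes that `train.csv` sits in the same directory as the `val.csv` path named in the error message. It uses the standard-library `csv` module, which copes with comments that span several lines, and performs a plain random split with no stratification by label.

```python
import csv
import random


def split_train_val(train_csv, out_train_csv, out_val_csv, val_fraction=0.1, seed=42):
    """Randomly hold out a fraction of the rows in train_csv as a validation set.

    Hypothetical helper, not part of detoxify. Returns (n_train, n_val).
    """
    # newline="" lets the csv module handle quoted fields containing line breaks.
    with open(train_csv, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(rows)

    n_val = int(len(rows) * val_fraction)
    val_rows, train_rows = rows[:n_val], rows[n_val:]

    # Write both subsets with the original header row.
    for path, subset in ((out_train_csv, train_rows), (out_val_csv, val_rows)):
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)

    return len(train_rows), len(val_rows)


if __name__ == "__main__":
    data_dir = "jigsaw_data/jigsaw-toxic-comment-classification-challenge"
    n_train, n_val = split_train_val(
        f"{data_dir}/train.csv",
        f"{data_dir}/train_split.csv",  # assumed name; avoids overwriting train.csv
        f"{data_dir}/val.csv",          # the path the error message reports as missing
    )
    print(f"train rows: {n_train}, val rows: {n_val}")
```

Because the reduced training set is written to a new file, the training path in the config would need to point at it; otherwise the model would still train on the rows held out for validation. The exact config key is not shown in this thread, so check the JSON file itself.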