princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License
3.36k stars, 507 forks

How can I use the train.py? #160

Closed: AlphaDoraem closed this issue 2 years ago

AlphaDoraem commented 2 years ago

I want to train on my own dataset (a CSV file), so I ran the following command:

python train.py --model_name_or_path=models\unsup-simcse-bert-base-uncased --train_file=datasets\csv\train.csv --output_dir=result\my-unsup-simcse-bert-base-uncased --num_train_epochs=1 --per_device_train_batch_size=64 --learning_rate=3e-5 --max_seq_length=32 --evaluation_strategy=steps --metric_for_best_model=stsb_spearman --load_best_model_at_end --eval_steps=125 --pooler_type=cls --mlp_only_train --overwrite_output_dir --temp=0.05 --do_train --do_eval --fp16

But I got an error:

ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/1.2.1/datasets/csv/csv.py

So I downloaded csv.py, put it in my datasets folder (datasets\csv\csv.py), and ran the command again. This time there was no further response, just one line of output:

"Using custom data configuration default".

What is the problem? How can I use train.py? Please help me.

gaotianyu1350 commented 2 years ago

Hi,

This seems to be a problem with the Hugging Face datasets package. I'm not sure what is going on there, so it may be more helpful to consult their issue tracker or documentation.
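
Before digging into the dataset loader, it may help to rule out the CSV file itself. The following is a minimal sketch using only the Python standard library; the function name `summarize_csv` is hypothetical and not part of SimCSE or the datasets package, and it assumes the file is UTF-8 encoded with a header row. It reports the header, the number of data rows, and how many fields each row has, since inconsistent field counts often point to quoting or delimiter problems.

```python
import csv
from collections import Counter


def summarize_csv(path, delimiter=","):
    """Return (header, row_count, Counter mapping field count -> number of rows).

    Hypothetical helper for a quick sanity check; assumes UTF-8 and a header row.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)  # first row is treated as column names
        widths = Counter()
        rows = 0
        for row in reader:
            if not row:  # csv.reader yields [] for completely blank lines
                continue
            rows += 1
            widths[len(row)] += 1
    return header, rows, widths
```

For the file in the question, this could be called as `summarize_csv("datasets/csv/train.csv")`. If the file reads cleanly and every row has the same number of fields, the data is probably not the cause, and the loader or network setup is the more likely place to look.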

AlphaDoraem commented 2 years ago

Thank you!