princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

WARNING - datasets.builder - Using custom data configuration default-ea41478fda366be4 #200

Closed by ChhXiitaa 1 year ago

ChhXiitaa commented 1 year ago

Hi, training has been stuck at this warning for 20 hours, and I don't know what is causing it.

gaotianyu1350 commented 1 year ago

Hi,

Can you provide more information? For example, press Ctrl+C while the program is stuck to see exactly which line of code it is waiting on.
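If Ctrl+C does not produce a usable traceback, one alternative is Python's standard `faulthandler` module, which can print the stack of every thread at a fixed interval. The following is only a minimal sketch: the helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical and are not part of SimCSE.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=60.0, out=None):
    """Dump every thread's traceback to `out` each `timeout_s` seconds.

    Hypothetical helper for illustration; `out` must be a real file
    object (it needs a file descriptor) and defaults to sys.stderr.
    """
    if out is None:
        out = sys.stderr
    # repeat=True keeps dumping until cancelled, so a long stall prints
    # the same stack repeatedly and the stuck line is easy to spot.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=out)


def disarm_hang_watchdog():
    """Stop the periodic dumps once the slow section has finished."""
    faulthandler.cancel_dump_traceback_later()


# Possible use around a call suspected of hanging:
#   arm_hang_watchdog(60)
#   ... the slow call goes here ...
#   disarm_hang_watchdog()
```

If the same frame shows up in every dump, that frame is most likely where the program is waiting.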

ChhXiitaa commented 1 year ago

Hi,

09/12/2022 21:23:34 - INFO - main - PyTorch: setting up devices
09/12/2022 21:23:34 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 2 distributed training: False, 16-bits training: True
09/12/2022 21:23:34 - INFO - main - Training/evaluation parameters OurTrainingArguments(output_dir='result/my-sup-simcse-bert-base-uncased', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=<EvaluationStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=128, per_device_eval_batch_size=8, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_steps=0, logging_dir='runs\Sep12_21-23-34_DESKTOP-QGFR6Q6', logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=True, fp16_opt_level='O1', fp16_backend='auto', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=125, dataloader_num_workers=0, past_index=-1, run_name='result/my-sup-simcse-bert-base-uncased', disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=True, metric_for_best_model='stsb_spearman', greater_is_better=True, ignore_data_skip=False, sharded_ddp=False, deepspeed=None, label_smoothing_factor=0.0, adafactor=False, eval_transfer=False)
09/12/2022 21:23:36 - WARNING - datasets.builder - Using custom data configuration default-a29cff7c999075b1

if extension == "csv":
    datasets = load_dataset(extension, data_files=data_files, cache_dir="./data/", delimiter="\t" if "tsv" in data_args.train_file else ",")

The program hangs at this load_dataset call.

pvicinan commented 1 year ago

Thank you for a well-documented and easy-to-use repository, Tianyu!

I resolved this issue by removing the cache_dir argument from the load_dataset function.
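For anyone applying the same change, here is a rough sketch of the call with cache_dir left out, so that the datasets library falls back to its default cache location. The helper names `csv_load_kwargs` and `load_csv_dataset`, the `{"train": ...}` layout of data_files, and the file name in the usage comment are assumptions for illustration; only the delimiter rule is taken from the line quoted earlier in this thread.

```python
def csv_load_kwargs(train_file, cache_dir=None):
    """Build keyword arguments for datasets.load_dataset("csv", ...).

    Hypothetical helper: cache_dir is only included when the caller
    passes one explicitly, so by default the library picks its own
    cache location.
    """
    kwargs = {
        # Assumed layout: a single training split.
        "data_files": {"train": train_file},
        # Same rule as the quoted line: tab for .tsv, comma otherwise.
        "delimiter": "\t" if "tsv" in train_file else ",",
    }
    if cache_dir is not None:
        kwargs["cache_dir"] = cache_dir
    return kwargs


def load_csv_dataset(train_file, cache_dir=None):
    """Load a CSV/TSV file through the Hugging Face datasets library."""
    # Imported lazily so the kwargs helper can be used without the
    # third-party package installed (pip install datasets).
    from datasets import load_dataset

    return load_dataset("csv", **csv_load_kwargs(train_file, cache_dir))


# Possible use (file name is a placeholder):
#   datasets = load_csv_dataset("train.csv")
```

Keeping cache_dir optional makes it easy to try both variants and confirm whether the custom cache directory is what triggers the stall on a given machine.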

ChhXiitaa commented 1 year ago

Thank you, your answer helped me.