megagonlabs / ditto

Code for the paper "Deep Entity Matching with Pre-trained Language Models"
Apache License 2.0

ValueError: not enough values to unpack (expected 2, got 1) - Textual/Company #23

Open Ribo-Py opened 2 years ago

Ribo-Py commented 2 years ago

!CUDA_VISIBLE_DEVICES=0 python train_ditto.py \
  --task Textual/Company \
  --batch_size 32 \
  --max_len 128 \
  --lr 3e-5 \
  --n_epochs 20 \
  --finetuning \
  --lm roberta \
  --fp16 \
  --da drop_col

step: 0, loss: 0.609293520450592
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Traceback (most recent call last):
  File "train_ditto.py", line 92, in <module>
    run_tag, hp)
  File "/home/ec2-user/SageMaker/vendor_matching/ditto/ditto_light/ditto.py", line 201, in train
    train_step(train_iter, model, optimizer, scheduler, hp)
  File "/home/ec2-user/SageMaker/vendor_matching/ditto/ditto_light/ditto.py", line 123, in train_step
    for i, batch in enumerate(train_iter):
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ec2-user/SageMaker/vendor_matching/ditto/ditto_light/dataset.py", line 80, in __getitem__
    left, right = combined.split(' [SEP] ')
ValueError: not enough values to unpack (expected 2, got 1)
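The last frame of the traceback can be reproduced in isolation, which may help narrow down what kind of string triggers the error. The sketch below is not code from the repository: `unpack_pair`, `count_parts`, and the sample strings are hypothetical, and only the delimiter literal `' [SEP] '` is taken from the traceback. It suggests that any value of `combined` that lacks the exact delimiter, including the surrounding spaces, would fail in the same way.

```python
SEP = " [SEP] "  # delimiter literal copied from the traceback (note the spaces)


def unpack_pair(combined):
    """Hypothetical helper using the same unpacking pattern as the failing line."""
    left, right = combined.split(SEP)
    return left, right


def count_parts(combined):
    """Return how many pieces the split produces, for diagnosis."""
    return len(combined.split(SEP))


# Hypothetical sample strings, not taken from any dataset.
samples = [
    "left text [SEP] right text",  # well-formed pair
    "left text only",              # delimiter missing entirely
    "left text [SEP]",             # right side empty, so no trailing space
]

for s in samples:
    try:
        unpack_pair(s)
        status = "ok"
    except ValueError as exc:
        status = "ValueError: {}".format(exc)
    print(repr(s), count_parts(s), status)
```

If this matches what happens in training, printing or logging `combined` just before the failing line should show which of these cases applies to a given dataset.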

utsgr commented 2 years ago

Were you able to solve this value error?

Ribo-Py commented 2 years ago

No.

On Thursday, December 2, 2021, utsgr wrote:

Were you able to solve this value error?


pauloh48 commented 2 years ago

Were you able to solve this value error?

progsi commented 1 year ago

In my case there were two main problems:
1. The code tries to split at [SEP], whereas the provided datasets use \t by default to separate the left and right entries.
2. The variable combined gets split as well. Simply removing this step seems to make it work.
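As an alternative to removing the split entirely, a more tolerant split could be tried. The following is a minimal sketch under stated assumptions, not the repository's code: `split_pair` and its parameters are hypothetical names, and it assumes the unmodified left and right strings are still available at the point where `combined` is split, so they can serve as a fallback.

```python
def split_pair(combined, original_left, original_right):
    """Hypothetical tolerant replacement for `combined.split(' [SEP] ')`.

    Splits on the first '[SEP]' without requiring surrounding spaces.
    If the marker is missing, falls back to the original, unmodified pair.
    """
    left, sep, right = combined.partition("[SEP]")
    if not sep:
        # Marker not found: keep the original entries instead of raising.
        return original_left, original_right
    return left.strip(), right.strip()


# Usage with hypothetical strings:
print(split_pair("left text [SEP] right text", "L", "R"))
print(split_pair("left text [SEP]", "L", "R"))
print(split_pair("left text only", "L", "R"))
```

The fallback trades strictness for robustness: a malformed string no longer stops training, but it also passes silently, so counting how often the fallback is taken may be worthwhile.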

oskar-ong commented 3 months ago

I am encountering the same issue with the "wdc_all_small" task.