princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Issue w/ batch shape #191

Closed: sundavid2002 closed this issue 1 year ago

sundavid2002 commented 2 years ago

I am trying to run the SimCSE evaluation in a virtual environment with Python 3.8.6 and the following packages:

transformers==4.2.1
scipy==1.5.4
datasets==1.2.1
pandas==1.1.5
scikit-learn==0.24.0
prettytable==2.1.0
gradio
torch
setuptools==49.3.0

I tried running evaluation.py but encountered the following error message:

/Users/davidsun/Downloads/SimCSE/SentEval/senteval/sts.py:42: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  sent1 = np.array([s.split() for s in sent1])[not_empty_idx]
/Users/davidsun/Downloads/SimCSE/SentEval/senteval/sts.py:43: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  sent2 = np.array([s.split() for s in sent2])[not_empty_idx]
Traceback (most recent call last):
  File "/Users/davidsun/Downloads/SimCSEvenv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 314, in __getattr__
    return self.data[item]
KeyError: 'size'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/davidsun/Downloads/SimCSE/evaluation.py", line 205, in <module>
    main()
  File "/Users/davidsun/Downloads/SimCSE/evaluation.py", line 144, in main
    result = se.eval(task)
  File "/Users/davidsun/Downloads/SimCSE/SentEval/senteval/engine.py", line 127, in eval
    self.results = self.evaluation.run(self.params, self.batcher)
  File "/Users/davidsun/Downloads/SimCSE/SentEval/senteval/sts.py", line 72, in run
    enc1 = batcher(params, batch1)
  File "/Users/davidsun/Downloads/SimCSE/evaluation.py", line 114, in batcher
    outputs = model(batch, output_hidden_states=True, return_dict=True)
  File "/Users/davidsun/Downloads/SimCSEvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/davidsun/Downloads/SimCSEvenv/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 751, in forward
    input_shape = input_ids.size()
  File "/Users/davidsun/Downloads/SimCSEvenv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 316, in __getattr__
    raise AttributeError
AttributeError

Working backward through the traceback, I reasoned that the batches of data must have the wrong shape, because the error is raised inside the batcher function. Is this reasoning right, or is the issue with the Python version I'm using? What should I do about this?
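The chained KeyError/AttributeError in the traceback can be reproduced in miniature without transformers or torch. The sketch below is a hypothetical mock, not the library's code: `FakeBatchEncoding`, `FakeTensor`, and `fake_forward` are illustrative stand-ins, and the only assumptions taken from the traceback are that the tokenizer output is a dict-like object whose `__getattr__` turns a missing key into `AttributeError`, and that the model's `forward` calls `input_ids.size()`. If the whole encoding reaches `forward` as its first positional argument, as `model(batch, ...)` in the traceback suggests, it is bound to `input_ids`, and looking up `size` on it fails with exactly this chain. Under that assumption the error would point to how the batch is passed to the model rather than to the shape of the data or the Python version; the thread itself does not confirm the cause.

```python
from collections import UserDict


class FakeBatchEncoding(UserDict):
    """Hypothetical stand-in for a tokenizer output: a dict wrapper whose
    attribute lookups fall back to its keys."""

    def __getattr__(self, item):
        # Called only when normal attribute lookup fails, e.g. for "size".
        try:
            return self.data[item]
        except KeyError:
            # Raised while the KeyError is being handled, which produces the
            # "During handling of the above exception..." chained traceback.
            raise AttributeError


class FakeTensor:
    """Hypothetical stand-in for a tensor that reports its shape via size()."""

    def __init__(self, shape):
        self.shape = shape

    def size(self):
        return self.shape


def fake_forward(input_ids=None, **kwargs):
    # Mirrors the failing line in the traceback: input_shape = input_ids.size()
    return input_ids.size()


batch = FakeBatchEncoding(
    {"input_ids": FakeTensor((2, 7)), "attention_mask": FakeTensor((2, 7))}
)

# Passing the whole encoding positionally binds it to input_ids.
try:
    fake_forward(batch, output_hidden_states=True, return_dict=True)
except AttributeError as err:
    # The implicit exception context is the original KeyError('size').
    print("AttributeError, raised while handling:", repr(err.__context__))

# Unpacking the mapping passes each key as its own keyword argument.
print(fake_forward(**batch, output_hidden_states=True, return_dict=True))
```

With `**batch`, each key (`input_ids`, `attention_mask`) arrives as its own keyword argument, so `input_ids` is a tensor-like object again and `size()` resolves normally.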

sundavid2002 commented 2 years ago

Hello, I opened an issue about the shape of the dataset's batches not matching what the batcher function expects when running evaluation.py. It's been 6 days, but no one has responded yet.

Can someone please provide some guidance on how to resolve this issue as soon as possible?

sundavid2002 commented 2 years ago

Hello, I opened an issue about the shape of the dataset's batches not matching what the batcher function expects when running evaluation.py. It's been 11 days, but no one has responded yet.

Can someone please provide some guidance on how to resolve this issue as soon as possible?

gaotianyu1350 commented 1 year ago

Hi, sorry for the late response. I tested this and it worked fine for me. Would you mind providing the exact command you executed? Thanks!