huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0
2.16k stars 217 forks

TypeError: '<' not supported between instances of 'str' and 'NoneType' #296

Closed. vahuja4 closed this issue 1 year ago

vahuja4 commented 1 year ago

I am following the text classification notebook with a CSV dataset in the following format:

text,label
I want to close my account,accountClose
Close my credit card,accountClose
Mortgage payoff,accountClose
Loan payoff,accountClose
Loan pay off,accountClose
pay off,accountClose
lease payoff,accountClose
lease pay off,accountClose
account close,accountClose
close card account,accountClose
I want to open an account,accountOpenGeneral
I want to get a card,accountOpenGeneral
I want a loan,accountOpenGeneral
Refinance my car,accountOpenGeneral
Buy a car,accountOpenGeneral
Open checking,accountOpenGeneral
Open savings,accountOpenGeneral
Lease a vehicle,accountOpenGeneral
Link external bank account,accountTransferManage
verify external account,accountTransferManage
Add external account,accountTransferManage
Edit external account,accountTransferManage
Remove external account,accountTransferManage
Mortgage payment,billPaySchedulePayment
Setup Loan payment,billPaySchedulePayment
Setup auto loan payment,billPaySchedulePayment
Schedule bill payment,billPaySchedulePayment
Setup bill payment,billPaySchedulePayment
Setup automatic payment,billPaySchedulePayment
Setup auto pay,billPaySchedulePayment

And I am getting the following error:

    Applying column mapping to training dataset

    TypeError                                 Traceback (most recent call last)
    Input In [9], in
          1 # Train and evaluate
    ----> 2 trainer.train()
          3 metrics = trainer.evaluate()
          5 # save

    File /opt/omniai/work/instance1/envs/hface/lib/python3.8/site-packages/setfit/trainer.py:348, in SetFitTrainer.train(self, num_epochs, batch_size, learning_rate, body_learning_rate, l2_weight, trial)
        344     train_examples = sentence_pairs_generation_multilabel(
        345         np.array(x_train), np.array(y_train), train_examples
        346     )
        347 else:
    --> 348     train_examples = sentence_pairs_generation(
        349         np.array(x_train), np.array(y_train), train_examples
        350     )
        352 train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=batch_size)
        353 train_loss = self.loss_class(self.model.model_body)

    File /opt/omniai/work/instance1/envs/hface/lib/python3.8/site-packages/setfit/modeling.py:493, in sentence_pairs_generation(sentences, labels, pairs)
        489 def sentence_pairs_generation(sentences, labels, pairs):
        490     # Initialize two empty lists to hold the (sentence, sentence) pairs and
        491     # labels to indicate if a pair is positive or negative
    --> 493     num_classes = np.unique(labels)
        494     idx = [np.where(labels == i)[0] for i in num_classes]
        496     for first_idx in range(len(sentences)):

    File <__array_function__ internals>:180, in unique(*args, **kwargs)

    File /opt/omniai/work/instance1/envs/hface/lib/python3.8/site-packages/numpy/lib/arraysetops.py:274, in unique(ar, return_index, return_inverse, return_counts, axis, equal_nan)
        272 ar = np.asanyarray(ar)
        273 if axis is None:
    --> 274     ret = _unique1d(ar, return_index, return_inverse, return_counts,
        275                     equal_nan=equal_nan)
        276     return _unpack_tuple(ret)
        278 # axis was specified and not None

    File /opt/omniai/work/instance1/envs/hface/lib/python3.8/site-packages/numpy/lib/arraysetops.py:336, in _unique1d(ar, return_index, return_inverse, return_counts, equal_nan)
        334     aux = ar[perm]
        335 else:
    --> 336     ar.sort()
        337     aux = ar
        338 mask = np.empty(aux.shape, dtype=np.bool_)

    TypeError: '<' not supported between instances of 'str' and 'NoneType'

tomaarsen commented 1 year ago

If you are still running into this, I believe it is related to your labels being strings rather than integers. A scikit-learn LabelEncoder is a common way to convert them.
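For illustration, here is a minimal sketch of that approach. It uses a few rows in the shape of the CSV from the question, not the SetFit pipeline itself. The row with a missing label is hypothetical: the error message says a `str` was compared with `None`, which suggests at least one label was read as empty, so the sketch assumes such rows should be dropped before encoding.

```python
from sklearn.preprocessing import LabelEncoder

# A few (text, label) rows in the shape of the CSV above. The last row's
# missing label is hypothetical, added to show how None can sit among strings.
rows = [
    ("I want to close my account", "accountClose"),
    ("I want a loan", "accountOpenGeneral"),
    ("Setup auto pay", "billPaySchedulePayment"),
    ("Close my credit card", "accountClose"),
    ("Mortgage payoff", None),
]

# Drop rows without a label: sorting a mix of str and None raises the TypeError.
rows = [(text, label) for text, label in rows if label is not None]

texts = [text for text, _ in rows]
string_labels = [label for _, label in rows]

# Map each distinct string label to an integer id (classes are stored sorted).
encoder = LabelEncoder()
int_labels = encoder.fit_transform(string_labels).tolist()

# Keep the encoder so predicted ids can be turned back into label names.
decoded = encoder.inverse_transform(int_labels).tolist()
```

The integer labels would then replace the string column before the dataset is handed to the trainer, and `encoder.inverse_transform` recovers the original names from predictions.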