huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

'datasets.arrow_dataset.Dataset' does not support numpy.int64 as index? #551

Open qianyue76 opened 2 months ago

qianyue76 commented 2 months ago

When I run prediction with the model:

```python
from setfit import SetFitModel
from datasets import load_dataset

model = SetFitModel.from_pretrained(model_name)

test_dataset = load_dataset("csv", data_files="dataset.csv", split="train")
predictions = model.predict(test_dataset)
```

I get this error: [image]

The top two lines of the output are the types of `length_sorted_idx` and `sentences`, which I printed. My relevant package versions are:

- setfit 1.1.0.dev0
- sentence-transformers 3.1.0
- datasets 3.0.0
- torch 2.0.1

My question is the one in the title.
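For anyone trying to reproduce this, it may help to confirm the versions from inside the same Python environment that runs the prediction, since a shell `pip list` can point at a different environment. A minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    names = ["setfit", "sentence-transformers", "datasets", "torch", "numpy"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in the failing environment shows which `datasets` release is actually being imported.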

antrec commented 2 months ago

+1 (Shouldn't this be an issue on the datasets package?)

EDIT: Hasn't this been fixed here: https://github.com/huggingface/datasets/pull/6817/commits/0ec973d0f3a86db827e2e7e9952311f774d449bb ? Yet you are using datasets 3.0.0, which includes this PR 🤔 ...
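Until the root cause is pinned down, one possible workaround is to avoid passing the `Dataset` object to `predict` and hand it a plain Python list of strings instead, since a plain list accepts `numpy.int64` indices. A minimal sketch, assuming the CSV has a text column named `text` (the column name is not given in the issue); the helper `predict_from_rows` is hypothetical:

```python
def predict_from_rows(model, rows, text_column="text"):
    """Collect one column into a list and call model.predict on that list.

    `rows` is any iterable of dict-like rows (iterating a datasets.Dataset
    yields dicts). `text_column` is an assumption: use your CSV's column name.
    """
    texts = [row[text_column] for row in rows]
    return model.predict(texts)


# Usage with the objects from the issue (not executed here):
#   predictions = predict_from_rows(model, test_dataset, text_column="text")
```

This sidesteps the indexing question rather than answering it, so it says nothing about whether the fix belongs in datasets or in setfit.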