nomic-ai / contrastors

Train Models Contrastively in Pytorch
Apache License 2.0

RuntimeError with Mixed Devices on GPU and CPU when Using Nomic Embed with SentEval #14

Closed ZBWpro closed 4 months ago

ZBWpro commented 4 months ago

Hello,

I encountered an issue while testing the performance of Nomic Embed within SentEval, following the demo program provided by Hugging Face (https://huggingface.co/nomic-ai/nomic-embed-text-v1). Specifically, when executing model_output = model(**encoded_input), I consistently receive the error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! Interestingly, the model has already been placed on the GPU, and encoded_input is just the tokenizer's output. Could you please help me understand what might be causing this error?

Here is a snippet of the code I'm using:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_path = '../../models/nomic-embed-text-v1'

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()

def batcher(params, batch):
    # Handle rare token encoding issues in the dataset
    if len(batch) >= 1 and len(batch[0]) >= 1 and isinstance(batch[0][0], bytes):
        batch = [[word.decode('utf-8') for word in s] for s in batch]

    # batch arrives as lists of tokens; join them back into sentences
    sentences = [' '.join(s) for s in batch]
    prefixed_sentences = ['search_query: ' + sentence for sentence in sentences]
    encoded_input = tokenizer(prefixed_sentences, padding=True, truncation=True, return_tensors='pt')

    with torch.no_grad():
        model_output = model(**encoded_input)

    embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
    embeddings = F.normalize(embeddings, p=2, dim=1)
    return embeddings.cpu()

zanussbaum commented 4 months ago

You need to move encoded_input onto the same device as the model before the forward pass, using something like

encoded_input = {k: v.to(device) for k, v in encoded_input.items()}
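
For reference, here is a minimal sketch of the batcher with that fix applied. It assumes the tokenizer, model, and device variables from the snippet above, and a mean_pooling helper written in the style of the Hugging Face model card:

def mean_pooling(model_output, attention_mask):
    # Mean-pool token embeddings, ignoring padding positions via the attention mask
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

def batcher(params, batch):
    sentences = [' '.join(s) for s in batch]
    prefixed_sentences = ['search_query: ' + sentence for sentence in sentences]
    encoded_input = tokenizer(prefixed_sentences, padding=True, truncation=True, return_tensors='pt')

    # Move every tensor produced by the tokenizer onto the same device as the model
    encoded_input = {k: v.to(device) for k, v in encoded_input.items()}

    with torch.no_grad():
        model_output = model(**encoded_input)

    embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
    embeddings = F.normalize(embeddings, p=2, dim=1)
    return embeddings.cpu()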