paperswithcode / galai

Model API for GALACTICA
Apache License 2.0

Multi-threaded mode? #38

Closed cvinker closed 1 year ago

cvinker commented 1 year ago
```python
import torch, gc
from transformers import AutoTokenizer, OPTForCausalLM

# Hub id is facebook/galactica-30b; a bare "galactica-30b" only resolves to a local copy.
tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-30b")
tokenizer.pad_token_id = 1
tokenizer.padding_side = 'left'
tokenizer.model_max_length = 2020

model = OPTForCausalLM.from_pretrained("facebook/galactica-30b")

input_text = """# Scientific article.
title: Purpose of Humanity's continued existence alive.

# Introduction
"""
input_ids = tokenizer(input_text, return_tensors="pt", padding='max_length').input_ids

outputs = model.generate(input_ids,
                         max_new_tokens=1000,
                         do_sample=True,
                         temperature=0.7,
                         top_k=25,
                         top_p=0.9,
                         no_repeat_ngram_size=10,
                         early_stopping=True)

# skip_special_tokens drops the <pad> tokens; str.lstrip('<pad>') strips characters, not the substring.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

gc.collect()
torch.cuda.empty_cache()  # torch.empty_cache() does not exist; the cache helper lives under torch.cuda
```

When I run this, I can see it load the model into RAM, but it seems to be using only one thread. The output is a wall of messages naming weights like 'decoder.layers.xx.bias', followed by "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference."
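For anyone else seeing only one busy core: PyTorch's CPU thread pools can be inspected and raised explicitly. A minimal sketch, assuming CPU-only inference (the use of os.cpu_count() as the target is my assumption, not something from the original report):

```python
import os
import torch

# Intra-op parallelism (matmuls, etc.) and inter-op parallelism are configured separately.
print("intra-op threads:", torch.get_num_threads())
print("inter-op threads:", torch.get_num_interop_threads())

# torch.set_num_threads can be called at runtime; OMP_NUM_THREADS, by contrast,
# only takes effect if it is set before torch is imported.
torch.set_num_threads(os.cpu_count())
```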

cvinker commented 1 year ago

OK, I was able to get it to work properly with the 6.7b model. I don't think I need the torch.cuda.empty_cache() call. Also, it does seem to be using multi-threading.
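For reference, the 6.7b checkpoint can also be driven through this repo's own API instead of raw transformers. A minimal sketch, assuming the "standard" model name maps to the 6.7b weights as in the README (the prompt is just the one from the report above):

```python
import galai as gal

# "standard" should be the 6.7b GALACTICA checkpoint in galai's naming scheme.
model = gal.load_model("standard")

# galai handles tokenization and padding internally.
prompt = "# Scientific article.\ntitle: Purpose of Humanity's continued existence alive.\n\n# Introduction\n"
print(model.generate(prompt))
```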

cvinker commented 1 year ago

What I got out of it:

(image attachment: screenshot of the generated output)