Closed: ericwtodd closed this issue 8 months ago
You should be able to do something similar by calling `.local_model`:
```python
from nnsight import LanguageModel
import torch as t

model = LanguageModel("gpt2", device_map="auto", dispatch=True)
tokenizer = model.tokenizer

# Tokenize manually, then run a plain forward pass on the underlying model.
test = t.tensor(tokenizer.encode("test"))
logits = model.local_model(test)
```
I'll look into implementing a version that automatically tokenizes inputs and works well with a remote framework.
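A hypothetical helper along those lines might look roughly like the sketch below. To be clear, `simple_forward` is not an nnsight API; the names simply reuse the `tokenizer` and `local_model` attributes from the snippet above.

```python
import torch as t

def simple_forward(model, prompt: str):
    """Tokenize a prompt and run a plain forward pass on the local model.

    Illustrative only: this is a hypothetical convenience wrapper, not part
    of nnsight; it reuses the `tokenizer` and `local_model` attributes shown
    in the snippet above.
    """
    input_ids = t.tensor(model.tokenizer.encode(prompt)).unsqueeze(0)  # add batch dim
    with t.no_grad():
        return model.local_model(input_ids)

# logits = simple_forward(model, "test")
```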
@ericwtodd In 0.2, you can enter a tracing context with an input and set `trace=False` to skip the `with` context block entirely and just get the output directly, like:
```python
from nnsight import LanguageModel

model = LanguageModel("gpt2", device_map="auto", dispatch=True)

# With trace=False, .trace() runs the model on the prompt and returns the output directly.
output = model.trace('Hello', trace=False)
```
I wonder if others would be interested in a feature that allows for simple inference without a context manager.
Currently, the simplest way to run inference on a language model using nnsight is to use an `invoke` context; a rough sketch of that pattern is included below.
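For reference, a minimal sketch of that invoke-based pattern, assuming the 0.2-style `trace`/`invoke` API mentioned elsewhere in this thread (the prompt and the saved value are illustrative, not from the original post):

```python
from nnsight import LanguageModel

model = LanguageModel("gpt2", device_map="auto", dispatch=True)

# Enter a tracing context and an invoke context just to get the model output.
with model.trace() as tracer:
    with tracer.invoke("Hello"):
        output = model.output.save()

print(output.value)  # full model output, even though no internals were needed
```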
A potential interface for something like this might look like `output = model.simple_invoke(prompt)` or `output = model.invoke(prompt, trace=False)`. The use case I imagine for this would be if you just want to run inference to quickly see what a model would output on a particular prompt, without needing to access internals.
Thanks for the consideration!