Closed arjunguha closed 5 months ago
@arjunguha What version is your transformers? I see this in the model card:
Model Summary: This is the same model as SantaCoder, but it can be loaded with transformers >= 4.28.1 to use the GPTBigCode architecture. We refer the reader to the SantaCoder model page for full documentation about this model.
main: uses the gpt_bigcode model; requires the bigcode fork of transformers.
main_custom: packaged with its modeling code; requires transformers >= 4.27. Alternatively, it can run on older versions by setting the configuration parameter activation_function = "gelu_pytorch_tanh".
I'm using transformers 4.36.2. So, the main branch and not the version that has its own modeling code.
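For reference, the version gate the model card describes can be sanity-checked with a small sketch. The parse_version helper below is hypothetical, written only for illustration (the packaging library's Version class would be the usual tool):

```python
# Hypothetical helper for comparing dotted version strings.
# Per the model card, GPTBigCode support requires transformers >= 4.28.1.
def parse_version(v: str) -> tuple:
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in v.split("."))

required = parse_version("4.28.1")   # minimum for the GPTBigCode architecture
installed = parse_version("4.36.2")  # the version reported in this thread

# Tuple comparison gives the expected ordering: 4.36.2 satisfies the minimum.
assert installed >= required
```

So 4.36.2 comfortably clears the 4.28.1 requirement, which is why the main branch loads without the bigcode fork.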
SantaCoder seems to work in the current release of nnsight.
from nnsight import LanguageModel

model = LanguageModel("bigcode/gpt_bigcode-santacoder", device_map='cuda:0')

with model.trace("# the following python function computes the sqrt"):
    test = model.transformer.h[0].attn.output.save()
Closing this issue.
I'm happy to try to debug this. But, in case the error is obvious to an nnsight hacker, here is the error I'm getting.
This is the model:
https://huggingface.co/bigcode/gpt_bigcode-santacoder
This is my code that raises the error below. I am able to load Pythia as shown in the tutorial.
Error: