from nnsight import LanguageModel
# We'll never actually load the parameters, so there's no need to specify a device_map.
model = LanguageModel("meta-llama/Llama-2-70b-hf")
# All we need to specify to use NDIF instead of executing locally is remote=True.
with model.trace("The Eiffel Tower is in the city of", remote=True):
    hidden_states = model.model.layers[-1].output.save()
    output = model.output.save()
print(hidden_states)
print(output["logits"])
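For comparison, the same pattern executes locally if you drop remote=True. A minimal sketch, assuming the small openai-community/gpt2 checkpoint (whose module tree is transformer.h rather than model.layers):

gpt2 = LanguageModel("openai-community/gpt2", device_map="auto")
with gpt2.trace("The Eiffel Tower is in the city of"):
    # GPT-2 block outputs are tuples; index 0 is the hidden-states tensor.
    local_hidden = gpt2.transformer.h[-1].output.save()
print(local_hidden[0].shape)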
This PR broke LanguageModel by adding an mlp_bias field to the Llama config: https://github.com/huggingface/transformers/pull/30031/files
The example still works on transformers 4.40.0.
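Until that's fixed, a workaround sketch (assuming pip, and assuming 4.40.0 remains the last known-good version) is to pin transformers and fail fast on newer installs:

# Guard sketch: abort early if the installed transformers is newer than 4.40.0.
import transformers
from packaging import version

assert version.parse(transformers.__version__) <= version.parse("4.40.0"), (
    'transformers > 4.40.0 breaks this example; pin with pip install "transformers==4.40.0"'
)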