huggingface / optimum-nvidia

Providing input_embeddings for generation instead of IDs #129

Open verityw opened 2 months ago

verityw commented 2 months ago

Is there a way to run Llama2 inference by passing the prompt as inputs_embeds, as the standard Llama2 forward function allows? Relatedly, is there an easy way to access the model's embedding module, so that we can manually map input IDs to embeddings?
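
For reference, here is a minimal sketch of the behaviour being asked about, as it works in the upstream transformers library with PyTorch (not optimum-nvidia). It assumes torch and transformers are installed. The tiny LlamaConfig values and the token IDs are made up for illustration so that nothing has to be downloaded; the model is randomly initialised. Whether optimum-nvidia exposes an equivalent path is exactly the open question, and this sketch does not answer it.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Assumption: a tiny, randomly initialised Llama-style model, used only so
# the example runs without downloading weights. Sizes are arbitrary.
config = LlamaConfig(
    vocab_size=128,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=64,
)
model = LlamaForCausalLM(config).eval()

# Arbitrary token IDs standing in for a tokenized prompt (batch of 1).
input_ids = torch.tensor([[1, 5, 9, 12]])

# Access the embedding module and map IDs to embeddings manually.
embedding_layer = model.get_input_embeddings()
inputs_embeds = embedding_layer(input_ids)  # shape: (batch, seq_len, hidden_size)

with torch.no_grad():
    # The forward pass accepts either input_ids or inputs_embeds, not both.
    logits_from_ids = model(input_ids=input_ids).logits
    logits_from_embeds = model(inputs_embeds=inputs_embeds).logits
```

In upstream transformers both calls go through the same layers after the embedding lookup, so the two sets of logits should match. The embeddings can also be edited before the forward pass (for example, to splice in soft prompts), which is the usual reason for wanting this entry point.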