@hongjin-su, any advice here?
The types are:
model: transformers.models.t5.modeling_t5.T5Model
tokenizer: transformers.models.t5.tokenization_t5_fast.T5TokenizerFast
I tried using the tokenizer and model directly:
encoding = tokenizer(pair["instruction"], pair["text"], return_tensors="pt")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
# output is transformers.modeling_outputs.BaseModelOutputWithPastAndCrossAttentions
output = model.encoder(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
# The shape of the last hidden state is torch.Size([1, 16, 1024])
output_last_hidden_state = output.last_hidden_state
The last_hidden_state shape is [1, 16, 1024]. How do I get a 768-dimensional encoding out of it?
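For context on the dimension mismatch: hkunlp/instructor-large is packaged in the sentence-transformers layout, which stacks a pooling step and a learned 1024-to-768 dense projection on top of the T5 encoder, and loading the weights as a plain T5Model skips both. A mean-pooling sketch continuing the snippet above illustrates this; it still yields 1024 dimensions, which is why the projection matters:
mask = attention_mask.unsqueeze(-1).float()            # [1, 16, 1]
summed = (output_last_hidden_state * mask).sum(dim=1)  # [1, 1024]
pooled = summed / mask.sum(dim=1)                      # mean over non-padding tokens, still 1024-dim
# The 768-dim output comes from the checkpoint's dense projection,
# which T5Model does not load.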
Hi, thanks a lot for your interest in the INSTRUCTOR model!
You may try to load the model via:
from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-large')
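For reference, the model loaded this way exposes an encode() method that takes [instruction, text] pairs and returns a numpy array of embeddings. A minimal usage sketch continuing the snippet above (the instruction and sentence are just illustrations):
# encode() takes a list of [instruction, text] pairs
embeddings = model.encode([
    ['Represent the Science title:',
     '3D ActionSLAM: wearable person tracking in multi-floor environments'],
])
print(embeddings.shape)  # (1, 768) for instructor-large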
Please re-open the issue if you have any questions or comments!
Hi,
I am following this guide to deploy instructor-embedding on Amazon SageMaker:
https://www.philschmid.de/custom-inference-huggingface-sagemaker
I've created a model.tar.gz that contains a cached version of the model. In the inference.py I load the model from the model directory inside model.tar.gz. The model type for this one comes up as T5Model, which does not have an encode method. Which method and syntax do I use to perform the embedding?
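A minimal inference.py sketch in the shape that guide uses: model_fn and predict_fn are the SageMaker Hugging Face inference toolkit hooks from the linked article, while the {"inputs": [[instruction, text], ...]} request payload is an assumption for illustration. It also presumes the InstructorEmbedding package is installed, e.g. via a requirements.txt bundled in model.tar.gz:
from InstructorEmbedding import INSTRUCTOR

def model_fn(model_dir):
    # Load the INSTRUCTOR wrapper (rather than a bare T5Model) from the
    # unpacked model.tar.gz directory so that encode() is available.
    return INSTRUCTOR(model_dir)

def predict_fn(data, model):
    # Assumed request payload: {"inputs": [["<instruction>", "<text>"], ...]}
    pairs = data["inputs"]
    embeddings = model.encode(pairs)
    return {"vectors": embeddings.tolist()}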