huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

What does the output of feature-extraction pipeline represent? #4613

Closed. orenpapers closed this issue 4 years ago.

orenpapers commented 4 years ago

I am using the feature-extraction pipeline:

from transformers import pipeline

nlp_fe = pipeline('feature-extraction')
nlp_fe('there is a book on the desk')

As output I get a list with one element, which is a list of 9 elements, each of which is a list of 768 features (floats). What does the output represent? What is each element of the lists, and what is the meaning of the 768 float values? Thanks

Abhishek-Rnjn commented 4 years ago

They are the embeddings generated by the model (a BERT-base-sized model, I guess, since the hidden size is 768). You get 9 elements: one contextual embedding for each token in your sequence, i.e. your 7 words plus the special [CLS] and [SEP] tokens the tokenizer adds. The individual values of these embeddings are hidden features that are not easy to interpret on their own.
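
For illustration, you can inspect the shapes like this (a minimal sketch; the exact default checkpoint used by the pipeline depends on your transformers version, but its hidden size is 768):

from transformers import pipeline

# Feature-extraction pipeline with the default checkpoint.
nlp_fe = pipeline('feature-extraction')
out = nlp_fe('there is a book on the desk')

print(len(out))        # 1   -> one entry per input sentence
print(len(out[0]))     # 9   -> one vector per token ([CLS] + 7 word tokens + [SEP])
print(len(out[0][0]))  # 768 -> hidden size of the model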

orenpapers commented 4 years ago

So the pipeline will just return the last-layer encoding of BERT? Then what is the difference from code like

input_ids = torch.tensor(bert_tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)
outputs = bert_model(input_ids)
hidden_states = outputs[-1][1:]  # outputs[-1] is the tuple of hidden states from every layer (requires output_hidden_states=True); [1:] drops the embedding-layer output
layer_hidden_state = hidden_states[n_layer]  # hidden state of the chosen layer
return layer_hidden_state

Also, do BERT encodings have traits similar to word2vec? E.g., will similar words be closer together, France - Paris = England - London, etc.?
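
For example, this is roughly the kind of check I have in mind for the analogy question (a sketch, assuming bert-base-uncased and words that each map to a single token):

import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"  # assumed checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def embed(sentence, word):
    # Last-layer vector of `word` inside `sentence` (assumes the word is a single token in the vocab)
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (num_tokens, 768)
    idx = enc.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

paris = embed("paris is the capital of france", "paris")
france = embed("paris is the capital of france", "france")
london = embed("london is the capital of england", "london")
england = embed("london is the capital of england", "england")

# Unlike word2vec, these vectors are contextual, so the analogy arithmetic is only approximate at best.
print(torch.nn.functional.cosine_similarity(france - paris, england - london, dim=0))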

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

merleyc commented 3 years ago

> So the pipeline will just return the last-layer encoding of BERT? Then what is the difference from code like
>
> input_ids = torch.tensor(bert_tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)
> outputs = bert_model(input_ids)
> hidden_states = outputs[-1][1:]  # outputs[-1] is the tuple of hidden states from every layer (requires output_hidden_states=True); [1:] drops the embedding-layer output
> layer_hidden_state = hidden_states[n_layer]  # hidden state of the chosen layer
> return layer_hidden_state
>
> Also, do BERT encodings have traits similar to word2vec? E.g., will similar words be closer together, France - Paris = England - London, etc.?

Hi @orko19, did you figure out the difference between 'hidden_states' and the 'feature-extraction' pipeline? I'd like to understand it as well. Thanks!

orenpapers commented 3 years ago

@merleyc I do not! Please share if you do :)

allmwh commented 3 years ago

The outputs of "last_hidden_state" and the "feature-extraction" pipeline are the same; you can verify it yourself.

The "feature-extraction" pipeline just takes care of the whole job for you, from tokenizing the words to producing the embeddings.