Closed SjoerdBraaksma closed 1 year ago
If you need to extract embeddings and fine-tuning is not an option, you can extract hidden states from BERTje. I think the simplest way is this: https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/pipelines#transformers.FeatureExtractionPipeline
from transformers import pipeline
extractor = pipeline(model="GroNLP/bert-base-dutch-cased", task="feature-extraction", model_kwargs={"num_hidden_layers": 10})
result = extractor("Dit is een test.", return_tensors=True)
result.shape # A tensor of shape [1, sequence_length, hidden_dimension] representing the input string.
torch.Size([1, 8, 768])
Choose num_hidden_layers between 1 and 12. It defaults to 12, but that default is suboptimal for extracting embeddings.
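Note that the extractor returns one vector per token, so if a downstream tool such as Top2Vec expects a single fixed-size vector per document, you still need to pool over the token axis. A common choice is mean pooling; here is a minimal sketch using a dummy tensor in place of the extractor output (the shapes match the example above, but the values are random placeholders):

```python
import torch

# Dummy tensor standing in for extractor("Dit is een test.", return_tensors=True),
# which has shape [1, sequence_length, hidden_dimension].
token_embeddings = torch.randn(1, 8, 768)

# Mean-pool over the token axis (dim=1) and drop the batch axis,
# leaving one 768-dimensional vector for the whole input string.
sentence_embedding = token_embeddings.mean(dim=1).squeeze(0)

print(sentence_embedding.shape)  # torch.Size([768])
```

The resulting vector can then be passed to any framework that consumes fixed-size embeddings. Other pooling strategies (e.g. taking the [CLS] token, `token_embeddings[0, 0]`) are also used; which works best depends on the downstream task.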
I hope this helps. This is mostly a generic language modeling question. I refer you to the Hugging Face Forums if you need more help.
Hoi Wietse!
I am relatively new to using BERT models, and I was wondering if it is possible to access the word embeddings directly, so I can make them usable in other frameworks. In my specific use case, I want to use them as the embedding model in Top2Vec.
Is this possible and if yes, how can I do this?
thanks in advance!