helboukkouri / character-bert

Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"
Apache License 2.0
195 stars, 47 forks

Printing character level vectors #16

ozturkoktay closed 12 months ago

ozturkoktay commented 3 years ago

Hi,

You're printing words and their embeddings using:

for token, embedding in zip(x, embeddings_for_x):
    print(token, embedding)

How can I see each letter's vector?

helboukkouri commented 3 years ago

Hi @ozturkoktay, CharacterBERT is actually a word-level model: although it reads each word's characters, it produces one vector per word. If you really want to look at character vectors, the only way is to extract the character embedding layer. Note, however, that the entries of this matrix correspond to UTF-8 bytes rather than characters. 😊
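
To illustrate the point about UTF-8 bytes, here is a minimal, self-contained sketch. It does not load CharacterBERT: `char_embedding_matrix` is a hypothetical stand-in filled with random values, and the sizes (256 rows, 16 dimensions) and the helper `byte_vectors` are assumptions made for illustration only. The real layer may have a different shape, for example extra rows for padding or word-boundary markers, and may offset the byte indices.

```python
import random

# Assumed sizes for illustration: one row per possible byte value (0-255).
NUM_BYTE_VALUES = 256
EMBEDDING_DIM = 16

random.seed(0)
# Hypothetical stand-in for the trained character embedding matrix.
# In practice this would be the weight tensor taken from the model.
char_embedding_matrix = [
    [random.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(NUM_BYTE_VALUES)
]


def byte_vectors(word):
    """Return (byte value, vector) pairs, one per UTF-8 byte of `word`."""
    return [(b, char_embedding_matrix[b]) for b in word.encode("utf-8")]


for word in ["cat", "öz"]:
    pairs = byte_vectors(word)
    print(word, "->", len(word), "characters,", len(pairs), "bytes")
    for byte_value, vector in pairs:
        # Show only the first four dimensions to keep the output short.
        print("  byte", byte_value, [round(v, 3) for v in vector[:4]], "...")
```

For ASCII words such as "cat" there is one vector per letter, but "ö" is encoded as two bytes (195 and 182), so it maps to two rows of the matrix rather than to a single "letter vector".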

ozturkoktay commented 3 years ago

Hi @helboukkouri, how can I extract the character embedding layer? Could you please share a code example?