kensho-technologies / bubs

Keras Implementation of Flair's Contextualized Embeddings
Apache License 2.0
26 stars, 9 forks

char to int map for news, not news-fast #23

Closed: pinesnow72 closed this issue 3 years ago

pinesnow72 commented 3 years ago

This repository provides a way to use PyTorch-based pretrained Flair embedding weights by converting them. Flair embeddings are character-level, which means they rely on a mapping from characters to integers. This repository also provides a (275, 100) mapping table (in char_to_int.py), which fits the news-fast weights well. However, when I tried to use the news weights instead of news-fast, I got an error in the char-to-int mapping, because the news weights require a (300, 100) mapping table. I don't really understand the difference between news and news-fast. Where can I get the (300, 100) char-to-int mapping table? I could not find any char-to-int mapping table in the Flair repository.
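To illustrate why the mapping has to match the weights: a character-level model turns each character into an integer index and uses that index to select a row of the character-embedding matrix. The mapping therefore needs exactly one entry per row (275 for the weights that fit char_to_int.py, 300 for news). The following is a minimal plain-Python sketch under stated assumptions: the embedding matrix is represented as a list of rows, unseen characters fall back to a `<unk>` entry, and `embed_chars` is a hypothetical helper, not part of bubs or Flair.

```python
from typing import Dict, List, Sequence


def embed_chars(
    text: str,
    char_to_int: Dict[str, int],
    embedding: Sequence[Sequence[float]],
    unk_key: str = "<unk>",  # assumed fallback key for unseen characters
) -> List[List[float]]:
    """Look up one embedding row per character of `text` (hypothetical helper).

    `embedding` stands in for a (vocab_size, embedding_dim) weight matrix,
    e.g. (275, 100) or (300, 100). The mapping must have exactly one entry
    per row; otherwise characters would be sent to the wrong rows.
    """
    if len(char_to_int) != len(embedding):
        raise ValueError(
            f"char_to_int has {len(char_to_int)} entries, "
            f"but the embedding matrix has {len(embedding)} rows"
        )
    unk_index = char_to_int[unk_key]
    # Map each character to its index, using the unknown index for unseen ones.
    indices = [char_to_int.get(ch, unk_index) for ch in text]
    # Each index selects one row of the embedding matrix.
    return [list(embedding[i]) for i in indices]
```

With a toy 3-entry mapping and a 3-row matrix, `embed_chars("abz", ...)` returns the rows for `a`, `b`, and `<unk>`; pairing the same mapping with a 4-row matrix raises `ValueError` instead of silently reading the wrong rows.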

ydovzhenko commented 3 years ago

Here is what I did when writing this; maybe it can point you in the right direction:

    from flair.embeddings import FlairEmbeddings

    flair_forward_embeddings = FlairEmbeddings('news-forward')
    char_to_int = flair_forward_embeddings.lm.dictionary.item2idx

Flair may have changed things since then; let me know if this doesn't help and I'll investigate further.
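Once `item2idx` has been extracted, it can help to normalize it and check it against the weights before building the Keras layer. A minimal sketch, assuming that the keys of `item2idx` may be UTF-8 encoded bytes (as in the Flair versions I have seen; check your installed version) and that the character-embedding weights live at `lm.encoder.weight` (also an assumption). `normalize_char_map` and `check_fits_embedding` are hypothetical helpers, not part of bubs or Flair.

```python
from typing import Dict, Mapping, Union


def normalize_char_map(item2idx: Mapping[Union[bytes, str], int]) -> Dict[str, int]:
    """Convert a Flair-style item2idx mapping into a plain {str: int} dict.

    Keys may be UTF-8 encoded bytes or already str; both are handled.
    """
    char_to_int: Dict[str, int] = {}
    for key, idx in item2idx.items():
        char = key.decode("utf-8") if isinstance(key, bytes) else key
        char_to_int[char] = idx
    return char_to_int


def check_fits_embedding(char_to_int: Mapping[str, int], num_rows: int) -> None:
    """Raise ValueError unless the indices are exactly 0..num_rows-1,
    i.e. one entry per row of a (num_rows, embedding_dim) matrix."""
    if len(char_to_int) != num_rows:
        raise ValueError(
            f"mapping has {len(char_to_int)} entries, "
            f"but the embedding has {num_rows} rows"
        )
    if sorted(char_to_int.values()) != list(range(num_rows)):
        raise ValueError("indices are not a permutation of 0..num_rows-1")


# Hypothetical usage with Flair (not run here; attribute names are assumptions):
# from flair.embeddings import FlairEmbeddings
# emb = FlairEmbeddings('news-forward')
# char_to_int = normalize_char_map(emb.lm.dictionary.item2idx)
# check_fits_embedding(char_to_int, emb.lm.encoder.weight.shape[0])
```

Checking the row count up front turns a 275-vs-300 mismatch into an explicit error at conversion time.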

pinesnow72 commented 3 years ago

Great, it works well. Many thanks!