Fasttext with NLPAug Attribute Error 'Word2VecKeyedVectors' object has no attribute 'index_to_key'

ceaysenur commented 2 years ago

Hello,

I am trying to apply nlpaug to a dataset. It worked perfectly with BERT/DistilBERT and is a great way to augment data. However, when I try to use it with fastText like this:

aug = naw.WordEmbsAug(
    model_type='fasttext', model_path=(the_path+'cc.tr.300.vec.gz'),
    action="substitute")
augmented_text = aug.augment(text)

I get the error:

AttributeError                            Traceback (most recent call last)
in ()
      2 aug = naw.WordEmbsAug(
      3 model_type='fasttext', model_path=("/content/drive/MyDrive/"+'cc.tr.300.vec.gz'),
----> 4 action="substitute")
      5 augmented_text = aug.augment(text)

4 frames
/usr/local/lib/python3.7/dist-packages/nlpaug/model/word_embs/word_embeddings.py in _read(self)
     14 
     15     def _read(self):
---> 16         self.words = [self.model.index_to_key[i] for i in range(len(self.model.index_to_key))]
     17         self.emb_size = self.model[self.model.key_to_index[self.model.index_to_key[0]]]
     18         self.vocab_size = len(self.words)

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'index_to_key'

I would like to know what is going wrong here. I am using Google Colab.

makcedward commented 2 years ago

It seems your file is compressed. Try decompressing it first (the expected file extension is .vec).
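The decompression step could be sketched as follows, using only the Python standard library. The helper name `decompress_gz` is hypothetical (it is not part of nlpaug or gensim), and the sketch assumes the input is a plain gzip file such as cc.tr.300.vec.gz.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dst=None):
    """Decompress a gzip file and return the path of the decompressed copy.

    Hypothetical helper for illustration; not part of nlpaug or gensim.
    """
    src = Path(src)
    # Default target: the same path with the trailing suffix (".gz") dropped,
    # e.g. "cc.tr.300.vec.gz" -> "cc.tr.300.vec".
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        # Copy in chunks so a large vector file is never held fully in memory.
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Calling `decompress_gz('/content/drive/MyDrive/cc.tr.300.vec.gz')` would write cc.tr.300.vec next to the archive and return its path, which can then be passed as model_path.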

After that, check whether you can run the following script successfully. WordEmbsAug uses gensim to load the fastText pre-trained model, so if you can load your file with this gensim script, you should be able to initialize naw.WordEmbsAug().

from gensim.models import KeyedVectors
KeyedVectors.load_word2vec_format(file_path)
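Loading the file is only part of the check: the traceback in this thread fails on the `index_to_key` attribute of the loaded object. As far as I know, `index_to_key` and `key_to_index` were introduced in gensim 4.0, while earlier releases exposed `index2word` and `vocab` instead. A minimal sketch for seeing which attribute set the loaded vectors offer is below. The helper names `api_generation` and `vocab_words` are hypothetical, and the installed-version explanation is an assumption that this thread does not confirm.

```python
def api_generation(kv):
    """Report which vocabulary attribute a KeyedVectors-like object exposes."""
    # Check the newer name first: it is the one nlpaug reads in the traceback.
    if hasattr(kv, "index_to_key"):
        return "4.0+"
    if hasattr(kv, "index2word"):
        return "pre-4.0"
    return "unknown"


def vocab_words(kv):
    """Return the vocabulary as a list, whichever attribute name is present."""
    if hasattr(kv, "index_to_key"):
        return list(kv.index_to_key)
    if hasattr(kv, "index2word"):
        return list(kv.index2word)
    raise AttributeError("object exposes neither index_to_key nor index2word")


# Intended use (not executed here, since it needs gensim and the vector file):
#   import gensim
#   from gensim.models import KeyedVectors
#   kv = KeyedVectors.load_word2vec_format(file_path)
#   print(gensim.__version__, api_generation(kv))
```

If this reports "pre-4.0", the installed gensim is probably older than the nlpaug release expects. Upgrading gensim and restarting the Colab runtime would be the obvious thing to try, though that is a guess rather than a confirmed fix.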

ceaysenur commented 2 years ago

Thank you for the reply. Actually, I had already used this part before:

from gensim.models import KeyedVectors
KeyedVectors.load_word2vec_format(file_path)

I have now tried it after decompressing the file, like this:


from gensim.models import KeyedVectors
KeyedVectors.load_word2vec_format('/content/drive/MyDrive/cc.tr.300.vec')

text="Bu cümleye benzer cümleler üretilebilir mi?"

aug = naw.WordEmbsAug(
    model_type='fasttext', model_path=('/content/drive/MyDrive/cc.tr.300.vec'),
    action="substitute")
augmented_text = aug.augment(text)

I still get the following error. Should I add another line to connect the two steps?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-a94c7ffe3d91> in <module>()
      7 aug = naw.WordEmbsAug(
      8 model_type='fasttext', model_path=('/content/drive/MyDrive/cc.tr.300.vec'),
----> 9 action="substitute")
     10 augmented_text = aug.augment(text)

4 frames
/usr/local/lib/python3.7/dist-packages/nlpaug/model/word_embs/word_embeddings.py in _read(self)
     14 
     15     def _read(self):
---> 16         self.words = [self.model.index_to_key[i] for i in range(len(self.model.index_to_key))]
     17         self.emb_size = self.model[self.model.key_to_index[self.model.index_to_key[0]]]
     18         self.vocab_size = len(self.words)

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'index_to_key'

makcedward / nlpaug, issue #279