makcedward / nlpaug

Data augmentation for NLP
https://makcedward.github.io/
MIT License
4.44k stars 463 forks

Fasttext with NLPAug: AttributeError 'Word2VecKeyedVectors' object has no attribute 'index_to_key' #279

Open ceaysenur opened 2 years ago

ceaysenur commented 2 years ago

Hello,

I am trying to apply nlpaug to a dataset. It works perfectly with BERT/DistilBERT and is a great way to augment data. However, when I try to use it with fastText like this:

aug = naw.WordEmbsAug(
    model_type='fasttext', model_path=(the_path+'cc.tr.300.vec.gz'),
    action="substitute")
augmented_text = aug.augment(text)

I get the error:

AttributeError                            Traceback (most recent call last)
in ()
      2 aug = naw.WordEmbsAug(
      3 model_type='fasttext', model_path=("/content/drive/MyDrive/"+'cc.tr.300.vec.gz'),
----> 4 action="substitute")
      5 augmented_text = aug.augment(text)

4 frames
/usr/local/lib/python3.7/dist-packages/nlpaug/model/word_embs/word_embeddings.py in _read(self)
     14 
     15     def _read(self):
---> 16         self.words = [self.model.index_to_key[i] for i in range(len(self.model.index_to_key))]
     17         self.emb_size = self.model[self.model.key_to_index[self.model.index_to_key[0]]]
     18         self.vocab_size = len(self.words)

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'index_to_key'

I would like to know what is going wrong here. I am using Google Colab.

makcedward commented 2 years ago

It seems like your file is compressed. Try uncompressing it first (the expected file extension is .vec).
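As a minimal sketch of the uncompress step, the snippet below unpacks a `.gz` file next to the original using only the Python standard library. The helper name `decompress_gz` is hypothetical and is not part of nlpaug or gensim; the path in the usage note is the one quoted in this thread.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dst=None):
    """Decompress a gzip file and return the path of the decompressed copy.

    `decompress_gz` is a hypothetical helper for illustration only.
    If `dst` is omitted, the trailing ".gz" is dropped from `src`,
    so "cc.tr.300.vec.gz" becomes "cc.tr.300.vec".
    """
    src = Path(src)
    if dst is None:
        if src.suffix != ".gz":
            raise ValueError(f"expected a .gz file, got: {src}")
        dst = src.with_suffix("")
    dst = Path(dst)
    # Stream the data in chunks so a large embedding file is never
    # held in memory all at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For the file in this thread, the call would look like `decompress_gz('/content/drive/MyDrive/cc.tr.300.vec.gz')`, which should leave a `cc.tr.300.vec` file in the same folder.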

After that, check whether you can run the following script successfully. WordEmbsAug uses gensim to load the fastText pre-trained model, so if you can load your file with this gensim script, you should be able to initialize naw.WordEmbsAug().

from gensim.models import KeyedVectors
KeyedVectors.load_word2vec_format(file_path)

ceaysenur commented 2 years ago

Thank you for the reply. Actually, I had already tried this part before:

from gensim.models import KeyedVectors
KeyedVectors.load_word2vec_format(file_path)

Now I tried it again after uncompressing the file, like this:


from gensim.models import KeyedVectors
KeyedVectors.load_word2vec_format('/content/drive/MyDrive/cc.tr.300.vec')

text="Bu cümleye benzer cümleler üretilebilir mi?"

aug = naw.WordEmbsAug(
    model_type='fasttext', model_path=('/content/drive/MyDrive/cc.tr.300.vec'),
    action="substitute")
augmented_text = aug.augment(text)

I still get the following error. Should I add another line to connect the loaded vectors to the augmenter?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-a94c7ffe3d91> in <module>()
      7 aug = naw.WordEmbsAug(
      8 model_type='fasttext', model_path=('/content/drive/MyDrive/cc.tr.300.vec'),
----> 9 action="substitute")
     10 augmented_text = aug.augment(text)

4 frames
/usr/local/lib/python3.7/dist-packages/nlpaug/model/word_embs/word_embeddings.py in _read(self)
     14 
     15     def _read(self):
---> 16         self.words = [self.model.index_to_key[i] for i in range(len(self.model.index_to_key))]
     17         self.emb_size = self.model[self.model.key_to_index[self.model.index_to_key[0]]]
     18         self.vocab_size = len(self.words)

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'index_to_key'
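
One possible reading of this traceback, offered as an assumption since the thread does not confirm it: gensim 4.0 renamed `index2word` to `index_to_key` and replaced `vocab` with `key_to_index`, and the class name `Word2VecKeyedVectors` comes from the 3.x series. The error would then point to an older gensim in the Colab environment rather than to the file format. The sketch below checks for that; the helper names `has_gensim4_api` and `vocab_attribute` are hypothetical and are not part of gensim or nlpaug.

```python
def has_gensim4_api(version):
    """Return True if a version string such as "4.1.2" has major version >= 4.

    Hypothetical helper: only the leading integer of the string is inspected.
    """
    major = int(version.split(".")[0])
    return major >= 4


def vocab_attribute(keyed_vectors):
    """Report which vocabulary-list attribute a loaded vectors object exposes.

    gensim 4.x objects expose `index_to_key`; gensim 3.x objects expose
    `index2word`. Returns None if neither attribute is present.
    """
    for name in ("index_to_key", "index2word"):
        if hasattr(keyed_vectors, name):
            return name
    return None


if __name__ == "__main__":
    try:
        import gensim
    except ImportError:
        print("gensim is not installed")
    else:
        print("gensim version:", gensim.__version__)
        print("has gensim 4 attribute names:", has_gensim4_api(gensim.__version__))
```

If the check reports a 3.x version, upgrading gensim (for example with `pip install --upgrade gensim`) and restarting the Colab runtime may be worth trying before changing anything about the `.vec` file.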