MemoryError for calculating cosine similarity scores

jind11 / TextFooler

A Model for Natural Language Attack on Text Classification and Inference

MIT License


MemoryError for calculating cosine similarity scores #3

Closed: sumuzhao closed this issue 4 years ago

sumuzhao commented 4 years ago

Hi,

I tried to pre-calculate the cosine similarity scores from the counter-fitted word vectors, but ran into a MemoryError. The word vectors have shape (65713, 300), so the final similarity matrix has shape (65713, 65713). The computation involves a dot product and some element-wise divisions. I have 8 GB of RAM. Any suggestions?
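One way around the full (65713, 65713) matrix is to compute it in row blocks and keep only the top-k neighbours of each word. The sketch below is not taken from the TextFooler code; it assumes that only the nearest neighbours of each word are needed downstream, and the function name `topk_cosine_neighbors` and its parameters `k` and `block_size` are hypothetical.

```python
import numpy as np

def topk_cosine_neighbors(embeddings, k=50, block_size=1024):
    """Return (indices, scores) of the k most cosine-similar rows for every row.

    Hypothetical helper: only one (block_size, n) slab of the similarity
    matrix exists at a time, so the full (n, n) matrix is never stored.
    """
    emb = np.asarray(embeddings, dtype=np.float32)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.maximum(norms, 1e-12)  # unit-length rows; guards against zero vectors
    n = emb.shape[0]
    k = min(k, n)
    top_idx = np.empty((n, k), dtype=np.int32)
    top_sim = np.empty((n, k), dtype=np.float32)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        sims = emb[start:stop] @ emb.T  # cosine similarities for this block of rows
        # argpartition finds the k largest entries per row without a full sort
        part = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        part_sims = np.take_along_axis(sims, part, axis=1)
        # sort just those k candidates from most to least similar
        order = np.argsort(-part_sims, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(part_sims, order, axis=1)
    return top_idx, top_sim
```

The result holds n × k entries instead of n × n, and peak memory scales with `block_size` rather than with the vocabulary size, so `block_size` can be lowered until the temporaries fit in RAM.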

Thanks a lot!

jind11 commented 4 years ago

Hi, the cosine similarity matrix consumes about 30 GB of RAM, which is what caused your out-of-memory problem. Do you have a machine with more RAM? Alternatively, you can reduce the float precision from 64 bits to something lower, say 32 or 16 bits.
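The size of the dense matrix at each precision can be checked with a little arithmetic. This is only a back-of-the-envelope sketch: the helper name `dense_matrix_bytes` is made up for illustration, and it counts just the final matrix, not any temporaries created while computing it.

```python
import numpy as np

def dense_matrix_bytes(n_rows, n_cols, dtype):
    """Bytes needed to store a dense n_rows x n_cols array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize

vocab_size = 65713  # number of word vectors reported in this thread
for dtype in (np.float64, np.float32, np.float16):
    gb = dense_matrix_bytes(vocab_size, vocab_size, dtype) / 1e9
    print(f"{np.dtype(dtype).name}: {gb:.1f} GB")
# float64: 34.5 GB
# float32: 17.3 GB
# float16: 8.6 GB
```

By this estimate, even the float16 matrix on its own is larger than 8 GB, so lowering the precision would not be enough on an 8 GB machine.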

sumuzhao commented 4 years ago

Well... I'll try reducing the float precision, but I doubt it will work with so little RAM. I'll also look into alternatives, such as reducing the vocabulary size. Anyway, thanks for your suggestion.

jind11 commented 4 years ago

Yes, you can also shrink the vocab size.
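To get a feel for how far the vocabulary would have to shrink, here is a rough sketch. Both helper names (`max_vocab_for_budget`, `restrict_vocab`) are hypothetical, the budget covers only the final n × n matrix, and the choice of which words to keep (for example, only words that occur in the target dataset) is an assumption rather than something stated in this thread.

```python
import math
import numpy as np

def max_vocab_for_budget(budget_bytes, dtype=np.float32):
    """Largest n such that a dense n x n matrix of `dtype` fits in budget_bytes."""
    return math.isqrt(budget_bytes // np.dtype(dtype).itemsize)

def restrict_vocab(words, vectors, keep):
    """Keep only the rows of `vectors` whose word appears in `keep`."""
    keep = set(keep)
    rows = [i for i, w in enumerate(words) if w in keep]
    return [words[i] for i in rows], np.asarray(vectors)[rows]

# Example: with about 4 GB set aside for a float32 similarity matrix
print(max_vocab_for_budget(4 * 10**9, np.float32))  # 31622
```

Because memory grows with the square of the vocabulary size, halving the vocabulary cuts the matrix to a quarter of its size.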

SatyapragyanDas commented 3 years ago

> Well... I'll try reducing the float precision, but I doubt it will work with so little RAM. I'll also look into alternatives, such as reducing the vocabulary size. Anyway, thanks for your suggestion.

I tried reducing the precision with the following line: df = df.astype(np.float32), but I get the following error: ValueError: could not convert string to float: 'tt0000574'. What should be done?
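The thread does not say what df is. If it is a pandas DataFrame that mixes a string identifier column with numeric columns (an assumption; the column names and values below are invented for illustration), the error would come from trying to cast the identifier column, and one possible fix is to cast only the numeric columns:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a string ID column next to numeric feature columns
df = pd.DataFrame({
    "id": ["tt0000574", "tt0000591"],
    "x": [0.1, 0.2],
    "y": [1.0, 2.0],
})

# df.astype(np.float32) would fail on the "id" column, so pick out
# the numeric columns and cast only those
numeric_cols = df.select_dtypes(include="number").columns
df = df.astype({col: np.float32 for col in numeric_cols})

print(df.dtypes)
```

The string column is left untouched, and the numeric columns become float32.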

jind11 commented 3 years ago

May I know where this line is used? I am not sure what "df" here refers to. Thanks!