Open carschno opened 8 years ago
However, it turns out that this cannot be fixed so easily, because multiple arrays in `WordEmbeddings.java` are initialized with the size of `numWords`, e.g. in line 206:
`IDSorter[] sortedWords = new IDSorter[numWords];`
Hence, `numWords` must be an `int`.
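To illustrate the constraint, here is a minimal sketch of why an array size has to be an `int` in Java. The class `ArraySizeSketch` and the helper `checkedLength` are hypothetical names for this example, not part of `WordEmbeddings.java`, and `Object[]` stands in for `IDSorter[]`.

```java
public class ArraySizeSketch {
    // Hypothetical helper: converts a long count into a legal array length.
    // Math.toIntExact throws ArithmeticException if the value does not fit in an int.
    static int checkedLength(long count) {
        return Math.toIntExact(count);
    }

    public static void main(String[] args) {
        long numWords = 6105270L;

        // The following line would not compile, because array sizes must be of type int:
        // Object[] sortedWords = new Object[numWords];

        // With an explicit, checked conversion the allocation is accepted.
        Object[] sortedWords = new Object[checkedLength(numWords)];
        System.out.println(sortedWords.length);
    }
}
```

So if `numWords` were declared as a `long`, every array allocation that uses it would need a conversion like this one.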
I've tried to compute word embeddings with a vocabulary size of 6105270 and a dimensionality of 300, resulting in a `NegativeArraySizeException` in `WordEmbeddings.java:100`.
This seems to be due to an integer overflow, because `numWords * stride` = `numWords * 2 * numColumns` = 6105270 * 2 * 300 = 3663162000 > 2^31.
The solution seems pretty easy: change the type of `WordEmbeddings.numWords` from `int` to `long`.
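The arithmetic above can be reproduced in isolation. This is only a sketch: the class `OverflowSketch`, its method names, and the `double[]` allocation are assumptions made for illustration, not the actual code at `WordEmbeddings.java:100`.

```java
public class OverflowSketch {
    // Product computed in 32-bit int arithmetic, as happens when all operands are ints.
    static int intProduct(int numWords, int numColumns) {
        int stride = 2 * numColumns;
        return numWords * stride; // silently wraps around on overflow
    }

    // The same product, widened to long before multiplying.
    static long longProduct(int numWords, int numColumns) {
        return (long) numWords * 2 * numColumns;
    }

    public static void main(String[] args) {
        int numWords = 6105270;
        int numColumns = 300;

        System.out.println(longProduct(numWords, numColumns)); // 3663162000
        System.out.println(intProduct(numWords, numColumns));  // -631805296

        try {
            // Hypothetical allocation using the overflowed (negative) size.
            double[] weights = new double[intProduct(numWords, numColumns)];
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException: " + e.getMessage());
        }
    }
}
```

Note that computing the product as a `long` only makes the true value visible. A single Java array cannot hold more than `Integer.MAX_VALUE` elements, so 3663162000 entries would not fit into one array either way.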