Closed · RyanHuangNLP closed this issue 4 years ago
I found that my `sif_model.sv.vectors.npy` file holds only a (758194, 100) matrix, yet the file is 15 GB. When I save an (800000, 100) matrix to an `.npy` file myself, it is only about 600 MB. Is this normal? I trained the SIF model on 30 million sentences.

Hi! If you train the model on 30 million sentences you should end up with an array of shape (30*10^6, 100).
The formula to estimate the approximate size of that array in bytes is:
sentences * vector_size * np.dtype(np.float32).itemsize
For your case that would be:
30e6 * 100 * np.dtype(np.float32).itemsize / 1024**3
which is about 11.2 GiB. Thus 15 GB is somewhat higher than expected.
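The size estimate discussed above can be checked with a short script. This is just a sanity-check sketch (the helper name `expected_npy_size_gib` is for illustration, not part of any library):

```python
import numpy as np

def expected_npy_size_gib(n_rows, vector_size, dtype=np.float32):
    """Estimate the on-disk size in GiB of a dense (n_rows, vector_size)
    array saved as .npy. The .npy header adds only ~128 bytes, so the
    payload (rows * columns * itemsize) dominates.
    """
    return n_rows * vector_size * np.dtype(dtype).itemsize / 1024**3

# 30 million sentences with 100-dimensional float32 vectors:
print(f"{expected_npy_size_gib(30_000_000, 100):.2f} GiB")  # ~11.18 GiB

# The (758194, 100) matrix actually reported in the issue,
# which should be far smaller than 15 GB:
print(f"{expected_npy_size_gib(758_194, 100):.3f} GiB")  # ~0.282 GiB
```

By this arithmetic, a (758194, 100) float32 array should occupy roughly 0.3 GiB on disk, so a 15 GB file for that shape does look anomalous rather than expected.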