matsui528 / nanopq

Pure python implementation of product quantization for nearest neighbor search
MIT License

about reconstructed #4

Closed Usernamezhx closed 4 years ago

Usernamezhx commented 4 years ago

thanks for your work.

```python
import nanopq
import numpy as np

N, Nt, D = 10000, 2000, 128
X = np.random.random((N, D)).astype(np.float32)   # 10,000 128-dim vectors to be indexed
Xt = np.random.random((Nt, D)).astype(np.float32) # 2,000 128-dim vectors for training
query = np.random.random((D,)).astype(np.float32) # a 128-dim query vector

pq = nanopq.PQ(M=8, Ks=256)
pq.fit(Xt, seed=123)
X_code = pq.encode(X)  # (10000, 8) with dtype=np.uint8
X_reconstructed = pq.decode(codes=X_code)

tmp = X[0]
tmp1 = X_reconstructed[0]
dis = np.sqrt(np.sum(np.square(tmp - tmp1)))
```

The dis is about 2.0+. Does that look right?

matsui528 commented 4 years ago

This makes sense. Because PQ is a lossy compression scheme, there is always some reconstruction error. If you make the quantizer finer, the error decreases; e.g., M=32 results in dis=0.8.

Usernamezhx commented 4 years ago

thanks for your reply.