matsui528 / nanopq

Pure python implementation of product quantization for nearest neighbor search
MIT License
321 stars 43 forks source link

why with parametric init has poor performance than non-parametric one ? #18

Closed Chuncbs closed 2 years ago

Chuncbs commented 2 years ago

hi,friend ,I have two question .

  1. why with parametric init has poor performance than non-parametric one according to your unit test 'test_parametric_init'? it is inconsistent with the conclusion of the paper 《Optimized Product Quantization for Approximate Nearest Neighbor Search 》--Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun

`def test_parametric_init(self): N, D, M, Ks = 100, 12, 4, 10 X = np.random.random((N, D)).astype(np.float32) opq = nanopq.OPQ(M=M, Ks=Ks) opq.fit(X, parametric_init=False, rotation_iter=1) err_init = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))

opq = nanopq.OPQ(M=M, Ks=Ks)
opq.fit(X, parametric_init=True, rotation_iter=1)
err = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))

self.assertLess(err_init, err)`
  1. the code compute normal not need rotate X, the decode will rotate code to original space at 255 line of opq.py

self.pq.decode(codes) @ self.R.T

matsui528 commented 2 years ago

Thanks for pointing this out.

Yes, the following line err = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X))) should be err = np.linalg.norm(X - opq.decode(opq.encode(X)))

It is because:

Thus, opq.decode(opq.encode(X)) should be compared to X, not opq.rotate(X).

If I revise the above point and slightly change the parameters, everything seems okay.

     def test_parametric_init(self):
-        N, D, M, Ks = 100, 12, 4, 10
+        N, D, M, Ks = 100, 12, 2, 20
         X = np.random.random((N, D)).astype(np.float32)
         opq = nanopq.OPQ(M=M, Ks=Ks)
         opq.fit(X, parametric_init=False, rotation_iter=1)
-        err_init = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))
+        err_init = np.linalg.norm(X - opq.decode(opq.encode(X)))

         opq = nanopq.OPQ(M=M, Ks=Ks)
         opq.fit(X, parametric_init=True, rotation_iter=1)
-        err = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))
+        err = np.linalg.norm(X - opq.decode(opq.encode(X)))

-        self.assertLess(err_init, err)
+        self.assertLess(err, err_init)

@calvinmccarter What do you think? Let me know if I miss something...

calvinmccarter commented 2 years ago

@matsui528 -- yes, you are right, my bad. Thanks for catching this!

madmtang commented 5 months ago

Hi! I wonder why that when change M to 4 then the test still fails?