Closed ashleyabraham closed 4 years ago
I'm not sure whether such a nested PQ is useful, because a single PQ with larger parameters would usually be better. But the following nested PQ should work.
import nanopq
import numpy as np
N, D = 1000, 24
X = np.random.random((N, D)).astype(np.float32) # 1,000 24-dim vectors
# Instantiate with M=4 sub-spaces and Ks=16 centroids per sub-space
M, Ks = 4, 16
pq = nanopq.PQ(M=M, Ks=Ks)
# Train codewords
pq.fit(X)
# codewords
# The shape is (4, 16, 6), this means that:
# - 4 subspaces
# - 16 codewords for each subspace
# - A codeword is a 6-dim vector
print(pq.codewords.shape)
# Given the codewords, train second-level PQ instances
# For each subspace, create a PQ instance, with M=2 and Ks=4
second_level_pqs = []
for m in range(M):
    second_level_pq = nanopq.PQ(M=2, Ks=4)
    second_level_pq.fit(pq.codewords[m])  # Train on the corresponding first-level codewords
    second_level_pqs.append(second_level_pq)
# Check
print(second_level_pqs[0].codewords.shape) # shape = (2, 4, 3)
I am looking into computing centroids of centroids using NanoPQ. Is that possible? I have a first-level nanopq model with M=4, K=16, D=24. The codewords it produces have shape (4, 16, 6). Can this output be passed as input to a second-level nanopq model to calculate centroids of centroids? I am investigating centroids of centroids in order to reduce processing time on large datasets.