Closed angelo337 closed 5 years ago
Hi,
in order to create the statistics, the names of the ivpq index tables must be known. I think this is the problem here.
The order of steps in the README seems to be slightly wrong. Before the statistics can be created, you have to call the init function first, or execute CREATE EXTENSION freddy
again, if you used the same names for the index tables as described in the README.
SELECT init('google_vecs', 'google_vecs_norm', 'pq_quantization', 'pq_codebook', 'fine_quantization', 'coarse_quantization', 'residual_codebook', 'fine_quantization_ivpq', 'codebook_ivpq', 'coarse_quantization_ivpq')
I will change this in the README soon...
Guenthermi: Thanks for your fast answer. Could you please point me to some resources to follow after I get that working? I already use Gensim vectors; with the bin file I just compare documents against that model. How should I do that in your implementation? With word2vec, I just request vectors from the index, produce a large vector for each sentence, and classify that sentence with a Keras model or an SVM. However, I don't know the right path to follow in your implementation. Thanks, angelo
I think I don't understand what you actually want to do.
There are several ways to compare documents or text values consisting of several tokens by using word embeddings. One very simple method is to represent a larger text value by the centroid (average) of the word embedding vectors of all terms occurring in it. This could be done with the insert_batch function of the extension, which calculates this vector and adds it to the index structures. However, if you want to do something more complex, you have to implement it yourself.
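Outside the database, the centroid idea amounts to averaging the token vectors of a text. The helper below is a hypothetical sketch (the toy 2-dimensional vector table is made up); with real data, `vectors` could be any dict-like word-to-vector lookup such as a Gensim KeyedVectors object:

```python
import numpy as np

def centroid(tokens, vectors):
    """Average the embedding vectors of all tokens found in the lookup.

    tokens:  list of words from the text value
    vectors: dict-like mapping word -> 1-D numpy array
    """
    found = [np.asarray(vectors[t], dtype=float) for t in tokens if t in vectors]
    if not found:
        return None  # no known token: no representation
    return np.mean(found, axis=0)

# Toy 2-dimensional example with a hand-made "embedding" table
vecs = {"fast": np.array([1.0, 0.0]), "search": np.array([0.0, 1.0])}
print(centroid(["fast", "search", "oov"], vecs))  # -> [0.5 0.5]
```

Out-of-vocabulary tokens are simply skipped, which is the usual choice for this representation.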
This extension mainly focuses on fast semantic search. If you want to do classification, you might use something else. However, you could use the kNN and kNN-Join functions as a kNN classifier if that makes sense in your case.
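Used as a classifier, kNN just takes the majority label among the k nearest vectors. A minimal cosine-similarity sketch in plain NumPy (the data and labels here are invented for illustration; the extension's kNN functions would supply the neighbors instead):

```python
import numpy as np
from collections import Counter

def knn_classify(query, data, labels, k=3):
    """Return the majority label among the k most cosine-similar rows of data."""
    data = np.asarray(data, dtype=float)
    q = np.asarray(query, dtype=float)
    sims = data @ q / (np.linalg.norm(data, axis=1) * np.linalg.norm(q))
    nearest = np.argsort(-sims)[:k]  # indices of the k highest similarities
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy example: two clusters along the axes
X = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
y = ["a", "a", "b", "b"]
print(knn_classify([0.8, 0.2], X, y, k=3))  # -> a
```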
Guenthermi: I am trying to find similar words in a corpus. I have already trained an embedding on Wikipedia in my language, and I am looking for similar words in that embedding. After that, I would like to build a search on top of Elasticsearch that tries to mimic semantic search, launching a query for every word similar to the original up to a certain distance (90% similarity or so). At the moment that's my main idea. Thanks
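The "expand a term into all words above a similarity threshold" step described above can be sketched over any word-to-vector lookup; with Gensim you would call KeyedVectors.most_similar and filter the scores, but the idea is the same. The toy vector table and the 0.9 threshold below are placeholder assumptions:

```python
import numpy as np

def expand_term(term, vectors, threshold=0.9):
    """Return (word, similarity) pairs with cosine similarity >= threshold.

    vectors: dict mapping word -> 1-D numpy array.
    """
    if term not in vectors:
        return []
    q = np.asarray(vectors[term], dtype=float)
    qn = q / np.linalg.norm(q)
    out = []
    for word, v in vectors.items():
        if word == term:
            continue
        sim = float(v @ qn / np.linalg.norm(v))
        if sim >= threshold:
            out.append((word, sim))
    return sorted(out, key=lambda p: -p[1])

# Toy table: "car" and "auto" point the same way, "fish" does not
vecs = {"car": np.array([1.0, 0.0]),
        "auto": np.array([0.95, 0.05]),
        "fish": np.array([0.0, 1.0])}
print(expand_term("car", vecs, threshold=0.9))
```

Each expanded word could then be fed into an Elasticsearch query as an additional search term.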
Hi there, I managed to install all dependencies and load your extension in a Docker container. However, when I arrive at the last step in the process (Statistics), I am getting this error:
Could you please help me or point me to some sort of solution? Thanks very much