nasir6 / zero_shot_detection


How to generate class embedding files. #8

Closed wjhmike95 closed 3 years ago

wjhmike95 commented 3 years ago

Hi, thanks for your great work. I'm confused about how you generated the class embedding files (fasttext, glove). How does the index in the class embedding files map to the class id? Could you provide a few more details about how the class embedding files were generated? Thanks!

nasir6 commented 3 years ago

@wjhmike95 the class embeddings shared with the codebase are sorted in the order of the classes defined here.
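For later readers, a minimal sketch of what "sorted in the order of the classes" could look like in practice: stack one unit-normalized vector per class name, in class-id order, and save the matrix. The tiny `word_vectors` dict (made-up 2-d values) and the file name `class_embeddings.npy` are stand-ins, not the repo's actual pipeline; in practice `word_vectors` would be a pretrained model such as `gensim.downloader.load('word2vec-google-news-300')`.

```python
import numpy as np

# Stand-in lookup table with made-up 2-d vectors; in practice this would
# be a pretrained model, e.g. gensim.downloader.load(...).
word_vectors = {
    "person": np.array([3.0, 4.0]),
    "bicycle": np.array([0.0, 2.0]),
    "car": np.array([1.0, 0.0]),
}

# Class names must be listed in the SAME order as the detector's class
# ids, so that row i of the saved matrix corresponds to class id i.
classes = ["person", "bicycle", "car"]

def build_embedding_matrix(classes, word_vectors):
    rows = []
    for name in classes:
        vec = np.asarray(word_vectors[name], dtype=np.float64)
        rows.append(vec / np.linalg.norm(vec))  # unit-normalize each row
    return np.stack(rows)

emb = build_embedding_matrix(classes, word_vectors)
np.save("class_embeddings.npy", emb)  # row i <-> class id i
```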

zhongxiangzju commented 3 years ago

Hi, did you figure out how to generate the 300-dimensional vector for each label? I tried to generate word embeddings for each label in the COCO dataset using gensim, but got different results.

import numpy as np
import gensim.downloader 
print(list(gensim.downloader.info()['models'].keys()))
# ['fasttext-wiki-news-subwords-300', 'conceptnet-numberbatch-17-06-300', 'word2vec-ruscorpora-300', 'word2vec-google-news-300', 'glove-wiki-gigaword-50', 'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300', 'glove-twitter-25', 'glove-twitter-50', 'glove-twitter-100', 'glove-twitter-200', '__testing_word2vec-matrix-synopsis']
word_vectors = gensim.downloader.load('word2vec-google-news-300')
person_embedding = word_vectors['person']
person_embedding  = person_embedding / np.linalg.norm(person_embedding)
print(person_embedding)
# 0.1208263, -0.1084009, 0.00755164, 0.07369547, -0.06384084, 0.07026777 ...

which differs from the first row of ./zero_shot_detection/MSCOCO/word_w2v.txt: 0.092629, 0.013665, 0.037897, 0.034125, 0.015237, 0.034970 ...
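Before concluding the repo used a different corpus, it may help to check whether the two vectors differ only in normalization or scaling rather than in direction. A small sketch (`same_direction` is a hypothetical helper; the commented-out `np.loadtxt` call assumes the file path mentioned above is loadable as plain text):

```python
import numpy as np

def same_direction(a, b, tol=1e-4):
    # Vectors from the same pretrained model that differ only in scaling
    # (e.g. one side L2-normalized, the other raw) still have cosine
    # similarity ~1; vectors from genuinely different models do not.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    cos = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return cos > 1.0 - tol

# Hypothetical usage against the repo's stored vectors:
# stored = np.loadtxt('MSCOCO/word_w2v.txt')
# print(same_direction(person_embedding, stored[0]))
```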

GuangyuanLiu1999 commented 1 year ago


Hi, have you found a solution to this in the meantime? Do you know how to generate the fasttext.npy file?
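One common source of mismatch worth ruling out: several COCO class names are multi-word ("traffic light", "stop sign"), and different pipelines tokenize them differently. A minimal sketch of one common convention, averaging the token vectors; the toy `word_vectors` values are made up, and the repo may use a different convention (e.g. joining tokens with '_' or using a subword model), which alone can produce different numbers:

```python
import numpy as np

# Toy lookup standing in for a fasttext/word2vec model (made-up values).
word_vectors = {
    "traffic": np.array([1.0, 0.0]),
    "light": np.array([0.0, 1.0]),
}

def embed_class_name(name, word_vectors):
    # Multi-word names have no single entry in most pretrained models;
    # averaging the per-token vectors is one common convention.
    tokens = name.split()
    vec = np.mean([word_vectors[t] for t in tokens], axis=0)
    return vec / np.linalg.norm(vec)

print(embed_class_name("traffic light", word_vectors))
```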