Closed clemgaut closed 1 year ago
Hi @clemgaut, thanks for using hover!
Semi-supervised fit should be doable right now. The trick would be to bake the label information into the vectors produced by your vectorizer.
For example, let's say you are using unsupervised fit with some vectorizer function:
def vectorizer_unsupervised(feature):
    vector = some_pretrained_model.predict(feature)
    return vector
You can pre-compute a feature -> label lookup using your labeled data and do
import numpy as np

# assuming a dictionary called "lookup" that maps labelled data to integer label
# also assuming that you already know the number of classes for your classifier
# if not, just set NUM_CLASSES to be large enough to cover known classes
NUM_CLASSES = 3

def one_hot_encoding(feature, num_classes):
    '''
    One-hot vector for labeled data. For unlabeled data, return a zero-valued vector.
    '''
    arr = np.zeros(num_classes)
    label = lookup.get(feature, -1)
    if label >= 0:
        arr[label] = 1.0
    return arr

def vectorizer_semisupervised(feature):
    '''
    Vectorizer to pass to dimensionality reduction.
    '''
    vec1 = vectorizer_unsupervised(feature)
    vec2 = one_hot_encoding(feature, NUM_CLASSES)
    # np.concatenate takes a sequence of arrays as its first argument
    return np.concatenate([vec1, vec2])
So umap or ivis will just work the same way as in the unsupervised case, but you've baked label information into the vectors.
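To make the idea concrete, here is a minimal, self-contained sketch of the same approach on toy data. The `toy_vectorizer` function is a hypothetical stand-in for `some_pretrained_model.predict`, and the sample texts and `lookup` contents are made up for illustration; none of these are part of hover.

```python
import numpy as np

NUM_CLASSES = 3

# Hypothetical labelled subset: feature -> integer label.
lookup = {"good movie": 0, "bad movie": 1}

def toy_vectorizer(feature):
    """Stand-in for a pretrained model: a fixed-length vector of simple text statistics."""
    return np.array(
        [len(feature), feature.count(" "), feature.count("o")], dtype=float
    )

def one_hot_encoding(feature, num_classes):
    """One-hot vector for labelled data, all zeros for unlabelled data."""
    arr = np.zeros(num_classes)
    label = lookup.get(feature, -1)
    if label >= 0:
        arr[label] = 1.0
    return arr

def vectorizer_semisupervised(feature):
    """Concatenate the embedding with the (possibly all-zero) label block."""
    return np.concatenate(
        [toy_vectorizer(feature), one_hot_encoding(feature, NUM_CLASSES)]
    )

labelled = vectorizer_semisupervised("good movie")
unlabelled = vectorizer_semisupervised("unknown text")
```

Here the last `NUM_CLASSES` entries of `labelled` carry the class, while the same entries of `unlabelled` stay at zero. Since the label block has unit magnitude, how much it influences the reduced layout depends on the scale of the embedding; multiplying the one-hot part by a weight is one possible way to tune that.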
Thank you for your answer. I will also look into using the predefined functions of umap and ivis for unsupervised learning. I might do a PR if I get something working eventually.
Hello,
Thank you so much for providing hover as an open source tool!
I was wondering if it would be possible to have the option to make a semi-supervised fit with umap or ivis. Indeed, from what I understand, both umap and ivis are fit in an unsupervised way: only the embedding information is used in the fit step. For data belonging to the train and dev sets (public sets from hover implementation), hover knows what class they belong to. As both umap and ivis support providing labels during the fit step (-1 if no label is known, to do semi-supervised fit), I was wondering if you considered adding the class information in the fit step?
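For reference, the label format described above could be sketched roughly as follows. `build_partial_labels` and `fit_umap_semisupervised` are hypothetical helper names, not part of hover. The sketch assumes the umap-learn package, whose `UMAP.fit` accepts a `y` argument in which -1 marks unlabelled points.

```python
import numpy as np

def build_partial_labels(features, lookup):
    """Return an integer label array with -1 for every feature without a known label."""
    return np.array([lookup.get(f, -1) for f in features], dtype=int)

def fit_umap_semisupervised(vectors, labels):
    """Fit UMAP with partial labels (assumes umap-learn is installed)."""
    import umap  # imported lazily so the label helper works without umap-learn
    return umap.UMAP().fit(vectors, y=labels)

# Toy example: "a" and "c" are labelled, "b" is not.
features = ["a", "b", "c"]
labels = build_partial_labels(features, {"a": 0, "c": 2})
```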