stsievert / salmon

A tool to collect triplet queries
https://docs.stsievert.com/salmon/
BSD 3-Clause "New" or "Revised" License
active-learning crowdsourcing embedding machine-learning triplet-loss triplets

Salmon


Salmon is a tool for efficiently generating ordinal embeddings. It relies on "active" machine learning algorithms to choose the most informative queries for humans to answer.
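To make the terms concrete: a triplet query asks whether a "head" item is closer to one of two alternatives, and an ordinal embedding places items so that those answers are respected. The sketch below is not Salmon's code. It uses plain NumPy and a hypothetical helper, `triplet_satisfied`, and assumes each answer is stored as `(head, winner, loser)` indices.

```python
import numpy as np


def triplet_satisfied(embedding, head, winner, loser):
    """Return True if `head` is strictly closer to `winner` than to `loser`."""
    d_win = np.linalg.norm(embedding[head] - embedding[winner])
    d_lose = np.linalg.norm(embedding[head] - embedding[loser])
    return bool(d_win < d_lose)


# Toy 1-D embedding of four items (illustrative values, not real data).
embedding = np.array([[0.0], [1.0], [3.0], [7.0]])

# Hypothetical human answers, stored as (head, winner, loser).
answers = [(0, 1, 2), (0, 2, 3), (3, 0, 2)]

# Fraction of answers that this embedding agrees with.
agreement = np.mean([triplet_satisfied(embedding, *t) for t in answers])
```

Here the last answer disagrees with the embedding, since item 3 sits closer to item 2 than to item 0. A better embedding would raise the agreement.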

Documentation

The documentation is available at https://docs.stsievert.com/salmon/.

Please file an issue if you cannot access the documentation.

Running Salmon offline

See the documentation at https://docs.stsievert.com/salmon/offline.html. In brief, the following commands should set up a local install:

$ cd path/to/salmon
$ conda env create -f salmon.lock.yml
$ conda activate salmon
(salmon) $ pip install -e .

The online documentation has more detail on generating an embedding offline: https://docs.stsievert.com/salmon/offline.html#generate-embeddings

With Salmon installed, it's also possible to write a script that imports and uses it:

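from salmon.triplets.samplers import TSTE
import numpy as np

# n items embedded into d dimensions
n, d = 85, 2
sampler = TSTE(n=n, d=d)

# Start from a user-specified embedding of shape (n, d)
em_init = np.array([[i, -i] for i in range(n)])
sampler.opt.initialize(embedding=em_init)

# Request 10,000 candidate queries along with their scores
queries, scores, meta = sampler.get_queries(num=10_000)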

This script allows the data scientist to score queries for an embedding they specify.
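
One way to use these scores is to keep only the highest-scoring queries. The following is a minimal sketch with stand-in data so that it runs without Salmon. It assumes that `queries` is an array with one row of three item indices per query, that `scores` holds one value per row, and that larger scores mean more informative queries. Those shapes and that ordering are assumptions here, and `top_k_queries` is a hypothetical helper, not part of Salmon.

```python
import numpy as np


def top_k_queries(queries, scores, k):
    """Return the k rows of `queries` with the largest scores, best first."""
    queries = np.asarray(queries)
    scores = np.asarray(scores)
    # Sort by descending score and keep the first k positions.
    order = np.argsort(-scores, kind="stable")[:k]
    return queries[order], scores[order]


# Stand-in data in place of the `sampler.get_queries(...)` output (assumed layout).
rng = np.random.default_rng(0)
queries = rng.integers(0, 85, size=(100, 3))
scores = rng.normal(size=100)

best_queries, best_scores = top_k_queries(queries, scores, k=5)
```

With real output, the same helper could be applied to the `queries` and `scores` returned by the script above, provided they match the assumed layout.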