stsievert / salmon

A tool to collect triplet queries
https://docs.stsievert.com/salmon/
BSD 3-Clause "New" or "Revised" License

Example code for generating embeddings #141

Closed: jorgedch closed this issue 2 years ago

jorgedch commented 2 years ago

Software paper

https://github.com/openjournals/joss-reviews/issues/4517

  1. A few imports and code changes were needed to run the example code in the "Generating embeddings offline" section of the documentation for the experiment in [examples/zappos](https://github.com/stsievert/salmon/tree/master/examples/zappos). Here is the updated code for the example:
```python
# Imports:
import pandas as pd
from sklearn.model_selection import train_test_split
from salmon.triplets.offline import OfflineEmbedding

# Load and pre-process data:
df = pd.read_csv("responses.csv")  # downloaded from the dashboard
X = df[["head", "winner", "loser"]].to_numpy()  # one (head, winner, loser) triplet per row
X_train, X_test = train_test_split(X, random_state=42, test_size=0.2)

em = pd.read_csv("embeddings.csv")  # downloaded from the dashboard
em = em[["0", "1"]].to_numpy()  # keep only the two embedding coordinates

# Create and fit model:
n = int(X.max() + 1)  # number of targets (target indices start at 0)
d = 2  # embed into 2 dimensions
max_epochs = 500_000

model = OfflineEmbedding(n=n, d=d, max_epochs=max_epochs)
model.initialize(X_train, embedding=em)  # initialize from the dashboard embedding
model.fit(X_train, X_test)

# Inspect model:
model.embedding_  # fitted embedding
model.history_  # train/test performance recorded during fitting
```
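To sanity-check a fitted embedding independently of `model.history_`, one can compute the fraction of held-out triplets it satisfies: a triplet `(head, winner, loser)` counts as satisfied when the head is closer to the winner than to the loser. The sketch below is a hypothetical helper (`triplet_accuracy` is not part of salmon), assuming the triplets are integer target indices and the embedding is an `(n, d)` NumPy array; it is not salmon's own scoring code.

```python
import numpy as np


def triplet_accuracy(embedding: np.ndarray, triplets: np.ndarray) -> float:
    """Fraction of (head, winner, loser) triplets satisfied by the embedding.

    Hypothetical helper: a triplet is satisfied when the head is strictly
    closer (Euclidean distance) to the winner than to the loser.
    """
    head = embedding[triplets[:, 0]]
    winner = embedding[triplets[:, 1]]
    loser = embedding[triplets[:, 2]]
    d_win = np.linalg.norm(head - winner, axis=1)
    d_lose = np.linalg.norm(head - loser, axis=1)
    return float(np.mean(d_win < d_lose))


# Toy data: 4 targets placed on a line in 2 dimensions.
toy_embedding = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
toy_triplets = np.array([
    [0, 1, 3],  # dist(0, 1) = 1 < dist(0, 3) = 3: satisfied
    [1, 2, 3],  # dist(1, 2) = 1 < dist(1, 3) = 2: satisfied
    [3, 0, 2],  # dist(3, 0) = 3 > dist(3, 2) = 1: violated
])
print(triplet_accuracy(toy_embedding, toy_triplets))
```

With the variables from the example above, the held-out check would be `triplet_accuracy(model.embedding_, X_test)`, assuming `model.embedding_` is an `(n, d)` array indexed by target ID.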
stsievert commented 2 years ago

Updated in 06f6bdd1ad76813aff4f1d12bc6e6ea653240072; closing.