stsievert / salmon

A tool to collect triplet queries
https://docs.stsievert.com/salmon/
BSD 3-Clause "New" or "Revised" License

MAINT: Clean adaptive algorithms #71

Closed stsievert closed 4 years ago

stsievert commented 4 years ago

What does this PR implement? It runs experiments to show how the Salmon server performs as the number of users varies. This PR simulates n_users crowdsourcing participants answering questions.

TODO:

This PR no longer does the above. Instead, it fixes some bugs with the adaptive implementation. In addition, it does the following:

Reference issues/PRs: #65 is necessary for this PR. That PR verifies that adaptive gains are present offline when the time required for model updates isn't considered. This PR isn't nearly as deterministic.

I need to rerun #65 to ensure that those adaptive gains are realizable.

stsievert commented 4 years ago

It's not uncommon to see about 5 queries with the same head appear in a row in this implementation. That requires more investigation. Are the query scores negated? Are the searches too complete?
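One way to quantify the repeated-head behavior is to measure the longest streak of consecutive queries that share a head. The snippet below is a minimal sketch, assuming each query is logged as a `(head, left, right)` tuple in the order it was served; `longest_head_run` is a hypothetical helper, not part of Salmon, and the query list is made up for illustration.

```python
from itertools import groupby


def longest_head_run(queries):
    """Return (head, run_length) for the longest streak of consecutive
    queries that share the same head. Returns (None, 0) for no queries."""
    best_head, best_len = None, 0
    # groupby only merges *adjacent* items with equal keys, which is
    # exactly the "in a row" behavior being measured here.
    for head, group in groupby(queries, key=lambda q: q[0]):
        run_len = sum(1 for _ in group)
        if run_len > best_len:
            best_head, best_len = head, run_len
    return best_head, best_len


# Hypothetical query log: (head, left, right), in the order served.
queries = [(3, 1, 2), (3, 4, 5), (3, 6, 7), (8, 1, 2), (3, 0, 9)]
print(longest_head_run(queries))
```

Tracking this statistic for both the negated and non-negated scores would show whether the streaks depend on the sign of the scores.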

I do recover a 1D manifold with the strange fruit example. After ~700 examples, 10 elements on the end of the manifold are [9, 8, 7, 6, 5, 3, 4, 0, 10] from distal to proximal. I think I'd expect that order to be reversed. I've also noticed that with a moderate number of responses (~700) the embedding conforms very nicely to a straight line. With more responses (~1800) that's less true: the embedding looks less like a manifold and shows more variation.

I think the best way to determine if the scores are negated is to measure the accuracy of the embedding throughout time for both implementations (negated and non-negated scores).
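The comparison could look something like the sketch below. It assumes a set of held-out responses stored as `(head, winner, loser)` index triples and a list of embedding snapshots saved over time for each implementation; `triplet_accuracy` and `accuracy_over_time` are hypothetical helpers, and the toy 1D snapshots are invented purely for illustration.

```python
import math


def triplet_accuracy(embedding, responses):
    """Fraction of (head, winner, loser) responses where the embedding
    places the winner strictly closer to the head than the loser."""
    correct = sum(
        math.dist(embedding[h], embedding[w]) < math.dist(embedding[h], embedding[l])
        for h, w, l in responses
    )
    return correct / len(responses)


def accuracy_over_time(snapshots, responses):
    """Accuracy of each embedding snapshot on the same held-out responses."""
    return [triplet_accuracy(emb, responses) for emb in snapshots]


# Toy held-out responses consistent with items ordered 0, 1, 2, 3 on a line.
responses = [(0, 1, 2), (0, 1, 3), (3, 2, 0), (1, 2, 3)]

# Hypothetical snapshots for the two implementations (one list per snapshot).
snapshots = {
    "non-negated": [[[0.0], [2.0], [1.0], [3.0]], [[0.0], [1.0], [2.0], [3.0]]],
    "negated": [[[0.0], [2.0], [1.0], [3.0]], [[0.0], [3.0], [1.0], [2.0]]],
}
for name, snaps in snapshots.items():
    print(name, accuracy_over_time(snaps, responses))
```

If the scores are negated in one implementation, its accuracy curve should lag the other at the same number of responses.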

stsievert commented 4 years ago

One example for the negated scores: I saw a query with head 36 and left/right of 12/13 with this embedding:

This embedding is after 3320 responses (long after the 1D manifold was present). It seems this query is meaningless: the distance from 36 to 12 is nearly the same as the distance from 36 to 13.
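A quick way to flag queries like this is to compute the relative gap between the two head-to-candidate distances. This is a minimal sketch: `distance_gap` is a hypothetical helper, and the coordinates below are made up for illustration (they are not the embedding from this experiment).

```python
import math


def distance_gap(embedding, head, left, right):
    """Relative difference between the head-left and head-right distances.
    Values near 0 mean the embedding considers both candidates about
    equally far from the head."""
    d_left = math.dist(embedding[head], embedding[left])
    d_right = math.dist(embedding[head], embedding[right])
    largest = max(d_left, d_right)
    if largest == 0:
        return 0.0  # all three points coincide
    return abs(d_left - d_right) / largest


# Made-up 2D coordinates keyed by item index (illustrative only).
embedding = {36: (0.0, 0.0), 12: (1.0, 0.0), 13: (0.0, 1.02)}
gap = distance_gap(embedding, head=36, left=12, right=13)
print(f"relative gap: {gap:.3f}")
```

Logging this gap for every served query, for both score signs, would show which variant tends to select near-tied queries.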