stsievert / salmon

A tool to collect triplet queries
https://docs.stsievert.com/salmon/
BSD 3-Clause "New" or "Revised" License

[feature request] Validation sampling #111

Closed by kushinm 3 years ago

kushinm commented 3 years ago

Request for a validation mode: show the exact same set of triplets from a round of data collection to a new set of participants, so that we can measure inter-rater reliability and validate the resulting embeddings.

stsievert commented 3 years ago

When are you interested in asking these validation queries? Randomly interspersed with other questions, or separately at the very end/beginning?

stsievert commented 3 years ago

Apparently, the requirements are:

  1. Validation queries are chosen at random (intermixed with queries from other algorithms, not grouped at the beginning or end).
  2. Query ordering:
    • Ask the queries in list order (with a single shared pointer; nothing user-specific).
    • When the end of the list is reached, shuffle the list and repeat.
  3. Query specification:
    • The number of validation queries ("the most common use case")
    • The specific queries (useful for filtering out bad actors).

That seems pretty straightforward.
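As a rough illustration of requirements 2 and 3, here is a minimal sketch in Python. It is not Salmon's actual implementation (see the linked source for that); the class name `ValidationSketch`, its parameters, and the `(head, left, right)` query layout are all assumptions made for this example.

```python
import random
from typing import List, Optional, Tuple

# Assumed query layout for this sketch: (head, left, right) item indices.
Query = Tuple[int, int, int]


class ValidationSketch:
    """Hypothetical validation sampler; NOT Salmon's real API."""

    def __init__(
        self,
        n_items: int,
        n_queries: int = 20,
        queries: Optional[List[Query]] = None,
        seed: Optional[int] = None,
    ):
        self._rng = random.Random(seed)
        if queries is None:
            # Common case: only the number of validation queries is given,
            # so draw that many random triplets of distinct items.
            queries = [
                tuple(self._rng.sample(range(n_items), 3))
                for _ in range(n_queries)
            ]
        if not queries:
            raise ValueError("at least one validation query is required")
        self.queries = list(queries)
        # A single pointer shared by all users; nothing user-specific.
        self._pointer = 0

    def get_query(self) -> Query:
        # At the end of the list: shuffle it and start over.
        if self._pointer == len(self.queries):
            self._rng.shuffle(self.queries)
            self._pointer = 0
        query = self.queries[self._pointer]
        self._pointer += 1
        return query


# Usage: specific queries are served in list order on the first pass.
sampler = ValidationSketch(n_items=4, queries=[(0, 1, 2), (1, 2, 3)])
first_pass = [sampler.get_query() for _ in range(2)]
```

Because every pass serves each query exactly once, all validation queries receive roughly the same number of responses, which is what an inter-rater reliability comparison needs. Requirement 1 (randomly interleaving this sampler with other algorithms) would be handled by whatever code picks a sampler for each request, and is not shown here.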

kushinm commented 3 years ago

Can you update/point to docs for the new feature?

stsievert commented 3 years ago

The docs can be found in #112 (they're not merged yet). There's a usage example in algorithms.rst (which will eventually be rendered at https://docs.stsievert.com/salmon/algorithms.html), and the API docs are at

https://github.com/stsievert/salmon/blob/3e861af75068620fe844bae446d9439ae95749cf/salmon/triplets/samplers/_validation.py#L12-L29