[feature request] Validation sampling

stsievert / salmon

A tool to collect triplet queries

https://docs.stsievert.com/salmon/

BSD 3-Clause "New" or "Revised" License

[feature request] Validation sampling #111

Status: Closed (kushinm closed this issue 3 years ago)

kushinm commented 3 years ago

Request for a mode that allows us to validate embeddings from a round of data collection by showing the exact same sets of triplets to a new set of participants to measure inter-rater reliability.
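To make the goal concrete, here is a minimal sketch of one simple reliability measure: the fraction of shared triplets on which two groups of participants picked the same item. The function name, the `(head, left, right)` query tuple, and the dictionary layout are assumptions for illustration only; none of them come from Salmon.

```python
from typing import Dict, Tuple

# Hypothetical layout: a query is (head, left, right) item indices, and the
# stored answer is the index of the item judged closest to the head.
Query = Tuple[int, int, int]


def agreement_rate(first: Dict[Query, int], second: Dict[Query, int]) -> float:
    """Fraction of shared queries that received the same answer in both rounds."""
    shared = first.keys() & second.keys()
    if not shared:
        raise ValueError("the two rounds share no queries")
    matches = sum(first[q] == second[q] for q in shared)
    return matches / len(shared)


# Made-up answers to the same three triplets from two participant groups.
round_1 = {(0, 1, 2): 1, (3, 4, 0): 4, (1, 2, 3): 2}
round_2 = {(0, 1, 2): 1, (3, 4, 0): 0, (1, 2, 3): 2}
rate = agreement_rate(round_1, round_2)
```

Raw agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa may be preferable, depending on the analysis.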

stsievert commented 3 years ago

When are you interested in asking these validation queries? Randomly interspersed with other questions, or separately at the very end/beginning?

stsievert commented 3 years ago

Apparently, the requirements are:

- Random choice of validation sampling: validation queries are intermixed with queries from other algorithms, not asked at the beginning or end.
- Query ordering:
  - Ask queries in list order (with a single pointer; nothing user-specific).
  - When the end of the list is reached: shuffle the list and repeat.
- Query specification:
  - The number of validation queries ("the most common use case").
  - The specific queries (useful for filtering out bad actors).
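The ordering and specification requirements above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in #112: the class name, constructor arguments, and `get_query` method are hypothetical, and intermixing with other algorithms is left to whatever dispatches between samplers.

```python
import random
from typing import List, Optional, Tuple

Query = Tuple[int, int, int]  # hypothetical (head, left, right) item indices


class ValidationSampler:
    """Illustrative sketch only; the names here are not Salmon's API."""

    def __init__(
        self,
        n_items: int,
        n_queries: int = 20,
        queries: Optional[List[Query]] = None,
        seed: Optional[int] = None,
    ):
        self._rng = random.Random(seed)
        if queries is None:
            # Only a count was given: draw that many distinct random triplets.
            max_queries = n_items * (n_items - 1) * (n_items - 2) // 2
            if n_queries > max_queries:
                raise ValueError("n_queries exceeds the number of distinct triplets")
            chosen = set()
            while len(chosen) < n_queries:
                head, a, b = self._rng.sample(range(n_items), 3)
                # Sort the two comparison items so mirrored triplets count once.
                chosen.add((head, min(a, b), max(a, b)))
            queries = sorted(chosen)
        self._queries = [tuple(q) for q in queries]
        if not self._queries:
            raise ValueError("at least one validation query is required")
        # A single pointer shared by all users; nothing user-specific.
        self._pointer = 0

    def get_query(self) -> Query:
        # When the end of the list is reached: shuffle the list, then repeat.
        if self._pointer == len(self._queries):
            self._rng.shuffle(self._queries)
            self._pointer = 0
        query = self._queries[self._pointer]
        self._pointer += 1
        return query
```

With explicit queries, the first pass returns them in the order given; later passes return the same queries in a shuffled order.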

That seems pretty straightforward.

kushinm commented 3 years ago

Can you update/point to docs for the new feature?

stsievert commented 3 years ago

The docs can be found in #112 (they're not merged yet). There's a usage example in algorithms.rst (which will eventually be rendered at https://docs.stsievert.com/salmon/algorithms.html), and the API docs are at

https://github.com/stsievert/salmon/blob/3e861af75068620fe844bae446d9439ae95749cf/salmon/triplets/samplers/_validation.py#L12-L29