stsievert / salmon

A tool to collect triplet queries
https://docs.stsievert.com/salmon/
BSD 3-Clause "New" or "Revised" License

ENH: Allow using different sampling algorithms #22

Closed stsievert closed 4 years ago

stsievert commented 4 years ago

What does this PR implement? It allows running different sampling algorithms. This might include random sampling or different adaptive algorithms.

TODO

A good dummy for this might be a "round-robin" algorithm where the head is selected at random and the bottom items are selected randomly (for now).
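A minimal sketch of such a dummy, for discussion only. It assumes "round-robin" means the head cycles through the items in order (drawing the head from the random generator instead would match the "selected at random" wording), and the function name and the (head, left, right) tuple layout are hypothetical, not part of this PR:

```python
import itertools
import random


def round_robin_queries(n_items, seed=None):
    """Yield (head, left, right) index triplets indefinitely.

    Assumption: the head rotates through the items in order, and the two
    bottom items are drawn uniformly at random from the remaining items.
    """
    if n_items < 3:
        raise ValueError("need at least 3 items to form a triplet")
    rng = random.Random(seed)
    for head in itertools.cycle(range(n_items)):
        # Bottom items: two distinct items, neither equal to the head.
        others = [i for i in range(n_items) if i != head]
        left, right = rng.sample(others, 2)
        yield head, left, right


# Take the first 7 queries for 5 items; the head wraps around after item 4.
queries = list(itertools.islice(round_robin_queries(5, seed=0), 7))
heads = [q[0] for q in queries]
```

Because it needs no model and no answers, a sampler like this is a cheap way to exercise the plumbing before any adaptive algorithm is plugged in.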

Future work:

stsievert commented 4 years ago

This PR includes a Docker machine to run the different adaptive sampling algorithms. This backend has two endpoints: /init and /model. It specifically does not have endpoints for /get_query or /process_answer. That way, the serving of queries and the computation of queries are completely separated. As a consequence, errors on the backend are not caught, the system hangs until a query is computed, and the backend runs continuously.
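To make the trade-off concrete, here is a rough sketch of that separation using only the standard library. It is not the code in this PR: the in-process queue stands in for whatever shared store sits between the two services, and `backend`, `serve_query`, and `query_store` are hypothetical names.

```python
import queue
import random
import threading

# Assumption: a bounded in-process queue stands in for the shared store.
query_store = queue.Queue(maxsize=10)
stop = threading.Event()


def backend(n_items, seed=0):
    """Continuously compute queries and post them; it never serves them."""
    rng = random.Random(seed)
    while not stop.is_set():
        query = tuple(rng.sample(range(n_items), 3))
        try:
            query_store.put(query, timeout=0.1)
        except queue.Full:
            # Store is full: loop again so the stop flag gets checked.
            continue


def serve_query(timeout=5.0):
    """Frontend side: block until the backend has posted a query."""
    return query_store.get(timeout=timeout)


worker = threading.Thread(target=backend, args=(10,), daemon=True)
worker.start()
served = [serve_query() for _ in range(3)]
stop.set()
worker.join()
```

Note how `serve_query` simply blocks when the store is empty, and how an exception inside `backend` would kill the thread without the serving side ever seeing it. Those are the two drawbacks mentioned above.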

stsievert commented 4 years ago

I've implemented a manager class/module to separate out the logic of serving queries and retrieving queries from the algorithms.

stsievert commented 4 years ago

It specifically does not have endpoints for /get_query

Here's an implementation that is compatible with the current code and also supports /get_query:

This implementation is more flexible, and the web client will communicate directly with the algorithm (so cleaner code). All the computation will be handled with Dask, so I don't think we need to worry about overloading the server.

For example, what if the algorithm samples randomly and focuses on processing answers? Define a get_query function, and define run to only process answers as they're received. What if the query is returned by some model? Run the model with the context provided to get_query.
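The first case might look roughly like the sketch below. This is an illustration under assumptions, not the implementation in this PR: the class name, the `process_answers` helper, the dictionary layout of queries and answers, and the `answer_source` argument are all hypothetical, and Dask is left out to keep the example self-contained.

```python
import random


class RandomSampler:
    """Hypothetical algorithm: random queries, answers handled in `run`."""

    def __init__(self, n_items, seed=None):
        self.n_items = n_items
        self.rng = random.Random(seed)
        self.answers = []

    def get_query(self, context=None):
        # Random sampling ignores the context; a model-backed algorithm
        # would instead run its model on `context` here.
        head, left, right = self.rng.sample(range(self.n_items), 3)
        return {"head": head, "left": left, "right": right}

    def process_answers(self, answers):
        # Placeholder for a model update: just record the answers.
        self.answers.extend(answers)

    def run(self, answer_source):
        # Only process answers as they are received; queries are not
        # precomputed because get_query produces them on demand.
        for batch in answer_source:
            self.process_answers(batch)


sampler = RandomSampler(n_items=6, seed=1)
query = sampler.get_query()
# Simulate one received answer followed by an empty batch.
batches = [[{**query, "winner": query["left"]}], []]
sampler.run(batches)
```

With this split, the serving code only ever needs `get_query`, while `run` can be scheduled as a long-lived background task.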