stsievert / salmon

A tool to collect triplet queries
https://docs.stsievert.com/salmon/
BSD 3-Clause "New" or "Revised" License

Rethink query search/model update dataflow #72

Closed stsievert closed 4 years ago

stsievert commented 4 years ago

Currently, the data flow updates the model while searching for and posting queries for the current model:

[Image: Screen Shot 2020-09-08 at 1 47 01 PM]

Here "Mi" means "model i", and "search" clears all queries, then searches for and posts queries. It basically runs this loop:

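from itertools import count

future = background(update_model)  # the model update runs in the background
for k in count(start=12):
    # each pass searches twice as many queries as the previous one
    queries, scores = search_queries(num=2**k)
    post(queries, scores)
    if future.done():  # stop once the updated model is ready
        break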
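
For anyone who wants to try this loop, here is a minimal runnable sketch. It assumes `background` can be modeled with `concurrent.futures.ThreadPoolExecutor.submit`. The `update_model`, `search_queries`, and `post` functions below are hypothetical stand-ins, not Salmon's actual implementations.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import count


def update_model(duration=0.02):
    """Hypothetical stand-in: pretend to fit a model for `duration` seconds."""
    time.sleep(duration)
    return "model"


def search_queries(num):
    """Hypothetical stand-in: return `num` dummy queries with dummy scores."""
    queries = list(range(num))
    scores = [float(q % 7) for q in queries]
    return queries, scores


def run_current_loop(start=12):
    posted = []  # number of queries handed to each post() call

    def post(queries, scores):
        # Hypothetical stand-in: only record how many queries were posted.
        posted.append(len(queries))

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(update_model)  # plays the role of background(...)
        for k in count(start=start):
            queries, scores = search_queries(num=2**k)
            post(queries, scores)
            if future.done():
                break
    return posted, future.result()
```

In this sketch every pass throws away the previous search and starts over with twice as many queries, so the work done by earlier passes is not reused.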

stsievert commented 4 years ago

I think a better solution would be to do a complete search:

[Image: Screen Shot 2020-09-08 at 1 46 27 PM]

In this, "score" only scores queries. Posting of queries is hidden from this setup. This implementation would basically require this loop:

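queries, scores = [], []
while True:
    f_post = post(queries, scores)  # post the results of the previous round
    f_model = background(update_model)
    # keep searching until the model update finishes
    # (k is not defined in this sketch; the loop above starts at k=12)
    f_search = background(search_queries, num=2**k, stop=f_model.done)
    queries, scores = f_search.result()  # blocks until the search stops
    # ... (updating model, etc)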

There's no reason post(queries, scores) can't happen concurrently with f_model and f_search.
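
A rough runnable sketch of this proposed loop follows, again assuming `background` maps onto `ThreadPoolExecutor.submit`. The stand-in functions, the fixed `rounds` count (used instead of `while True` so the example terminates), and the choice `k=8` are all assumptions for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def update_model(duration=0.02):
    """Hypothetical stand-in: pretend to fit a model for `duration` seconds."""
    time.sleep(duration)
    return "model"


def search_queries(num, stop):
    """Hypothetical stand-in: score batches of `num` queries until stop() is true."""
    queries, scores = [], []
    while True:
        first = len(queries)
        batch = list(range(first, first + num))
        queries.extend(batch)
        scores.extend(float(q % 7) for q in batch)
        if stop():  # the model update has finished
            return queries, scores


def run_proposed_loop(rounds=3, k=8):
    posted = []  # number of queries handed to each post() call

    def post(queries, scores):
        # Hypothetical stand-in: only record how many queries were posted.
        posted.append(len(queries))

    queries, scores = [], []
    with ThreadPoolExecutor(max_workers=3) as pool:
        for _ in range(rounds):
            # All three tasks run concurrently.
            f_post = pool.submit(post, queries, scores)
            f_model = pool.submit(update_model)
            f_search = pool.submit(search_queries, num=2**k, stop=f_model.done)
            queries, scores = f_search.result()
            f_post.result()   # surface any posting error before the next round
            f_model.result()  # already finished, since the search stopped
    return posted, len(queries)
```

Here one search accumulates scored queries for the whole duration of the model update, and the previous round's results are posted in parallel rather than inside the search.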