o19s / quepid

Improve your Elasticsearch, OpenSearch, Solr, Vectara, Algolia and Custom Search search quality.
http://www.quepid.com
Apache License 2.0
284 stars, 101 forks

Number of samples selection for the book of judgments #781

Open shantanu156 opened 1 year ago

shantanu156 commented 1 year ago

Is your feature request related to a problem? Please describe.
I started exploring the latest judgment list feature and noticed that whenever I add a new query and then run "populate book" + "Refresh Query/Doc pairs for book", not all, or not enough, query/doc pairs are added to the judgment book. I added a query that had more than 3000 results, but only 3 samples were added to the judgment list, and after labeling those samples there was nothing left to judge.

Describe the solution you'd like
It would be great if we could specify the minimum number of samples to transfer to the judgment list. E.g., for the query above I would like to transfer at least 50 random samples to the book of judgments for annotation. Or, if these samples are selected based on a threshold, the admin should be able to adjust that threshold.

Describe alternatives you've considered
This makes the book of judgments tricky to use, and we then have to rely on the usual way.

Additional context
Also, when the first judgment is initiated, it is often empty, i.e. there is no content. (image)

epugh commented 1 year ago

Thanks for looking into this! Whatever fields you have shown on the main page should be recorded. Could you show what the main page with the search results looks like? There may be an issue if you have a nested JSON data structure, but that's just a guess.

Also, when you populate the book, it grabs whatever results are being shown. Can you share a screenshot of the page with the results?

epugh commented 1 year ago

There have been some bug fixes that you might want to pick up to see if it works the way you want. Yeah, we don't have any kind of sampling, though that might be cool! One obstacle to sampling from 3000 results is that we would need to LOAD all 3000 results from the search engine in order to get the data. (I wonder if you could instead craft a query that randomly samples?)
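
For illustration, here is a minimal sketch of the "query that randomly samples" idea. It assumes an Elasticsearch backend and uses the `function_score` query with `random_score`, so the engine returns a random subset of matches instead of the client loading every result. The helper name `random_sample_query`, the `title` field, and the default sample size of 50 are hypothetical and chosen only for this example. This is not something Quepid does today.

```python
import json


def random_sample_query(user_query: str, field: str,
                        sample_size: int = 50, seed: int = 42) -> dict:
    """Build an Elasticsearch request body that returns up to
    `sample_size` randomly ordered documents matching `user_query`.

    The helper name and defaults are hypothetical; only the query DSL
    keys (function_score, random_score, boost_mode) are Elasticsearch's.
    """
    return {
        # Only this many documents are returned, so the client never
        # has to load the full result set.
        "size": sample_size,
        "query": {
            "function_score": {
                # The original relevance query decides WHICH docs match.
                "query": {"match": {field: user_query}},
                # random_score assigns each match a pseudo-random score;
                # a fixed seed plus a field makes the order reproducible.
                "random_score": {"seed": seed, "field": "_seq_no"},
                # Replace the relevance score with the random score so
                # the ordering is random rather than relevance-based.
                "boost_mode": "replace",
            }
        },
    }


if __name__ == "__main__":
    # Print the request body so it can be pasted into a query template.
    print(json.dumps(random_sample_query("laptop", "title"), indent=2))
```

With a body like this, what the results page shows (and so what "populate book" would pick up) is a random sample of the matching documents rather than only the top few by relevance. Changing the seed gives a different sample.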