janfreyberg / superintendent

Practical active learning in python
https://superintendent.readthedocs.io

Distributed mode workflow #70

Closed jarandaf closed 3 years ago

jarandaf commented 3 years ago

This is more of a question than an issue.

After reading the docs, it is not clear to me how multiple labellers are expected to work. From what I could understand, every labeller should run their own notebook (i.e. essentially the same notebook for all labellers). Is that correct?

On the other hand, the active learning process can be run in the background somewhere else. How are the next samples to be labelled distributed among the labellers? Are these samples "flagged" somehow in the database and taken into consideration first by the labelling widgets? I see there is a priority field in the superintendent table but I am not sure if that is its purpose.

Thank you!

janfreyberg commented 3 years ago

Hi, that's correct on both counts: each labeller runs their own copy of the notebook, and the priority field exists for exactly that purpose. Basically, the database table implements a priority queue, and "workers" (i.e. the labelling notebooks) pull items off that queue. The priority is set by the machine learning system, which computes an uncertainty for each unlabelled data point and ranks the points accordingly.
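To make the queue idea concrete, here is a minimal sketch of a database-backed priority queue using only Python's standard library. It is not superintendent's actual schema or API: the `queue` table, its columns, and the helpers `create_queue`, `enqueue`, `set_priorities`, `claim_next` and `submit_label` are all hypothetical names chosen for illustration. It also assumes that a larger priority value means "label this sooner" and that uncertainty is measured as the entropy of predicted class probabilities; the library may use different conventions.

```python
import math
import sqlite3


def create_queue(conn):
    """Create a hypothetical queue table (not superintendent's real schema)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS queue (
            id INTEGER PRIMARY KEY,
            data TEXT NOT NULL,
            priority REAL NOT NULL DEFAULT 0,
            claimed_by TEXT,
            label TEXT
        )
        """
    )


def enqueue(conn, data, priority=0.0):
    """Add one unlabelled item and return its id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO queue (data, priority) VALUES (?, ?)", (data, priority)
        )
    return cur.lastrowid


def entropy(probabilities):
    """Uncertainty of one prediction (assumed measure: Shannon entropy)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def set_priorities(conn, priorities):
    """Background learner: store a new priority for each item id."""
    with conn:
        conn.executemany(
            "UPDATE queue SET priority = ? WHERE id = ?",
            [(p, item_id) for item_id, p in priorities.items()],
        )


def claim_next(conn, worker):
    """Labelling notebook: take the highest-priority unclaimed item, or None."""
    with conn:
        row = conn.execute(
            "SELECT id, data FROM queue "
            "WHERE claimed_by IS NULL AND label IS NULL "
            "ORDER BY priority DESC, id ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # Mark the item as taken so another worker is handed a different one.
        # This sketch does not guard against two processes racing between the
        # SELECT and the UPDATE; a shared database would need row locking.
        conn.execute(
            "UPDATE queue SET claimed_by = ? WHERE id = ?", (worker, row[0])
        )
    return row


def submit_label(conn, item_id, label):
    """Labelling notebook: store the label for a claimed item."""
    with conn:
        conn.execute("UPDATE queue SET label = ? WHERE id = ?", (label, item_id))


conn = sqlite3.connect(":memory:")
create_queue(conn)
ids = [enqueue(conn, text) for text in ("sample a", "sample b", "sample c")]

# Made-up model outputs: the model is least sure about "sample c".
predictions = {ids[0]: [0.6, 0.4], ids[1]: [0.9, 0.1], ids[2]: [0.5, 0.5]}
set_priorities(conn, {i: entropy(p) for i, p in predictions.items()})

first = claim_next(conn, "labeller-1")
second = claim_next(conn, "labeller-2")
submit_label(conn, first[0], "cat")
print(first[1], second[1])
```

In this sketch the two labellers never receive the same item because a claimed row is excluded from later queries, and the background process can rewrite priorities at any time without touching rows that are already claimed or labelled.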

jarandaf commented 3 years ago

Thank you for the clarification!