sukrutrao / Fast-Dawid-Skene

Code for the algorithms in the paper: Vaibhav B Sinha, Sukrut Rao, Vineeth N Balasubramanian. Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification. KDD WISDOM 2018
https://sites.google.com/view/fast-dawid-skene
MIT License

Incremental mode #3

Open fedor57 opened 6 years ago

fedor57 commented 6 years ago

Hi, I was actually able to use the aggregator, thank you very much!

It squeezed 1.8M responses down to 500K labels in 4.5 hours on a single thread on a server, using 35 GB of memory ;) I think we can incorporate the solution, but I need to implement some enhancements to make it more useful in a production scenario. I will share my thoughts here just to let you know what we think would be useful in our real situation:

And one extra off-topic point:

I would be happy to hear any thoughts regarding this, thank you!

fedor57 commented 6 years ago

Just to let you know: I was once involved, at one of the search giants, in calculating a kind of freshness PageRank over a constantly changing web graph. Roughly, the algorithm accumulated a weight diff and distributed it to peers when the weight exceeded some threshold. There were also some heuristics to intensify processing near new nodes with big weights.

Regarding convergence in the incremental scenario: perhaps we can back up the values from previous steps and, when the value of a worker / task changes a lot, mark its peers with a flag "include in the next partial iteration". Then run partial iterations, with a full one after every 5 partial ones. I believe that such a technique could produce a VERY fast Dawid-Skene implementation ;) Especially for the incremental scenario.
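To make the idea concrete, here is a minimal sketch of the scheduling only. It is not code from this repository and not a Dawid-Skene implementation: `run_incremental`, `peers`, `update` and `toy_update` are hypothetical names, and each worker / task holds a single float instead of a confusion matrix or label distribution. It assumes that every iteration updates only the flagged nodes, that a full pass over all nodes follows every `full_every` partial passes, and that a node whose value moves by more than `threshold` flags its peers for the next partial iteration.

```python
def run_incremental(values, peers, update, dirty, threshold=1e-3,
                    full_every=5, max_iters=100):
    """Dirty-flag iteration sketch (hypothetical helper, scalar values only).

    values : dict node -> float, current estimates for workers and tasks
    peers  : dict node -> list of nodes sharing at least one response
    update : function(node, previous_values) -> new value for that node
    dirty  : nodes flagged "include in the next partial iteration"
    """
    values = dict(values)
    dirty = set(dirty)
    for it in range(1, max_iters + 1):
        # Every (full_every + 1)-th pass is full, the rest are partial.
        full = it % (full_every + 1) == 0
        active = set(values) if full else dirty
        if not active:
            break  # nothing is flagged, so stop early
        previous = dict(values)  # backup of the values from the previous step
        next_dirty = set()
        for node in active:
            values[node] = update(node, previous)
            # A big change flags the peers for the next partial iteration.
            if abs(values[node] - previous[node]) > threshold:
                next_dirty.update(peers[node])
        dirty = next_dirty
    return values


# Toy usage: one task labelled by two workers; a new response touched "t1".
peers = {"w1": ["t1"], "t1": ["w1", "w2"], "w2": ["t1"]}
start = {"w1": 0.0, "t1": 0.0, "w2": 1.0}


def toy_update(node, prev):
    # Workers stay fixed; the task takes the mean of its workers' values.
    if node != "t1":
        return prev[node]
    return sum(prev[p] for p in peers[node]) / len(peers[node])


result = run_incremental(start, peers, toy_update, dirty={"t1"})
```

In a real version, `update` would be the E-step for a task or the M-step for a worker, and the change measure would compare distributions rather than floats. Note that this sketch stops as soon as no node is flagged, so a final full pass for verification would have to be added separately.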

vbsinha commented 6 years ago

Hi,

One way to achieve the first two points would be to use an online algorithm.