Dynamic efficiante Schulze method calculation engine

BoazChen commented 11 years ago

Schulze method (there's a Heb page too) is about combining lists of preferences. Every voter casts a partially sorted list, meaning she can group items she sees in the same level of preference together – 1,(2,3,4),5 (Note that sorting in a group is doable but doesn’t change the ballot since 1,(4,2,3),5 means the same).

There’re several implementations of engines that calculate the final order of the augmented preferences, but as much as we understand they all assume two-stage process - a preliminary casting of ALL the ballots, and only then a single calculation.

Our system uses the algorithm for cumulative sort of issues and proposals, and therefore we need a dynamic Schulze engine, that efficiently updates the cumulative list as new ballots (or updated ones) are casted.

Two changes are therefore needed:

Since every change in any of the users preference is a new or updated ballot the engine should evaluate the change on the outcome cumulative list without re-calculating the entire graph every time.
Since many of the situation would be small lists of items the engine can simplify the algorithm for those case (e.g. simple count of pro’s and con’s if only to items are in the list).

The task includes:

Finding a good implementation to start with. We found https://github.com/bradbeattie/python-vote-core but maybe there’s a better one.
Understanding the algorithm and the way it is implemented
Designing the efficient solution for dynamic calculation
Implementation

Integration of engine to a ballot casting UI and the usage of the outcome in the system will be done afterwards.

itamaro commented 11 years ago

I have been working on this during the hackathon.

Went with the suggested existing implementation (https://github.com/bradbeattie/python-vote-core) which seemed fine (possibly found a bug in the library, see https://github.com/bradbeattie/python-vote-core/issues/15). Before diving into modifying the algorithm to support dynamic calculation (which is premature optimization), we decided to first check if the existing implementation could be used "as-is" with sufficient performance.

The chosen library implements several voting methods, some are "single winner methods" (including the classical Schulze Method), and some are "ordering methods" (including the Schulze Proportional Representation Method).

During the first day of the hackathon, we wrote a wrapper class for SchulzeMethod, that allows adding votes "online" (while internally building a histogram of ballots, which is the input for the Schulze algorithm).
- We tested performace of the SchulzeMethod algorithm for various amounts of candidates (5..12), for increasing amounts of ballots (10k, 20k, .., 50k), and found the runtime is sub-linear in ballot-count, and possibly-exponential in candidate-count. Runtimes were in the order of 1sec. A single run of SchulzeMethod with 100 candidates on 10k ballots took about 50sec.

On the second day, we realized that we need and ordering method, and not just a single winner method, so we switched to SchulzePR. Unfortunately, SchulzePR is much heavier... For ~7 candidates it takes an order of 3sec for 10k votes. Still sub-linear in ballot-count, but also still exponential in candidate-count. It took around 15sec for 10 candidates, and we didn't have the patience to wait for the completion with 30 candidates... (after ~15min)

Interestingly, SchulzePR just gives the resulting order (which needed some post-processing in order to consider ties properly). We also wanted to get pair-wise preferences from the data, which is generated by the classical SchulzeMethod. So, since it's much faster than SchulzePR, we call them both, so we can return the order and a pair-matrix. Another interesting property to calculate from the data would be - given the resulting order, for each group x in the order, overlay the order with the ratio between votes favoring group x over group x+1. This is still not implemented, but pretty straight forward, as we already have all the pair-wide data...

udishapiro commented 11 years ago

The algorithm is polynomial so if the implementation is exponential it is a bad one. In any case for the number of participants expected in the first phase (hundreds not 10k) we can probably live with current performance. We can compromise on frequency of updates (once an hour/a day). And here are the refs I found for dynamic shortest-path algorithms (the basis for Schulze implementation)

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.1855&rep=rep1&type=pdf (או זה http://research.cs.wisc.edu/wpis/papers/jalg96.pdf )

http://link.springer.com/chapter/10.1007%2F3-540-58715-2_118

http://u.cs.biu.ac.il/~liamr/papers/roditty.pdf

http://s3.amazonaws.com/academia.edu.documents/30239362/www.cs.uwaterloo.ca_2f_y6yang_2ffully_2520dynamic01.pdf?AWSAccessKeyId=AKIAIR6FSIMDFXPEERSA&Expires=1374303704&Signature=cXtLOwBbN3c4vL6LA9lP4iQyJwI%3D&response-content-disposition=inline

itamaro commented 11 years ago

It's possible that the implementation was polynomial, not exponential. We did not analyze it rigorously, just looked at the graph :-)

bradbeattie commented 11 years ago

If the implementation's performance is bad (and I agree, it could be better), patches welcome. ;)

hasadna / OpenCommunity

Dynamic efficiante Schulze method calculation engine #5