gratipay / gratipay.com

Here lieth a pioneer in open source sustainability. RIP
https://gratipay.news/the-end-cbfba8f50981
MIT License
1.12k stars 310 forks source link

implement funds #449

Closed chadwhitacre closed 11 years ago

chadwhitacre commented 11 years ago

The idea here it to provide a way to lump individuals together into groups and projects for funding purposes. A "fund" here is a set of weighted percentages for individuals and other funds (funds can be nested). Then we use these funds as levels of indirection when shuffling money around each week on payday.

Was: group tips together for simultaneous adjustment

I believe this is what @joelmccracken is asking for on #434. The coarse-grained case would be adjusting all tips at once (#448). This ticket is for grouping tips together so that you could adjust 10 tips at once without touching the other 20. The language that Joel used was "a single tip covering multiple people."

chadwhitacre commented 11 years ago

After conversations with @MikeFair in email and IRC, I'm now looking at this as the answer to the question of aggregating individuals into groups on Gittip.

!m @joelmccracken

chadwhitacre commented 11 years ago

Gittip users can create one or more "funds" (as I'm calling them). A fund is a mapping of participant_id to integer weight:

{ foo: 1000
, bar: 400
, baz: 200
, buz: 10
 }

The part of the Gittip profile page where you see everyone you're giving to ... that becomes a UI over this data structure. Then in the API (#462) we have this as an endpoint that can be posted to by the Gittip UI as well as by third-party applications. Then, we allow for funds to be the keys as well:

{ foo: 1000
, "bar/baz": 400
, baz: 200
, buz: 10
 }

One thing to tease out is the interplay between absolute dollar amounts and percentages. Another is how to keep the damper on volatility.

Next step for me here is some paper prototypes ...

chadwhitacre commented 11 years ago

Just got off the horn with @MikeFair. He had the interesting idea of implementing this using CouchDB. We would store funds as full json docs in there, and use Couch's map-reduce implementation to compute dollar amounts from there. He's offered to spike that out for us. Reticketing as #488.

chadwhitacre commented 11 years ago

Users have zero, one, or more funds. We create a "default" fund for them when they register (cf. #489). In the UI we want there to be a default fund, to streamline the simple case where a user has a single fund. (Whether we enforce a one-fund minimum under the hood or allow the user to drop to zero funds is an implementation detail.)

All funds are public. When I go to a profile page I see a list of funds that the user has created, as well as the weekly dollar amount being given to each fund. These are also shown on community pages (#496). Funds are anonymous in that the allocation of each fund is not public. Users can't be part of their own funds. Non-self users and other funds can be part of funds.

Users allocate dollar amounts to funds. They no longer allocate dollar amounts directly to other Gittip accounts.

Each payday, the fund graph is reduced to dollar amounts from one account to another.

Issues subsumed by this one: #27, #55, #94, #216, #316, #327, #372, #434, #448.

justinabrahms commented 11 years ago

This sounds legit to me. I did a double take on "fund can't contain creator". I understand why, but I would have initially suspected that JKM would create a Django fund and put himself in it (though he likely wouldn't, it serves as a good example). Skipping it seems to remove some of the possible abuse potential of having someone tip a fund, then dropping everyone out but you, etc.

I think this is a clear move in a solid direction. :+1:

chadwhitacre commented 11 years ago

Exactly, @justinabrahms. Presumably Jacob's Django fund includes your Django fund, which includes Jacob, etc.

justinabrahms commented 11 years ago

Oh, interesting. You're suggesting that these can be nested. Are the weights accumulative? Something to watch out for is that you limit the depth of these transitive tips, else you'll need to ensure its not a cyclical graph. :)

joelmccracken commented 11 years ago

Oh wow, this sounds confusing. So a group fund can pay to other group funds, not only other individuals?

On Fri, Jan 18, 2013 at 1:07 PM, Justin Abrahms notifications@github.comwrote:

Oh, interesting. You're suggesting that these can be nested. Are the weights accumulative? Something to watch out for is that you limit the depth of these transitive tips, else you'll need to ensure its not a cyclical graph. :)

— Reply to this email directly or view it on GitHubhttps://github.com/zetaweb/www.gittip.com/issues/449#issuecomment-12434005.

chadwhitacre commented 11 years ago

@justinabrahms Yeah, designing the algorithm for computing concrete gift amounts each payday from the graph of funds is going to be an interesting challenge (cf. https://github.com/zetaweb/www.gittip.com/issues/129#issuecomment-12341021). :-)

@joelmccracken My thought was a fund is a set of percentages:

40% Adrian Holovaty -> Django
20% Guido van Rossum -> Python
10% Linus Torvalds -> Linux
10% Guido van Rossum
10% Adrian Holovaty
10% ACDC

That means: "Give 10% directly to the Gittip accounts of Guido, Adrian, and ACDC, then distribute 40% amongst whatever Gittip accounts are represented by Adrian's 'Django' fund, and 20% amongst whatever Gittip accounts are underneath Guido's 'Python' fund, and the remaining 10% amongst accounts under Linus' 'Linux' fund."

The simple case is that you'll have a single fund—you won't even have to know the word "fund" to use it. It'll just look like a set of percentages as above, and then a big fat "$100.00" at the top, which is the total amount you're giving each week; it will be split up according to the percentages. On payday we'd do a lot of number crunching to determine exactly where your money goes.

Does that make any sense?

joelmccracken commented 11 years ago

it does! Very good =D

On Fri, Jan 18, 2013 at 3:57 PM, Chad Whitacre notifications@github.comwrote:

@justinabrahms https://github.com/justinabrahms Yeah, designing the algorithm for computing concrete gift amounts each payday from the graph of funds is going to be an interesting challenge (cf. #129https://github.com/zetaweb/www.gittip.com/issues/129#issuecomment-12341021). :-)

@joelmccracken https://github.com/joelmccracken My thought was a fund is a set of percentages:

40% Adrian Holovaty/Django 20% Guido van Rossum/Python 10% Linus Torvalds/Linux 10% Guido van Rossum 10% Adrian Holovaty 10% ACDC

That means: "Give 10% directly to the Gittip accounts of Guido, Adrian, and ACDC, then distribute 40% amongst whatever Gittip accounts are represented by Adrian's 'Django' fund, and 20% amongst whatever Gittip accounts are underneath Guido's 'Python' fund, and the remaining 10% amongst accounts under Linus' 'Linux' fund."

The simple case is that you'll have a single fund—you won't even have to know the word "fund" to use it. It'll just look like a set of percentages as above, and then a big fat "$100.00" at the top, which is the total amount you're giving each week; it will be split up according to the percentages. On payday we'd do a lot of number crunching to determine exactly where your money goes.

Does that make any sense?

— Reply to this email directly or view it on GitHubhttps://github.com/zetaweb/www.gittip.com/issues/449#issuecomment-12440694.

chadwhitacre commented 11 years ago

A heart coin for you!

MikeFair commented 11 years ago

I'd like to allow creators to be a member of their own funds for a number of reasons including A) the ease with which a creator could work around such a restriction, B) the difficulty of implementing such a restriction at this time, and C) This kind of thing is more 'one man's opinion' versus functional requirement and therefore it's better to give the gifters/upstream funds a tool/setting to express such an opinion. For instance, a checkbox that says "This fund does not allocate to the creator's of the funds it contains" or a user that is able to mark "Reallocate tips for the creator's of funds" in their profile.

2) One of the things I would fully expect to see would be Chad's name in the GitTip fund. Something would seem off if it wasn't there.... (though if I am reading the idea right, the contents of these funds are a big black box atm.)

3) Another tool that comes into effect to help with the actual dollar amount allocation of these funds is the user's "target tip amount" (assuming they've set one).

The formula: [User's Target Tip Amount] / [Total Amount of All Tips Given] = [User's Percentage of Doneness]

Aside from the weight an item in a fund could have other settings (and this likely should go in as a separate GitHub ticket to be worked on after funds themselves are initially working). Two of these setting I expect to be a hard lower and upper bound for how much to give.

For instance, "Lower Bound = If this minimum amount cannot be allocated then allocate nothing and redistribute the weight to the rest of the fund" and "Upper Bound = Any amount allocated over this amount should be redistributed to the other items in this fund"

With an upper bound capability, a creator could allocate to themselves but set the upper bound to be "when user's doneness reaches 100%" which is a way of saying "Do not allocate any more than their target amount".

These bounds are cumulative to the total amount the user is to be tipped that payday. This means that if a single user receives tips through many projects or sources, (and assuming the upper bound setting of 100% doneness is in effect) then on payday the user will receive up to their target amount and no more. Any excess amount will get redistributed to the other items.

MikeFair commented 11 years ago

I'd like see the fund's contents to be public rather than private.

On larger projects I fully expect there to be multiple versions of a fund for the same project because people have different opinions about who the important people are and how important they are relative to the other items in the fund.

Being able to see the contents of the fund and hearing their creator's describe how they are constructed is an important aspect of the ongoing nature of selecting which funds to give to.

A user's specific allocations would be private, other people can't see who specifically is giving to what fund as that's more about personal choices, but what would remain visible is the total aggregate amount that each fund is receiving.

chadwhitacre commented 11 years ago

@MikeFair

I'd like to allow creators to be a member of their own funds. I'd like see the fund's contents to be public rather than private. (though if I am reading the idea right [...])

You're right that these go together: creators aren't allowed to be members of their own funds precisely because funds are private.

the ease with which a creator could work around such a restriction

Sure. If I'm Linus, I can create a shadow account and include that in my "Linux" fund, and then when people give to my Linux fund I'm able to recover some or all of that for myself. The restriction on creators in their own funds is a soft constraint intended to steer the growth of the community, not a hard line. We wouldn't even make this a terms of service violation; it'd be an acknowledged workaround: "If you want to receive money from your own fund, create a second account." We want to discourage people from being greedy on Gittip, but the point is that we trust Linus. The "no creators" constraint removes 99% of the burden of proof from Linus; his reputation for trustworthiness easily makes up the other 1%. And if he really, really wants to include himself in his own fund, he can.

the difficulty of implementing such a restriction at this time

Not sure I see this one. Does this refer to some constraint with the CouchDB option? If so, the answer is that, if at all possible, user experience drives technology, not vice versa.

C) This kind of thing is more 'one man's opinion' versus functional requirement

It's part of the defining character of Gittip that gifts are anonymous in the particulars (we had a ticket for making tips non-anonymous, #326; I just closed it). Private funds fit that character better than public funds. If we try to support both public and private funds we muddy the product.

and therefore it's better to give the gifters/upstream funds a tool/setting to express such an opinion. For instance, a checkbox that says "This fund does not allocate to the creator's of the funds it contains" or a user that is able to mark "Reallocate tips for the creator's of funds" in their profile.

I'd like to see this work the other way around: Just as Linus is free to create a second account, he is free to disclose what his funds contain. Disclosure is not a violation of Gittip's terms of service, but Gittip is not going to do it for you.

Being able to see the contents of the fund and hearing their creator's describe how they are constructed is an important aspect of the ongoing nature of selecting which funds to give to.

If that's true then it seems like when we roll this out we'll expect to see a) demand for public funds, b) funds that disclose elsewhere do better than funds that don't, c) third party tools to implement public funds on top of Gittip. For now, I'd like to move forward with private funds, depending primarily on the reputation of the fund managers to drive fund selection. We can revisit as needed once we have this out there for a while.

chadwhitacre commented 11 years ago

@MikeFair

3) Another tool that comes into effect [...]

I'm reading this as unrelated to the matter of funds being private (correct me if I'm wrong). This gets into the details of computing amounts from funds on payday, right?

MikeFair commented 11 years ago

Four things:

1) There's a distinction between a user's "Basket" (where they personally are giving) and a "Fund" (some opinion/allocation function other people use to select where to give).

A user's Basket is clearly a private affair. A Fund/Allocation Algorithm I consider a public affair.

2) The model I'm working within is that we are always doing direct person to person gifts.
We do not ever give to funds, we are always giving directly to other people.

So in this model, what's a fund then?

A Fund (and the nesting of funds) is an organizational tool to aid us in the process of selecting the people we're going to give to and how much we're going to give to each person.

A Fund is a mechanism to, in essence, create a function call that expands one label in our own basket into a list of people and amounts.

In this model none of us give anything to Linus' fund. We use the allocation function Linus published (called a Fund) in the execution of our own allocation function. This is conceptually no different than executing other's published algorithms in our own computer systems.

Since the model is that it's actually the user's basket that is executing some allocation function to figure out who to give to, this makes these allocation functions more akin to source code in that respect.

This then creates a very consistent story of what's happening on payday that exactly matches what's already happening today.

Since I've subscribed some fraction of my giving to Linus' fund, then what happens on payday is my basket executes the 'Linus' Linux Kernel Fund' allocation function which expands to a list of people that I'm going to give to and how much I'm going to give them this payday.

I'm still the one ultimately allocating those direct gifts, I'm using Linus' opinion as part of my selection process.

The anonymity of where the money came from to the recipient in this point of view remains intact.
This remains consistent with exactly what's already happening right now, people choosing people to tip.

3) Relating to creator's not being able to put themselves into the fund. A fund can come from anything and be about anything. It's not only the simple case that a single individual creates the Fund.

It could be created by anything, a person, a committee, the outcome of some election, a computer algorithm; anything that is capable of creating a list of names and weights. This makes it really hard to determine who the creator is in some of these cases (and what's driving the difficulty of enforcement).

For example, if Debian chooses to hold a semi-annual election to reconstruct the contents of the Debian Allocation Fund each period, we're obviously not going to say that all the people who voted can't be in it (and how would we know who they were anyway), it makes sense the sitting president of Debian would likely be in the Fund despite that's also the most likely person to take responsibility for publishing the fund...

What gives a fund power is people subscribing to it and pulling the function/algorithm that it represents into their own basket and using it to make their allocations with.

If people think the creators, however the chooser defines creator, shouldn't put themselves into their own funds, they won't subscribe to or use that person's allocation algorithm thereby removing power from that Fund.

As the Funds contents are public in this model, if someone didn't want to give to a Fund's creator, the person would be able to create their own new Fund and construct it like this: "Start with the Published fund I don't like because the creator is in there and apply this transform to remove the creator."
Once published, other people could subscribe to that version of the Fund if that was their wish. (In fact I've always envisioned that such a transform (think of it like an XSLT), in and of itself, would be a popular published code snippet to be applied by people in many cases.)

Now instead of subscribing directly to the other fund, they subscribe to their own version of it.

I haven't discussed the pipelining of Fund transformations yet as I felt it was way too early in the implementation to bring it up.

Like with our source code, I envisioned there would be a whole suite of tools to aid us in putting together these Funds. Allowing people to start with someone else's list of names (source code) and then apply a series of transforms (like a series of patches/commits/XSLTs) as a pipeline or build process an integral part of the equation to make it so that we can easily take other people's ideas for where to give and transform it to make it our own.

It's the same thing we do with our code, applied in a new context.

4) All that said, it seems the main concern here is having people be able to figure out how much they are getting through a particular fund somehow reducing the anonymity of the gifts to the recipient.

I've somewhat addressed that by showing how the Funds are functions people begin with, not direct allocators.

Instead of keeping the Fund's contents private keep the amounts that Fund is responsible for allocating private. This way everyone can see what these Funds contain, but not how much is going where as a result of that.

So instead of a big $100 at the top of the Fund (because we aren't actually tipping a fund, the fund is a function people are using to figure out who and how much to tip), perhaps we assign some impact scores that say something about the fund's popularity, influence, and impact that it's having within the social framework.

Perhaps a "download count" or "number of subscribers". This keeps private how much came from where and to where and is simply saying how popular this Fund is for people to use in their allocation algorithms.

MikeFair commented 11 years ago

@whit537

3) Another tool that comes into effect [...]

I'm reading this as unrelated to the matter of funds being private (correct me if I'm wrong). This gets into the details of computing amounts from funds on payday, right?

It's separate, and not so much about privacy as it is creator's being members of their own Funds. The idea addresses the concern of people being greedy and using funds to try and suck money from the system.

1) Ironically, the most powerful tool we would have to audit these Funds to ensure they represent what they claim to represent, that is making their contents public, is exactly what you're suggesting we don't do.

Making funds public is how we create oversight for these otherwise black boxes. It's how we as a community go about forming our opinions about and distinguishing between the funds we want to give to, just like what we do to choose what people we want to give to.

I can publish a GitTip fund just as easily as Chad can and so there'd be two GitTip funds in that case and people could choose between them. Without making them public how could anyone make a sane choice between them?

We also might just as easily say, the contents of the project's official fund is some formula that combines the ideas from the Fund's published by these people (a form of board of directors).

Seeing the results of that formula is an easy way to audit that the board is doing a good job. Over time, a trust develops where people aren't so sensitive, but the ability to trust but verify what's in a fund is always there.

2) Making the contents public won't change what it takes for a Fund to get subscribers which is how a Fund gets power to be abused in the first place (who cares if you get 100% of nothing).

If anything, the openness helps prove that it's doing exactly what it's supposed to be doing.

If it remained closed and private, and even if the contents were supposedly published elsewhere, how would anyone know that's actually what was happening. That's exactly how Bernie was able to do what he did. He told everyone one thing, but in practice something else was actually happening (and those that were supposed to validating what he was claiming didn't do a good job at that).

3) In the end a project is the people that make it up. It's not separate from them, the body of people that take up implementing the project's mission/goals are what make the project go.
You make the project go by making the people go.

We want people to get done. That's how we're going to empower them to be part of the open solution.

If you as a donor can see that people are going to cap out when their targets reach 100% doneness then you've eliminated the greed factor as a concern.

They could set their fund to be 95% themselves and 5% everyone else and they won't receive a dime more than their target. Once they reach 100% doneness (regardless of how that happens), the fund becomes 100% everyone else for all remaining gifts.

The really popular people with no cap are the people introducing the most greed risk.

So my thoughts on counteracting that (and again this will need to go in as a separate issue) are that if you haven't set a target, then your target gets implicitly set according to some kind of normalization function.

The idea is that if you haven't set a target, then your target gets automatically set to somewhere to the middle of the pack preventing you from being able to abuse the Fund system for your own personal gain.

If you have set a target, then you're going to cap out at 100% doneness (a public number) and therefore again, you aren't going to be able to abuse the system for your own personal gain.

If you have said, I don't want tips, well that's the same as a target of 0 for this calculation and you pose no risk.

The most powerful tools we have to discourage greed is capping people out and making the whole thing as transparent as possible.

Now that said, exactly who you are giving to and how you are choosing to allocate (aka your personal basket) is your own private and personal information.

Thereby, recipients are not allowed to see where the money came from because it would violate the privacy policy of where someone gives is personal and private information to the giver. This is separate from whether or not the Funds, the formulas the community is using to execute their private baskets, are private.

This model makes both intuitive and practical sense to me: 1) All donations are gifts from the giver to the receiver and always people (or at least legalized entities)

2) All donations are the private and personal information of the giver and not shared

3a) Givers use Funds and other tools to aid in the recipient selection/allocation process 3b) Like all other tools/source code in our ecosystem, the Funds and tools used are open and shareable

chadwhitacre commented 11 years ago

!m @MikeFair. Some important considerations here regarding oversight and trust.

It's not only the simple case that a single individual creates the Fund.

Likewise, it's not only the case that Gittip accounts represent single individuals. We already have accounts that represent groups, e.g.:

https://www.gittip.com/readthedocs/ https://www.gittip.com/workforpie/

For example, if Debian [...]

There'd be a "Debian" Gittip account that would be separate from the sitting president's Gittip account. I can see a need in this use case for transparency: if I'm in the Debian fund but I'm not directly controlling it, I want to be able to see that the fund is being allocated in the way we all voted for. To support this case we could allow funds to opt-in to disclosing their contents to individuals who are directly named in the fund. Perhaps #502 expands to encompass this: when you register an account on Gittip and promote it to a "group" account, then the fund disclosure rule kicks in automatically as part of that. This isn't needed for v1 of the fund feature, though.

I can publish a GitTip fund just as easily as Chad can and so there'd be two GitTip funds in that case and people could choose between them. Without making them public how could anyone make a sane choice between them?

Trust. If a person allocates 67% to MikeFair/Gittip and 33% to whit537/Gittip that means they trust you twice as much as me.

Making funds public is how we create oversight for these otherwise black boxes.

Oversight is needed when there are strings attached to the money. With government, the expectation of return is defense, infrastructure, social programs, etc. With Madoff, the expectation was a financial return on investment. With Gittip, there are no strings attached to the money. These are free gifts. What would oversight even mean here, when there are no direct expectations?

[I]t seems the main concern here is having people be able to figure out how much they are getting through a particular fund somehow reducing the anonymity of the gifts to the recipient.

The root principle is inspiring generosity. Both anonymous particulars and private funds are derived from that. When I give money to Debian on Gittip, I'm fully entrusting the money to them. I'm disclaiming any right to oversee or influence what they do with the money.

carsomyr commented 11 years ago

@whit537 I believe that this is the equation we're after.

![equation](http://latex.codecogs.com/gif.latex?%28I - %28P - diag%28P%29%29%29^{-1} diag%28P%29 D)

Here, P is the payout matrix: Each row represents an entity i (fund or person), and each column represents the fraction of that entity i's purse paid out to entity j. Note that the entries (i, i) represent the amount of the entity's purse retained.

D is the column vector of initial purse allocations. Note that many entities will start with a purse of 0. Any nonzero purses necessarily correspond to people who are the ones giving money, and all funds start with a purse of 0 (people are kicking off the giving, after all).

The computed result is the payout column vector. I'll write this up, show my derivations, and provide examples in the near future.

MikeFair commented 11 years ago

Hey Roy,

I've been putting together a recursive SQL function that does much the same as you've been describing though slightly different from the phases you described, all the same math though I believe.

Step one: sum tip amounts grouped by participant (this get the totals that represent the denominators for each fund/participant) call that "totals".

Step two: select tipper, tippee, amount/(total for this tipper) This creates a weight contribution that goes into each tipper from this tipper.

Iteratively do a UNION ALL Select tipper, tipper, (amount / (total for this tipper)) * (the percentage from the prior iteration) Where this tipper is the tippee from the prior iteration and the tippee in this iteration is not a tipper from prior a iteration on this path

This iterative multiplication puts everything in terms of a percentage of the original total amount and avoids giving cycles. Technically a cycle would eventually wind down to something so small we could ignore it, but for now I'm just outright excluding them.

Step three: sum up all the weights grouped by tippee. Because a tippee can appear in multiple paths, we want to sum up the total weight/percentage gong to the same tippee.

Step four: Multiply all percentages by the total for the tipper to get the actual amounts for each tippee.

I need to test the code a bit more but expect to post it here shortly ( next day or two). On Jan 23, 2013 1:02 PM, "Roy Liu" notifications@github.com wrote:

@whit537 https://github.com/whit537 I believe that this is the equation we're after.

[image: equation]https://a248.e.akamai.net/camo.github.com/dc0992bd548d918ea5ccffe8a9f24630af428286/687474703a2f2f6c617465782e636f6465636f67732e636f6d2f6769662e6c617465783f28492532302d25323028502532302d2532306469616728502929292535452537422d312537442532302a253230646961672850292532302a25323044

Here, P is the payout matrix: Each row represents an entity (fund or person), and each column represents the fraction of that entity's purse paid out. Note that the entries (i, i) represent the amount of the entity's purse retained.

D is the column vector of initial purse allocations. Note that many entities will start with a purse of 0. Any nonzero purses necessarily correspond to people who are the ones giving money, and all funds start with a purse of 0 (people are kicking off the giving, after all).

The computed result is the payout column vector. I'll write this up, show my derivations, and provide further examples.

— Reply to this email directly or view it on GitHubhttps://github.com/zetaweb/www.gittip.com/issues/449#issuecomment-12622083.

carsomyr commented 11 years ago

@MikeFair Looks about right, although we should account for cycles in a formal way. If not, that would be like a bank programmer using floats to represent monetary sums, and everyone cites that as a canonical example of why representation is important. Note that the matrix inversion in my formula is actually encoding repeated tipping iterations in a geometric sum of matrices. If matrix inversion is too intensive, we can use a sparse representation, compute successive terms of the geometric sum, and stop whenever the total change in all tips for everyone is less than a penny.

Also, what do you mean by "the percentage from the prior iteration"? Are you able to write down your calculations as matrix multiplications? Verbiage without code has the potential for multiple interpretations. Looking forward to seeing the SQL! A useful tool for testing all of this is GNU Octave; you can input payout matrices and see what happens under successive iterations.

MikeFair commented 11 years ago

If you're familiar with CTE (Common Table Expressions) they can be used to write "recursive" queries. Technically they are iterative queries, they arr able to use the resultset from a query as part of the FROM clause for another query. They continue to execute until the "recursive part" of the query returns an empty result set.

So a really simple incomplete example would be: WITH RECURSIVE totals(tipper, total) ( /* query to select totals grouped by tipper / ), payday(funder, total, tipper, tippee, percentage, depth) ( Select t.tipper funder, tot.total total, t.tipper, t.tippee, (t.amount/tot.total) percentage From tips t inner join totals tot on t.tipper=tot.tipper UNION ALL select p.funder, p.total, t.tipper, t.tippee, ((t.amount/tot.total) p.percentage) percentage From tips t inner join totals tot on t.tipper=tot.tipper Inner join payday p on p.tippee=t.tipper ) Select * from payday

The totals CTE is not recursive, it creates a subquery named "totals" that acts like a view within the scope of the query.

The payday query also creates the equivalent of a view called payday for the scope of the query, but this one is recursive. It executes the first part, and then iteratively executes the second part (the part after the "union all"). In the second part, the reference to payday is a reference to the results of the prior iteration and therefore useable for this iteration. When the second part returns the empty set (aka there are no more results to be added), it stops.

Using this, the query is able to continuously multiply the weight from the row on which the new tipper was a tippee to the current weight of this row. This normalizes all the weightings to the total amount provided by the original top level funder.

The steps I didn't do here were take the results of payday and sum percentage grouping by funder and tippee; and then take those results and multiply them again by the funder's total to get back to a dollar figure.

The real trick however is going to be that after we've got this basic part working there's going to be a lot of sub-pennies running around. My plan, though I'm sure it will need to be discussed, is to track/accumulate these sub-pennies until they accrue to something contributable.

In the same vein, it's totally possible that a tippee receives less than the minimum allocation amount from each tipper, but that the aggregate of all the tippers together is above the minimum allocation amount.

My plan here is to credit the tippee that total anyway, drawing the funds from one (or maybe a couple) of the tippers, but earmark the debit from the other tippers whose accounts didn't contribute. Then next time this kind of allocation is needed, it'll come from those who didn't do it last time. I'm looking at this as a different kind of tip that is more amount focused than refurring focused. To track the debit of these other tippers the system would automatically generates a kind of IOU tip from those that didn't pitch in to those that did. The system then uses these imbalances to select which tippers to draw from first next time keeping everything in balance overall.

This might be overcomplicating the problem a bit but I don't think so. Once you start doing percentages of percentages the numbers get small quickly and it's not like the tippers are starting off with large amounts and that combination tells me these sub-penny amounts are going to be a real phenomena that we'll want a good way to address.

As for cycles, a naive, ignore them and let the system crank on the math until the percentages get too small will do the right thing (although expensively).

I think we can do better though. I say we start a new thread to deal with handling cycles once we have some kind of fund implementation working in which cycles can occur. There's going to be a way to precompute the weight of each cycle and simply redistribute that weight to the other non-cyclic nodes as we iterate through but that's not now. At least not for me. :)

What do you think/see? Make sense? On Jan 23, 2013 5:29 PM, "Roy Liu" notifications@github.com wrote:

@MikeFair https://github.com/MikeFair Looks about right, although we should account for cycles in a formal way. If not, that would be like a bank programmer using floats to represent monetary sums, and everyone uses that example as a canonical example of why representation is important. Note that the matrix inversion in my formula is actually encoding repeated tipping iterations in a geometric sum of matriceshttp://math.stackexchange.com/questions/21055/a-geometric-infinite-sum-of-matrices. If matrix inversion is too intensive, we can use a sparse representation, compute successive terms of the geometric sum, and stop whenever the total change in all tips for everyone is less than a penny.

Also, what do you mean by "the percentage from the prior iteration"?

— Reply to this email directly or view it on GitHubhttps://github.com/zetaweb/www.gittip.com/issues/449#issuecomment-12632755.

carsomyr commented 11 years ago

@MikeFair I'm confused: In your calculation, percentages seem to be a fluctuating amount. I thought funds paid each other at fixed percentages? Tell you what, let's run through an example. How about the following matrix?

0.8 0.2
0.1 0.9

In other words, person 1 tips person 2 20%. Person 2 tips person 1 10%. Let's say each person starts with $1. According to my formula, the payouts come out to (0.89796, 1.10204). What would the SQL arrive at?

FYI, this is the Octave/Matlab code.

> a = [0.8 0.2; 0.1 0.9]
> diag_a = [0.8 0; 0 0.9]
> [1 1] * inverse(eye(2) - (a - diag_a)) * diag_a

The last line is the computation for the payout vector.

carsomyr commented 11 years ago

@MikeFair Also, if you're going to read the entire tipping table multiple times anyways, why not just implement the formula and use something like NumPy's sparse array package to represent whole iterations as matrix multiplication? Scientific computing is what Python excels at.

MikeFair commented 11 years ago

@carsomyr Because it's what I know. I don't know about NumPy (I can't even say I know Python really), I don't think in matrix terms the way you can, and I was writing code that I was thinking others would really be able to understand (and remake to be better). ;)

I'm excited to learn that you think in matrices and use MatLab/Octave et al. A lot of what I'd like to do in the future relies on stuff like Octave and can take advantage of CUDA, et al. to generate aggregated information on the various graphs and what not we can produce to push up to websites.

That all said, SQL excels at set operations, and I saw this as a set operations challenge. :) (I'll make another post just after this explaining the math part a bit better)

To your point about reading the table many times, that's why SQL Server has creatable indeces.

In the end, all the same math operations need to be performed, the question is more a matter of what's the process used to do all the math.

The query would be executed entirely on the SQL server CPU and then be handed back to the web server.

The database could/would be optimized with a tippee/tipper index on the tips table (and if we can precompute using triggers and/or cron jobs it will be even better). This means we won't ever be doing full table scans, we're doing fast B-Tree lookups where all the valid tippee/tipper links have already been prestored.

The data itself is also slowly changing (relatively speaking). That means that more often than not, it will be the exact same numbers from prior weeks being recrunched over and over again each week. If we use triggers/cron jobs to deal with computing the results as the inserts/updates/deletes are happening in an incremental way, then the query to calculate the final answers are a really trivial task.

It was with that in mind that really cemented my thoughts about doing the whole thing on the database side. It idn't hurt that I'd already written some SQL code like this before a couple times using CTEs in SQL, so I knew that the CTE approach would work, I knew we already had Postrges installed, up, and running as a core part of the environment, and so the number of patches to the code to get this functional would be minimal. The tables and data seemed to already exist and all we really have to do is just start using it that way.

carsomyr commented 11 years ago

@MikeFair Understood. I agree that the implementation can be kept separate from the ideas. I don't really insist on matrices either. It's just that they are:

  1. Close to the math. Expressing a problem in matrix terms is a compact way of getting your point across.
  2. Close to the implementation, too. Once the tipper/tippee information is read from the database, that's it. The computation goes down in one involved algebraic step.

Have you tried running the SQL on the two-entity example I mentioned above? I'm not trying to snow you in with math; rather, I think it's important to have a sanity check for what a procedure is doing. After all, we are dealing with people's money here. If they ask why they got paid as much as they did, we had better have a concise explanation. It's not even a big deal if our answers don't match. Much more important is a diagnosis of why they don't match and which policy most naturally models the tipping process.

Tell you what: If matrices don't suit you, can you write down the SQL in pseudocode? It would be something like "At every iteration, tippers compute the sum of their incoming tips and ...".

Alternatively, we can split up the work. If you give me two tables consisting of (tipper, tippee, percentage) rows and (initial allocation) rows, I can spit out a table of (payout) rows.

MikeFair commented 11 years ago

The reason I think for what you're seeing is that funds amounts aren't stored as percentages, they're stored as shares. Let's start with a deeper example though.

Let's say there's a fund that P1 really likes called F1 and personally tips it 20% (say it represents an app P1 likes).

F1 heavily relies on some library project, so like good citizens, the F1 team, tips the F2 fund 10%.

Now coincidently, P1 and P2 are respected developers of that library F1 relies on. Consequently they are members of the F2 fund at say 33.33% each.

So what we have is: P1 -> F1 @ 20%

F1 -> F2 @ 10%

F2 -> P1 @ 33.33% F2 -> P2 @ 33.33%

This is likely to be stored in the database as something like: P1, F1, 2000 P1, *, 8000

F1, F2, 10 F1, *, 90

F2, P2, 3 F2, P1, 3 F2, *, 3

The "*" entries represent the aggregation of all the other places the fund/tipper gives.

So P1 -> F1 @ 2000/10000 and F1 -> F2 @ 10/100 and F2 -> P1 @ 3/9 and F2 -> P2 @ 3/9

P1 is the "root" or "funder" in this example, and so the sum of the P1 tip entries represents the actual dollar amount to be distributed (I'm thinking in terms of pennies here). The good news is that this is easily converted into a percentage the same as everything else (amount/total amount of the group).

Step 1: Calculate totals {"tipper", "total"} P1, 10000 F1, 100 F2, 9

Step 2: Get the root/funder profile/fund (from the simple sample query above) {"funder", "total", "tipper", "tippee", "percentage"} P1, 10000, P1, F1, (2000 / 10000) = 0.2 P1, 10000, P1, , (8000 / 10000) = 0.8 UNION ALL P1, 10000, F1, F2, (10 / 100) * 0.2 = 0.1 * 0.2 = 0.02 P1, 10000, F1, , (90 / 100) * 0.2 = 0.9 * 0.2 = 0.18 UNION ALL P1, 10000, F2, P1, (3 / 9) * 0.02 = 0.3333 * 0.02 = 0.006666 P1, 10000, F2, P2, (3 / 9) * 0.02 = 0.3333 * 0.02 = 0.006666 P1, 10000, F2, , (3 / 9) * 0.02 = 0.3333 \ 0.02 = 0.006666

Step 3: Summarize and GROUP BY tipper/tippee {"funder", "tippee", "percentage"} P1, P1, SUM(0.006666) P1, P2, SUM(0.006666) P1, *, SUM(0.006666 + 0.18 + 0.8) = 0.986666

Step 4: Multiply by Funder's Total P1, P1, 0.006666 * 10000 = 66.66 P1, P2, 0.006666 * 10000 = 66.66 P1, , 0.986666 \ 10000 = 9866.66

So if P1 put in $100.00 (10000 pennies) then P1 and P2 would get $0.66 each and everything else would get $98.66 totaling $99.98. There'd be two pennies left over in escrow to be added in to next week's payday for P1's total (or possibly used elsewhere if we see a good need/use for it).

MikeFair commented 11 years ago

Completely eliminating any and all cycles from the graph, in this case that would mean removing any downstream occurences to P1, F1, and F2 once they were included in the path, would end up with P1's $0.66 left over in accruals along with the $0.02.

If F1 or F2 were also cyclic, then whatever amounts would have been delivered on the second time through would also get accrued.

Including them as part of the total in the next payday (aka dropping all accruals back into the top of the tree next time we process) seems like a very transparent and operationally sane way to deal with the issue.

carsomyr commented 11 years ago

@MikeFair Have you consider "normalizing" the database representation and storing percentages separate from the size of someone's purse? I see entries that are orders of magnitude apart, i.e., {2000, 8000} and {10, 90}. By dealing with raw sums up front, you are making an argument for this particular initial purse allocation, and it's confusing for an outside reader like myself to see differently-sized amounts. Separating percentage payouts from the initial purse allocations enables you to make an argument for any purse allocation.

Where does 0.02 come from?

carsomyr commented 11 years ago

@MikeFair FYI, I ran the numbers. Suppose P1, P2, F1, F2 represent entities indexed in that order. That means the payout matrix p is:

  .8   0    .2   0
  0    1    0    0
  0    0    .9   .1
  .33  .33  0    .33

The initial purse vector d is:

10000
100
0
9

After the first round of tipping ((d' * p)'), I got:

8003
3
2090
13

Am I misunderstanding something about the tipping policy? Intuitively, this results from:

  1. P1 keeping $8000.
  2. F1 keeping $90.
  3. F2 keeping $3.
  4. P1 tipping F1 $2000.
  5. F1 tipping F2 $10.
  6. F2 tipping P1 $3.
  7. F2 tipping P2 $3.
MikeFair commented 11 years ago

I've definitely considered denormalizing. That's going to be a major performance enhancement opportunity.

Step one was make it work, get it installed, make funds available and start promoting them. The internal scaling of an allocation set is open on purpose so that the numbers can be meaningful to the person/process managing them. Bringing them back to a percentage is easy enough.

Tips aren't stored as a current set of rows atm. They are stored as a serious of timestamped assignments. As I understand it, displaying the current tips is about querying out the most recent assignment for each tipper/tippee pair. I was thinking about creating a view which would make that implementation detail transparent and it ciuld also do the percentage calculations. Making it a materialized view should in theory make it even easier.

The other thing Postgres has are arrays. In the more full blown version of the query, the CTE actually builds up all the nodes in the path as an array. It uses whether or not the current node exists in this array to detect cycles. It seems like being able to store the full path traversal as an array might be usable in other ways, but trying to keep it current and maintained through triggers seems expensive, perhaps cron jobs or some other sort of asynchronous messaging if it turns out the path as an array is valuable.

My plan right now is to provide a single query that can be used to calculate payday amounts in the presence of Funds by treating funds as separate accounts that also give tips. These accounts however are like sub-accounts that user's can have permissions over. A user can manage the allocations for many sub-accounts. (If it's easy enough I even plan on multiple people being able to view/edit the same sub-accounts.)

With those two things in place that was going to be version 1 of the implementation.

From there it can get better, but it'll work. On Jan 25, 2013 12:45 PM, "Roy Liu" notifications@github.com wrote:

@MikeFair https://github.com/MikeFair Have you consider "normalizing" the database representation and storing percentages separate from the size of someone's purse? I see entries that are orders of magnitude apart, i.e., {2000, 8000} and {10, 90}.

— Reply to this email directly or view it on GitHubhttps://github.com/zetaweb/www.gittip.com/issues/449#issuecomment-12719973.

carsomyr commented 11 years ago

@MikeFair I still don't get the need to detect cycles. Is it intrinsic to your algorithm? I've dealt with graph algorithms in the past, and those with symbolic manipulations (i.e., through data structures and not through linear algebra operations) were error prone and it was hard to figure out what they were doing. Saving path traversals will make the algorithm even more undecipherable, I fear.

P.S. -- You may want to consider replying directly in the GitHub UI. I edit many messages after they are posted, and the changes will not show up in emails.

MikeFair commented 11 years ago

Hmm, the example I gave didn't show the cycle properly enough.

Picture this, P1->F1 F1->F2 F2->F1

The SQL query is recursive, joining the right hand side of the parent link (the tippee) to the left hand side of the child (the tipper).

This joining is done recursively and will continue until there are no more joins to connect to. The funds are a one to many relationship, but as any of the children funds can contain the parent fund (at any depth along the chain), there must be some way to stop the recursion.

What I can't figure out is hoe you're going to build your matrices. The relationships of parent to children are not uniform. It's also easy for one tipper to end up supporting hundreds of tippees.

I don't see how you'd go about building the matrix to process the arbitrarily nested directed graph. On Jan 25, 2013 6:52 PM, "Roy Liu" notifications@github.com wrote:

@MikeFair https://github.com/MikeFair I still don't get the need to detect cycles. Is it intrinsic to your algorithm? I've dealt with graph algorithms in the past, and those being manipulated symbolically (i.e., through data structures and not through linear algebra operations) were error prone and it was hard to figure out what they were doing. Saving path traversals will make the algorithm even more undecipherable, I fear.

P.S. -- You may want to consider replying directly in the GitHub UI. I edit many messages after they are posted, and the changes will not show up in emails.

— Reply to this email directly or view it on GitHubhttps://github.com/zetaweb/www.gittip.com/issues/449#issuecomment-12729910.

carsomyr commented 11 years ago

@MikeFair I'd be happy to explain my work, as I am not even sure it's correct. Let's use my earlier 4x4 payout matrix p and purse vector d. Are you clear with the representation? The rows represent payouts to other entities, including oneself.

Do you agree that (d' * p)' gives the payouts after the first round of tipping? You can try this in GNU Octave for yourself. I'm going to provide the description piecemeal: Nod if you are convinced of this, or shake your head if you want help setting up the calculation in Octave, or need some part clarified.

MikeFair commented 11 years ago

Am I misunderstanding something about the tipping policy?

I think so, what we're currently calling tips isn't actual dollar amounts, they're shares. So that's 8000/10000 shares which is 80% not $8000.

In the end at the final step, these percentages get converted into dollar amounts by multiplying by the amount of money the tipper put into the pot.

Everything starts with P1, which is representing 100%. P1 - F1 20%
- F2 10%
- P1 33.33%
- P2 33.33%
- other 33.33%
- other 90%
- other 80%

So following just the one path between P1 and P2 we get: 20% of P1 -> 10% of F1 -> 33.33% of F2 = 0.6666% of P1 0.2 * 0.1 * 0.3333 = 0.006666

This is saying that on pay day: P1 allocates to P2 0.6666%

If P1 were to tip $100.00, 0.6666% of that is $0.66. So P1 ends up tipping P2 66 cents.

The examples you showed seemed to be confusing the share amounts within each fund and the percentages they are representing. Also, the funds don't have any money. The sum of the shares is not the total amount they have to give, it's the denominator basis for calculating the percentage.

The example is only dealing with 1 entity giving any money, and that's P1 who is tipping $100. The procedure is working out how much of that $100.00 is going to go where.

It does that by first normalizing everything as a percentage in P1 terms, and then second multiplying those percentages by the amount P1 gave. At the end of the second phase what we have is a direct list of exactly the dollar amounts P1 will tip and to whom.

Does this match with your understanding of the process? Not seeing that the amounts within each fund are shares that are only meaningful in the context of their peers/siblings could really confuse the issue...

carsomyr commented 11 years ago

@MikeFair I'm sorry for the confusion. Now that I understand your description, I will adjust my calculations accordingly. As far as I can tell, though, your schema can be (de?)normalized further. Injecting share quantities into calculations is potentially misleading, and it certainly misled me. Since the denominator of each share quantity accompanies the numerator in every step of the calculation, why bother with these quantities at all? We're trying to work through the math, and it's jarring to see fractions like these.

Also, it wasn't explicitly stated that P1 started with $100 to allocate. In fact, as my matrix calculations suggest, changing the initial allocations changes the result. Can we also talk in dollar terms without converting pennies to dollars? That's another detail that doesn't need to be displayed for the purposes of grinding through the math.

carsomyr commented 11 years ago

@MikeFair Alright, I've formalized all of this into some Octave code.

#!/usr/bin/env octave

function [res] = payouts(payout, initial, n_rounds)

payout_d = diag(diag(payout));

if nargin < 3
    res = initial' \
        * inverse(eye(size(payout, 1)) - (payout - payout_d)) \
        * payout_d;
else
    acc1 = zeros(size(payout, 1));
    acc2 = eye(size(payout, 1));

    for _ = 1:n_rounds,
        acc1 += acc2;
        acc2 *= (payout - payout_d);
    end

    acc1 = acc1 * payout_d + acc2;

    res = initial' * acc1;
end

end

payout = [.8 0 .2 0;
          0 1 0 0;
          0 0 .9 .1;
          1/3 1/3 0 1/3];

initial = [10000 0 0 0]';

payouts(payout, initial, 3)

Do try to run this code. I believe it encodes the tipping process. The payouts function takes three arguments.

  1. payout, the payout matrix.
  2. initial, the initial allocations.
  3. n_rounds, the number of tipping rounds.

The third parameter is optional; that is, if you leave it out, payouts will compute an infinite number of tipping rounds. Try to play around with it and let me know if you think the results are reasonable.

MikeFair commented 11 years ago

@carsomyr No worries, I'm glad we're on the same page now.

I'm happy to talk about everything in percentage terms for now if that's useful for you.

I just finished the work I was doing on the DevNet login system which I needed to do before I could start work on the fund testing. I now have a way to actually create these funds inside GitTip and will be begining work on the pay day calculation query.

For what it's worth, I know that the recursive query I presented is going to work almost exactly as I've decribed it and produce the right answers. But it's nice to have a second source of answers calculated using a different method for reconciliation purposes.

So I think it'd be good for everyone if we both completed our paths of development effort and then compared our results given the same input. If both of us can come up with the same answers for what should happen on Pay Day everyone is going to rest that much easier that the algorithms used are accurate.

carsomyr commented 11 years ago

@MikeFair A word on the code and what it's encoding; the rule is: Whenever an entity receives incoming tips, it distributes those tips on a percentage (or shares, if you will) basis to tippees. Now with some code in place, the allocation at round 1 is:

8000
0
2000
0

P1 just tipped F1 2000, or 20% of its purse. The allocation at round 2 is:

8000
0
1800
200

F1 just tipped P2 200, or 10% of its purse. The allocation at round 3 is:

8066.66
66.66
1800
66.66

F2 just tipped P1, P2, and P3 66.66 each, or 33.33% of its purse. The allocation at round infinity is:

8053.691
67.114
1812.081
67.114

Pretty weird result, eh? Note that we don't actually compute an infinite number of rounds. You can verify all of the above by modifying the Octave code (leave out the third parameter for the infinity case). Instead, we calculate it implicitly through some mathematical properties of matrices. Hope this clarifies things. Do you think the tipping description faithfully captures what we're after?

MikeFair commented 11 years ago

@carsomyr You're going to have to help me understand that code. I don't read Matlab/Octave (at least not yet) :) I also really don't think in terms of matrix operations so even thinking about what it means to multiply a matrix times an inversion of itself, times another matrix, getting the various properties like StdDev are difficult for me.

I can generally follow some laid out pictures of the matrices themselves like you were doing initially. Especially as they came with the descriptions you were providing, but once it got into the operations it started moving too fast for me to keep up, especially since I was working on an orthognal axis. ;)

Let's keep going though. I'm writing up the SQL version now and have been building exactly the P1/F1/F2/P1,P2 example we've been discussing. :) (Unfortunately the GitTip UI prevents me from actually putting in the exact amounts we've been working with, but I think it'll be close enough.)

chadwhitacre commented 11 years ago

I love what you two are doing here. I need to catch up on this thread ...

carsomyr commented 11 years ago

@whit537 You are welcome to stop by the Beauty Shoppe and have lunch nearby with Clark and me. After one whiteboard session, I'll make a linear algebra convert out of you. In the meantime, try working through my last example with the one, two, three, and infinite tipping iterations. The math isn't so important, as it is merely a tool for getting what we want. The main question is whether the tipping process matches what intuitively is supposed to happen.

chadwhitacre commented 11 years ago

@carsomyr Thank you! I'd love to get together and go over this with you. My email is chad@zetaweb.com and my cell is 412-925-4220, or we can schedule something here. :)

MikeFair commented 11 years ago

@carsomyr The biggest tell tale sign you did it right is by adding your results up and getting the original 10000 shares from P1. Your walk through was great, thanks. It really helped me get what you were doing to check it out quickly.

Those results are exactly what I'd expect them to be except for one thing. The numbers are accurate, but the 8053.691 and 1812.081 are both supposed to be for P3 (aka Other Places)

So the final result ought to have been: P3: 9865.772 P2: 67.114 P1: 67.114

and the great news is that your results are the almost exactly the same as the final results from the initial example I posted: P1, P1, 0.006666 * 10000 = 66.66 P1, P2, 0.006666 * 10000 = 66.66 P1, , 0.986666 \ 10000 = 9866.66

The difference in our numbers stems from your code iterating through the P1->F2->P1 cycle multiple times and mine not. I sacrificed that for the operational efficiency of refusing to traverse the cycle because as you implied, we have to stop somewhere.

MikeFair commented 11 years ago

Yes, you've now got the spirit of the funds as described right. With the one exception of detecting that the P3/Other from P1 is the same as the P3/Other in F1, and F2. But I'm sure you'll be able to work that out..

Now that you understand and can code the case, initial three questions remain: 1) Choosing how to go about addressing the cycles 2) People don't give $100.00 each pay day, they give much much less 3) We can't move fractional pennies and that's going to create an accumulation somewhere

Each of these point to the same thing, the need to track factional pennies across the whole system and to track the amount for each recipient from the system as an aggregate whole before moving the amounts.

But I'll let you think on it and see what you come up with.

MikeFair commented 11 years ago

And now for my code! :)

I ran this from my local GitTip install. I created accounts in the front end, set tips, and then ran the SQL Query that follows.

I wasn't able to replicate our example exactly yet, I can do that tomorrow. The example I used tried to approximate it though: P1 -> F1: 6.00 P1 -> Other: 24.00

F1 -> F2: 3.00 F1 -> Other: 12.00

F2 -> P1: 3.00 F2 -> P2: 3.00 F2 -> Other: 3.00

P1 puts in a total of $30.00

First I'll show the final results: "P1" -> "Other": 0.97333% ($29.20) "P1" -> "P1": 0.01333% (0.40) "P1" -> "P2": 0.01333% (0.40)

Here's the SQL Code. It requires some DB modifications; three views and a new is_fund column I'll go through it in another post tomorrow (or soon thereafter):

WITH RECURSIVE payout_weights(funder, total, is_fund, tipper, tippee, percent , depth, path, cycle) AS ( SELECT t.tipper, t.total, t.is_fund, t.tipper, t.tippee, t.percent , 1, ARRAY[t.tipper], false FROM fundweights t WHERE t.amount > 0

  UNION ALL

    SELECT p.funder, p.total, t.is_fund, t.tipper, t.tippee, t.percent * p.percent
            , p.depth + 1
            , path || t.tipper
            , t.tippee = ANY(path)
    FROM fundweights t, payout_weights p
    WHERE t.tipper = p.tippee
          AND t.amount >= 0
          AND NOT cycle

), payday(funder, total, path, is_fund, tippee, percent) AS ( SELECT pw.funder, pw.total, pw.path, p.is_fund , pw.tippee, SUM(pw.percent) FROM payout_weights pw inner join participants p on p.id = pw.tippee WHERE not p.is_fund GROUP BY pw.funder, pw.total, pw.path, p.is_fund, pw.tippee ) SELECT p.funder, p.total, p.tippee--, p.path, p.is_fund , SUM(round(p.percent, 5)) percent, SUM(round(p.percent * p.total, 5)) dollars FROM payday p WHERE p.funder = 'P1' GROUP BY funder, total, tippee ORDER BY tippee;

carsomyr commented 11 years ago

@MikeFair The problem with cutting out cycles early is that you can get different results based on which order you recursively process the payout graph. As for operational efficiency, why not move to a matrix model immediately? That would implicitly deal with the cycle issue and prevent bugs (notice that the infinite iterations case is one line of Octave, and probably one line of Python without the code to convert SQL tables to sparse arrays). There's also the issue of speed. The SQL is processing table rows as linked structures, while sparse arrays are vanilla arrays underneath.

If I write the NumPy calculations, are you able to provide me with a schema for how the payouts table looks like, when it is freshly read from the database?

Also, I don't understand the purpose of "other" I mistook it for "self", and hence. Can we treat "other" as just another entity? If not, why does it need to be modeled separately?

MikeFair commented 11 years ago

@carsomyr Good call on the 'infinite iterations' line, I read it, but I didn't get it. So thanks for pointing that out. Yes, you can treat Other as just another entity, but specifically it's the same entity at multiple points in the hierarchy.

I actually think it's a great idea to go straight to matrices! Let's make it as its own parallel project (likely on zetaweb's hub) to ultimately take over for payday.py.

The back end pay day processing has nothing to do with the gittip.com UI except for that they are operating on the same set of objects. Creating a Python project using NumPy to take over for that part, and taking responsibility for the delivery the data and reconciling with the back end processor is a really good thing.

As for the data structure for NumPy to receive, let's do it the other way around, you tell us/me what presentation you'd like to see/need from the database and that makes your job easy and we'll create a view to present that.

That keeps us free to change the internal database representation whenever we want so long as we maintain the View contract your procedure is expecting.

I can tell you the view I've already created, "fundweights", has the following columns: from text , to text , amount numeric(35,2) , total numeric , percent numeric

A change I'd recommend for you would be to recreate from/to as ints. Do you think it would make a difference if we made sure that the ints were contiguous so that they could be used directly as their index into the matrix or are sparse matrices good enough to handle that?

carsomyr commented 11 years ago

@MikeFair I think the best way would be to put in some initial implementation of funds. I will hack out a computation module where the magic happens and then create a SQL-to-sparse array adapter that handles the transformation out of and back into SQL. What should the output be?

MikeFair commented 11 years ago

One thing you said earlier about users reserving allocations for themselves struck my attention though, and it reminded me to mention that user's will have a way to handle reserving money for themselves.

A participant gets two private funds named to_me and from_me.

This represenents what to do with money coming "to_me", which defaults 100% to their bank account, and "from_me" which represents how to direct the money they put in.

Whenever someone puts a participant in a fund, what that actually means is allocate to the participant's "to_me" fund.

A user that wishes solely to regift tips, can put an entry in their "to_me" fund directing 100% of it to their "from_me" fund.

This normalizes all transfers through a fund which is like a router for money instead of IP packets. ;)


Back on the ouput of the Computation module you're considering, my expectation is that the output of this module at it's most complicated would be one row per participant of 6 fields each:

, , "balance", , "credit", ... Expect to be given a batch_event_id, the current set of participant balances (to be applied to their from_me fund), and the full construction of funds at the time the process is kicked off. I expect this to be fed through a JSON file or a native binary file NumPy can work with named for the batch_event_id. The JSON would likely be something like: [ { "user": "UserName1", "balance": , "fund": {"FundId1": , "FundId2": }} , { "user": "UserName2", "balance": , "fund": {"FundId1": , "FundId2": }} , { "user": "UserName3", "balance": , "fund": {"FundId1": , "FundId2": }} ... ] For background, I'm just starting with replicating exactly what payday.py curently does: From payday.py: Exchanges (moving money between Gittip and the outside world) and transfers (moving money amongst Gittip users) ... exchanges and transfers happen transactionally. These exchanges and transfers accrue against a "pending" column in the database. Once the payday event has completed successfully, it ends with the pending column being applied to the balance column and reset to NULL in a single transaction. There's two fields on a participant, balance and pending. First we populate pending, then we commit it to balances. If your matrix is able to handle what I think it can, then we could/should process the entire universe in one go. If we can do that, then it might be possible just to take new balances as the output, making the output: , , For even more color, I see this whole process ultimately happening in three independent phases: phase 1) Collect funds. Regardless of whatever the funder wanted to give, we can only deliver whatever has actually arrived. This accounts for misconfigurations, transfer failures, and other errors. Collected funds will get added to the participant's balance. At the end of this phase, all user's pay-in amounts will be populated in their balance field. phase 2) Matrix crunch. (The part we're discussing now.) We allocate whatever is in the user's balance field according to the user's delivery fund. At the end of this phase, either all user's pending and balance fields will reflect the intra-GitTip transfers, or none will. phase 3) Process payouts. This phase executes exchanges (transfers to bank accounts). At the end of this phase balances will have been processed.