Closed chadwhitacre closed 6 years ago
When I manually truncate the transaction_search.csv
to the one record and the gratipay
query also to the one record and step through the code carefully, I observe the desired behavior that it spits out one match.
But when I run over the full csv and query ...
FOUND IT IN BRAINTREE fghrjgg 7872 2015-11-26 12.67
number unmatched: 744
[gratipay] $
No FOUND IT IN GRATIPAY
.
How about with two entries?
Hrm ... two fails, and now one also fails if I run it straight through vs. set_trace
ing.
I'm not forcing it in the query this time. Is that what's different?
Yes.
Ooh! Ooh!
(Pdb) gratipay
{None: Record(exchange_id=79004, ref=None, date=datetime.date(2015, 11, 26), amount=Decimal('12.67'), participant_id=7872L)}
(Pdb)
I'm clobbering all null
ref
s down to a single key/value in the gratipay
dict
! π
π
FOUND IT IN BRAINTREE fghrjgg 7872 2015-11-26 12.67
FOUND IT IN GRATIPAY (7872, datetime.date(2015, 11, 26), Decimal('12.67'))
[gratipay] $
Eeeeee, now getting number unmatched: 2164
.
Let's go with a gist for versioning.
Oops! Still had the query constrained. π¬
Okay! Phew. Satisfied. Deep cleansing breath. They all match! π
Do the payment method tokens also match up?
(That's exchange_routes.address
.)
Not entirely. Okay! We're not trying to solve all of https://github.com/gratipay/gratipay.com/issues/4442 here. Do we have enough to get what we need yet?
What do we need?
We need to produce a csv of customer,amount
where the amounts sum to $36,428.04 and Gratipay's amount is $18,758.
I guess that means tracing the inputs through paydays and correlating them with outputs. We need to list the outputs that sum to the input.
I think we should go ahead and fill in the refs. You okay with that @kaguillera? I will test it out and spot-check it on a backup.
go ahead and fill in the refs
... because that will be the best way to identify the input set.
As far as tracing the output set, I guess we will have to use transfers
and payments
and look at timestamps?
Okay!
https://stackoverflow.com/questions/18797608/update-multiple-rows-in-same-query-using-postgresql
update exchanges e
set ref = tmp.ref
from (values ('399g55w', 78775)
-- ...
, ('nkv6gqw', 78568)
) as tmp(ref, exchange_id)
where tmp.exchange_id = e.id
;
π
gratipay-bak=# \i backfill-braintree-2015.sql
UPDATE 744
gratipay-bak=#
Rerunning match-2015.py
is now a no-op. π
Okay! Refs backfilled locally! Now to trace payments ...
Working up a query to select the inputs to trace.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals
from gratipay import wireup
db = wireup.db(wireup.env())
inputs = db.all('''
SELECT count(*), sum(amount + fee)
FROM exchanges e
JOIN exchange_routes er
ON e.route = er.id
WHERE network = 'braintree-cc'
AND "timestamp"::text >= '2015-01-01'
AND "timestamp"::text < '2016-01-01'
AND amount > 0
''')
print(inputs)
[Record(count=7996L, sum=Decimal('136429.97'))]
Need to match ... 2,165 and $36,428.
Constraining to status='succeeded'
:
(7704, Decimal('133386.20'))
π€
Well, I guess we have the ref
s of all 2,165 transactions that Braintree is counting. What's the pattern difference between those and the extra 5,539 that my query is counting?
P.S. If this rabbit hole is too deep we could punt and just query on ref
in the known set from Braintree. Would feel more confident understanding this, though ...
I think I'm going to add a from_braintree
column to my local exchanges
table and go from there.
gratipay-bak=# \i refs.sql
ALTER TABLE
UPDATE 2165
gratipay-bak=#
alter table exchanges add column from_braintree boolean not null default false;
update exchanges
set from_braintree = true
where ref in ( 'jyb8j8g'
-- ...
, '7jkhp32'
)
;
Golly. My query is only finding 1,687 items from_braintree
!
Null routes?
No. Phew. π
gratipay-bak=# select count(*) from exchanges where from_braintree and route is null;
βββββββββ
β count β
βββββββββ€
β 0 β
βββββββββ
(1 row)
gratipay-bak=#
Double- and triple-phew. π
gratipay-bak=# select count(*) from exchanges where from_braintree;
βββββββββ
β count β
βββββββββ€
β 2165 β
βββββββββ
(1 row)
gratipay-bak=# select count(*) from exchanges where ref is not null and route is null;
βββββββββ
β count β
βββββββββ€
β 0 β
βββββββββ
(1 row)
gratipay-bak=#
Oh! !m us :D
gratipay-bak=# select count(*) from exchanges where route is null;
βββββββββ
β count β
βββββββββ€
β 0 β
βββββββββ
(1 row)
gratipay-bak=#
π
gratipay-bak=# \d exchanges
Table "public.exchanges"
ββββββββββββββββββ¬βββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Column β Type β Modifiers β
ββββββββββββββββββΌβββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
...
β route β bigint β not null β
...
Convenience view ...
gratipay-bak=# create view ewr as (select e.*, er.network, er.address from exchanges e join exchange_routes er on e.route = er.id);
Yeah, thought so.
gratipay-bak=# select count(*) from ewr where from_braintree and network != 'braintree-cc';
βββββββββ
β count β
βββββββββ€
β 478 β
βββββββββ
(1 row)
π€
gratipay-bak=# select count(*) from ewr where from_braintree and network = 'unknown';
βββββββββ
β count β
βββββββββ€
β 416 β
βββββββββ
(1 row)
Oh sweet mercy. π
gratipay-bak=# select network, count(network) from ewr where from_braintree group by network order by count desc;
ββββββββββββββββ¬ββββββββ
β network β count β
ββββββββββββββββΌββββββββ€
β braintree-cc β 1687 β
β unknown β 416 β
β paypal β 31 β
β balanced-cc β 31 β
ββββββββββββββββ΄ββββββββ
(4 rows)
Something in the matcher script?
Or did we link the wrong routes at some point under gratipay/gratipay.com#3912?
Looks like 11 routes are implicated:
gratipay-bak=# select route, count(route) from ewr where from_braintree and not network in ('braintree-cc', 'unknown') group by route order by count desc;
βββββββββ¬ββββββββ
β route β count β
βββββββββΌββββββββ€
β 2554 β 27 β
β 12629 β 15 β
β 12216 β 8 β
β 11403 β 3 β
β 5144 β 3 β
β 10402 β 1 β
β 12440 β 1 β
β 2715 β 1 β
β 11426 β 1 β
β 11584 β 1 β
β 12565 β 1 β
βββββββββ΄ββββββββ
(11 rows)
gratipay-bak=#
Alright, how do we avoid caring about this?
I guess we can just start with the from_braintree
in the queryβI don't actually have to run anything against production here, I can do it all from this here backup here.
gratipay-bak=# select sum(amount+fee), count(*) from inputs;
ββββββββββββ¬ββββββββ
β sum β count β
ββββββββββββΌββββββββ€
β 36428.04 β 2165 β
ββββββββββββ΄ββββββββ
(1 row)
DROP VIEW ewr;
CREATE VIEW inputs AS (SELECT * FROM exchanges WHERE from_braintree ORDER BY "timestamp" ASC)
There are only a handful of transfers
after we started charging on Braintree (https://github.com/gratipay/inside.gratipay.com/issues/1164#issuecomment-330670984). I am going to try ignoring them and only look at payments
.
gratipay-bak=# select "timestamp"::date, amount from transfers where "timestamp" >= '05-28-2015'::date order by "timestamp" desc;
ββββββββββββββ¬βββββββββ
β timestamp β amount β
ββββββββββββββΌβββββββββ€
β 2016-10-13 β 0.56 β
β 2016-08-24 β 3.00 β
β 2015-11-23 β 147.48 β
β 2015-09-13 β 0.43 β
β 2015-07-30 β 0.34 β
β 2015-07-30 β 0.87 β
ββββββββββββββ΄βββββββββ
(6 rows)
gratipay-bak=#
Alright, let's trace one exchange through payments
. I guess we want to trace it iteratively until it goes out again in another exchange or we hit 2016-01-01
. Do we have any regiving to speak of tho?
payments
start with Gratipay 2.0 on 2015-05-15
.
payments
are linked to a payday, but exchanges
are not.
First time for everything, friends. :)