Closed ankurdave closed 10 months ago
Impressive work @ankurdave. I've never had the stomach for understand the matching logic.
Also, please paste the profiler call table for posterity :)
Wrong button. Meant to merge, not close.
Before:
20664200 function calls in 11.972 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1017600 2.767 0.000 4.268 0.000 matching.py:215(_entry_and_posting_ids_key)
84800 1.272 0.000 9.529 0.000 matching.py:283(add_posting)
932800 1.229 0.000 6.268 0.000 matching.py:272(_search_posting)
3924816 1.004 0.000 1.368 0.000 matching.py:216(<genexpr>)
1017600 0.895 0.000 0.895 0.000 matching.py:256(get_fuzzy_date_range)
5069708 0.622 0.000 0.622 0.000 {built-in method builtins.id}
476638 0.493 0.000 0.997 0.000 matching.py:871(posting_set_id)
1130506 0.432 0.000 0.673 0.000 matching.py:196(get_posting_weight)
1017600 0.397 0.000 1.292 0.000 matching.py:261(fuzz_date_currency_key)
1607694 0.380 0.000 0.503 0.000 matching.py:872(<genexpr>)
57773 0.357 0.000 1.633 0.000 matching.py:884(add_subset)
8944 0.277 0.000 0.277 0.000 {built-in method builtins.max}
1263068 0.267 0.000 0.267 0.000 {built-in method __new__ of type object at 0x10e96ffa8}
12181 0.134 0.000 2.426 0.000 __init__.py:731(wrapper)
1018444 0.119 0.000 0.119 0.000 {method 'append' of 'list' objects}
91051 0.089 0.000 0.164 0.000 amount.py:42(__new__)
96981 0.086 0.000 2.653 0.000 matching.py:930(get_matchable_postings)
9968 0.078 0.000 0.078 0.000 {method 'items' of 'collections.OrderedDict' objects}
12181 0.077 0.000 12.370 0.001 matching.py:375(add_transaction)
57773 0.075 0.000 0.129 0.000 {built-in method builtins.sum}
84800 0.074 0.000 0.202 0.000 matching.py:265(_date_currency_key)
34304 0.066 0.000 0.178 0.000 booking_full.py:680(convert_costspec_to_cost)
9968 0.062 0.000 1.784 0.000 matching.py:856(_get_uncleared_aggregate_posting_candidates)
273197 0.054 0.000 0.054 0.000 matching.py:885(<genexpr>)
84800 0.048 0.000 0.075 0.000 matching.py:191(is_unknown_account)
12181 0.045 0.000 2.513 0.000 matching.py:812(get_aggregate_posting_candidates)
84800 0.041 0.000 0.082 0.000 matching.py:211(_date_key)
84800 0.034 0.000 0.054 0.000 <string>:1(<lambda>)
84800 0.034 0.000 0.041 0.000 posting_date.py:5(get_posting_date)
34304 0.033 0.000 0.085 0.000 __init__.py:452(_replace)
9968 0.033 0.000 0.433 0.000 __init__.py:72(__setitem__)
258325 0.033 0.000 0.033 0.000 {built-in method builtins.isinstance}
12181 0.027 0.000 0.092 0.000 matching.py:927(<listcomp>)
84793 0.027 0.000 0.027 0.000 {method 'startswith' of 'str' objects}
21125 0.023 0.000 0.054 0.000 __init__.py:179(__getitem__)
21125 0.023 0.000 0.028 0.000 __init__.py:66(__getitem__)
34304 0.020 0.000 0.052 0.000 __init__.py:442(_make)
8944 0.019 0.000 0.312 0.000 __init__.py:610(most_common)
24714 0.019 0.000 0.029 0.000 reconcile.py:460(is_posting_cleared)
97051 0.017 0.000 0.017 0.000 {method 'get' of 'dict' objects}
106943 0.015 0.000 0.015 0.000 {built-in method builtins.len}
9968 0.015 0.000 0.449 0.000 __init__.py:185(__setitem__)
8944 0.015 0.000 0.396 0.000 __init__.py:193(popitem)
12181 0.015 0.000 0.024 0.000 matching.py:856(<lambda>)
8944 0.015 0.000 0.069 0.000 __init__.py:111(pop)
8944 0.012 0.000 0.291 0.000 heapq.py:523(nlargest)
12181 0.010 0.000 0.111 0.000 matching.py:965(get_matchable_postings_from_transaction)
8944 0.010 0.000 0.012 0.000 __init__.py:88(__delitem__)
8944 0.010 0.000 0.032 0.000 __init__.py:189(__delitem__)
40287 0.010 0.000 0.010 0.000 matching.py:949(<genexpr>)
8944 0.010 0.000 0.010 0.000 __init__.py:728(__delitem__)
12181 0.008 0.000 0.101 0.000 matching.py:926(get_weighted_postings)
26017 0.007 0.000 0.009 0.000 matching.py:856(<genexpr>)
20101 0.006 0.000 0.006 0.000 __init__.py:93(__contains__)
9968 0.006 0.000 0.006 0.000 __init__.py:96(__missing__)
13836 0.006 0.000 0.006 0.000 {method 'setdefault' of 'collections.OrderedDict' objects}
11583 0.005 0.000 0.005 0.000 ofx.py:1528(is_posting_cleared)
9968 0.002 0.000 0.002 0.000 __init__.py:35(__setitem__)
8944 0.002 0.000 0.002 0.000 {built-in method builtins.iter}
8944 0.002 0.000 0.002 0.000 {method 'items' of 'dict' objects}
9968 0.002 0.000 0.002 0.000 __init__.py:138(getsizeof)
6328 0.002 0.000 0.002 0.000 {built-in method builtins.abs}
9968 0.001 0.000 0.001 0.000 __init__.py:601(__missing__)
8944 0.001 0.000 0.001 0.000 __init__.py:38(pop)
521 0.001 0.000 0.001 0.000 {method 'extend' of 'list' objects}
294 0.000 0.000 0.000 0.000 {built-in method builtins.hash}
300 0.000 0.000 0.001 0.000 {method 'add' of 'set' objects}
342 0.000 0.000 0.000 0.000 {built-in method builtins.min}
379 0.000 0.000 0.000 0.000 venmo_json_source.py:268(is_posting_cleared)
857 0.000 0.000 0.000 0.000 costco_receipt_source.py:237(is_posting_cleared)
20 0.000 0.000 0.000 0.000 matching.py:874(partition)
117 0.000 0.000 0.000 0.000 adp_payroll_source.py:195(is_posting_cleared)
294 0.000 0.000 0.000 0.000 amount.py:101(__hash__)
115 0.000 0.000 0.000 0.000 cashapp_csv_source.py:209(is_posting_cleared)
269 0.000 0.000 0.000 0.000 matching.py:907(<lambda>)
37 0.000 0.000 0.000 0.000 workday_payroll_source.py:211(is_posting_cleared)
14 0.000 0.000 0.000 0.000 amount.py:149(mul)
44 0.000 0.000 0.000 0.000 boa_mortgage_csv_source.py:175(is_posting_cleared)
3 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects}
8 0.000 0.000 0.000 0.000 amount.py:83(__eq__)
1 0.000 0.000 0.000 0.000 journal_editor.py:296(all_entries)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
After:
9725404 function calls in 5.913 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
84800 0.897 0.000 3.127 0.000 matching.py:283(add_posting)
1017600 0.766 0.000 0.766 0.000 matching.py:256(get_fuzzy_date_range)
476638 0.670 0.000 1.164 0.000 matching.py:871(posting_set_id)
1607694 0.372 0.000 0.494 0.000 matching.py:872(<genexpr>)
1017600 0.369 0.000 1.134 0.000 matching.py:261(fuzz_date_currency_key)
57773 0.347 0.000 1.784 0.000 matching.py:884(add_subset)
84800 0.347 0.000 0.482 0.000 matching.py:215(_entry_and_posting_ids_key)
8944 0.254 0.000 0.254 0.000 {built-in method builtins.max}
1471960 0.169 0.000 0.169 0.000 {built-in method builtins.id}
21125 0.166 0.000 0.193 0.000 __init__.py:179(__getitem__)
1018444 0.112 0.000 0.112 0.000 {method 'append' of 'list' objects}
282506 0.109 0.000 0.185 0.000 matching.py:196(get_posting_weight)
84800 0.103 0.000 0.198 0.000 matching.py:272(_search_posting)
327068 0.090 0.000 0.123 0.000 matching.py:216(<genexpr>)
343628 0.079 0.000 0.079 0.000 {built-in method __new__ of type object at 0x1008c6fa8}
96981 0.077 0.000 2.703 0.000 matching.py:930(get_matchable_postings)
57773 0.072 0.000 0.125 0.000 {built-in method builtins.sum}
84800 0.070 0.000 0.193 0.000 matching.py:265(_date_currency_key)
12181 0.065 0.000 5.999 0.000 matching.py:376(add_transaction)
67231 0.064 0.000 0.117 0.000 amount.py:42(__new__)
9968 0.057 0.000 1.853 0.000 matching.py:856(_get_uncleared_aggregate_posting_candidates)
273197 0.054 0.000 0.054 0.000 matching.py:885(<genexpr>)
84800 0.042 0.000 0.064 0.000 matching.py:191(is_unknown_account)
12181 0.040 0.000 2.576 0.000 matching.py:812(get_aggregate_posting_candidates)
84800 0.038 0.000 0.079 0.000 matching.py:211(_date_key)
12181 0.033 0.000 2.497 0.000 __init__.py:731(wrapper)
84800 0.033 0.000 0.040 0.000 posting_date.py:5(get_posting_date)
84800 0.033 0.000 0.050 0.000 <string>:1(<lambda>)
9968 0.030 0.000 0.393 0.000 __init__.py:72(__setitem__)
12181 0.025 0.000 0.086 0.000 matching.py:927(<listcomp>)
186855 0.023 0.000 0.023 0.000 {built-in method builtins.isinstance}
84793 0.022 0.000 0.022 0.000 {method 'startswith' of 'str' objects}
10494 0.021 0.000 0.058 0.000 booking_full.py:680(convert_costspec_to_cost)
21125 0.019 0.000 0.025 0.000 __init__.py:66(__getitem__)
24714 0.018 0.000 0.028 0.000 reconcile.py:460(is_posting_cleared)
12181 0.017 0.000 0.027 0.000 matching.py:856(<lambda>)
8944 0.017 0.000 0.286 0.000 __init__.py:610(most_common)
97051 0.017 0.000 0.017 0.000 {method 'get' of 'dict' objects}
9968 0.014 0.000 0.408 0.000 __init__.py:185(__setitem__)
8944 0.013 0.000 0.061 0.000 __init__.py:111(pop)
8944 0.012 0.000 0.359 0.000 __init__.py:193(popitem)
8944 0.012 0.000 0.268 0.000 heapq.py:523(nlargest)
10494 0.012 0.000 0.029 0.000 __init__.py:452(_replace)
83133 0.011 0.000 0.011 0.000 {built-in method builtins.len}
12181 0.010 0.000 0.103 0.000 matching.py:965(get_matchable_postings_from_transaction)
40287 0.009 0.000 0.009 0.000 matching.py:949(<genexpr>)
8944 0.009 0.000 0.010 0.000 __init__.py:88(__delitem__)
8944 0.009 0.000 0.009 0.000 __init__.py:728(__delitem__)
8944 0.008 0.000 0.027 0.000 __init__.py:189(__delitem__)
26017 0.008 0.000 0.010 0.000 matching.py:856(<genexpr>)
12181 0.007 0.000 0.093 0.000 matching.py:926(get_weighted_postings)
10494 0.007 0.000 0.017 0.000 __init__.py:442(_make)
9968 0.006 0.000 0.006 0.000 __init__.py:96(__missing__)
20101 0.006 0.000 0.006 0.000 __init__.py:93(__contains__)
13836 0.005 0.000 0.005 0.000 {method 'setdefault' of 'collections.OrderedDict' objects}
11583 0.005 0.000 0.005 0.000 ofx.py:1528(is_posting_cleared)
9968 0.002 0.000 0.002 0.000 __init__.py:35(__setitem__)
9968 0.002 0.000 0.002 0.000 {method 'items' of 'collections.OrderedDict' objects}
8944 0.002 0.000 0.002 0.000 {built-in method builtins.iter}
9968 0.002 0.000 0.002 0.000 __init__.py:138(getsizeof)
8944 0.002 0.000 0.002 0.000 {method 'items' of 'dict' objects}
8944 0.002 0.000 0.002 0.000 __init__.py:38(pop)
9968 0.001 0.000 0.001 0.000 __init__.py:601(__missing__)
521 0.001 0.000 0.001 0.000 {method 'extend' of 'list' objects}
1808 0.000 0.000 0.000 0.000 {built-in method builtins.abs}
294 0.000 0.000 0.000 0.000 {built-in method builtins.hash}
300 0.000 0.000 0.001 0.000 {method 'add' of 'set' objects}
342 0.000 0.000 0.000 0.000 {built-in method builtins.min}
857 0.000 0.000 0.000 0.000 costco_receipt_source.py:237(is_posting_cleared)
379 0.000 0.000 0.000 0.000 venmo_json_source.py:268(is_posting_cleared)
294 0.000 0.000 0.000 0.000 amount.py:101(__hash__)
20 0.000 0.000 0.000 0.000 matching.py:874(partition)
117 0.000 0.000 0.000 0.000 adp_payroll_source.py:195(is_posting_cleared)
269 0.000 0.000 0.000 0.000 matching.py:907(<lambda>)
115 0.000 0.000 0.000 0.000 cashapp_csv_source.py:209(is_posting_cleared)
37 0.000 0.000 0.000 0.000 workday_payroll_source.py:211(is_posting_cleared)
4 0.000 0.000 0.000 0.000 amount.py:149(mul)
44 0.000 0.000 0.000 0.000 boa_mortgage_csv_source.py:175(is_posting_cleared)
3 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects}
8 0.000 0.000 0.000 0.000 amount.py:83(__eq__)
1 0.000 0.000 0.000 0.000 journal_editor.py:296(all_entries)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
On my 12,000-transaction ledger,
beancount_import.reconcile.LoadedReconciler._preprocess_entries()
currently takes 11.84 seconds. I measured and profiled it as follows:This PR contains the following micro-optimizations to
matching.py
that brought the time down to 5.98 seconds, a 2x speedup:_entry_and_posting_ids_key()
inadd_posting()
,remove_posting()
, and_get_matches()
._search_posting()
out of the fuzzy date loop inadd_posting()
andremove_posting()
.