akoshelev commented 3 months ago

This is a tracking issue to understand why the impact is so significant. For 50M runs with the current event generator, aggregating all trigger values sums up to about 400k (just under 2^19), so we really use only 3 more bits than the 16-bit width, and we shouldn't see a performance regression of that scale (up to 50%).
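As a quick check of the bit-width arithmetic, here is a minimal Rust sketch that computes how many bits an unsigned value needs. The helper name `bits_needed` is hypothetical and not taken from the IPA codebase; the 400k total is the approximate figure quoted above.

```rust
/// Number of bits needed to represent `v` as an unsigned integer
/// (hypothetical helper, returns 0 for v == 0).
fn bits_needed(v: u64) -> u32 {
    u64::BITS - v.leading_zeros()
}

fn main() {
    // Approximate sum of all trigger values for a 50M-event run.
    let total: u64 = 400_000;
    let bits = bits_needed(total);
    println!("{total} needs {bits} bits ({} more than 16)", bits.saturating_sub(16));
    // Too large for a 16-bit histogram value, but far below the 32-bit limit.
    println!("fits in u16: {}", total <= u16::MAX as u64);
    println!("fits in u32: {}", total <= u32::MAX as u64);
}
```

A total of 400k needs 19 bits: it overflows a 16-bit value, yet leaves 13 of the 32 bits unused, which is why a large slowdown from widening the value is surprising.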

akoshelev commented 3 months ago

1M run (16 bits): https://draft-mpc.vercel.app/query/view/meet-alarm2024-06-18T2026 (10m 28s)
1M run (32 bits): https://draft-mpc.vercel.app/query/view/vocal-wet2024-06-18T2043 (10m 23s)

As expected, it has no impact on small runs.

andyleiserson commented 3 months ago

Here's the calculation of the number of multiplies (which I think is also indicative of the circuit depth): https://docs.google.com/spreadsheets/d/1A55vv7TaIVMP7Nf9ZxZqzkKGjFun2AA2N-lMzb6Kga4/edit#gid=0

I'm wondering if it's something like the number of tokio polls or context switches scaling with the number of output bits, even if the amount of work to do doesn't.
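To illustrate how the multiply count and circuit depth could scale with output width, here is a toy model: a ripple-carry adder over boolean values, where XOR is treated as free and each AND stands in for one secure multiplication, as in boolean-shared MPC. This is a hypothetical model, not IPA's actual aggregation circuit; `AndCounter` and `ripple_add` are illustrative names. Each carry is computed with a single AND via `c' = c ^ ((a ^ c) & (b ^ c))`, which equals the majority of `a`, `b`, `c`.

```rust
/// Counts AND gates, which stand in for secure multiplications (toy model).
struct AndCounter {
    ands: usize,
}

impl AndCounter {
    fn and(&mut self, x: bool, y: bool) -> bool {
        self.ands += 1;
        x & y
    }
}

/// Adds two `n`-bit values bit by bit, LSB first, dropping the final carry
/// (wrapping addition). Requires n <= 64. Every AND depends on the previous
/// carry, so the sequential depth is also n.
fn ripple_add(a: u64, b: u64, n: u32, ctr: &mut AndCounter) -> u64 {
    let mut carry = false;
    let mut sum = 0u64;
    for i in 0..n {
        let ai = ((a >> i) & 1) == 1;
        let bi = ((b >> i) & 1) == 1;
        // Sum bit uses only XORs (free in boolean sharing).
        let si = ai ^ bi ^ carry;
        sum |= (si as u64) << i;
        // Carry-out = majority(ai, bi, carry) using a single AND.
        carry = carry ^ ctr.and(ai ^ carry, bi ^ carry);
    }
    sum
}

fn main() {
    for &n in &[16u32, 32] {
        let mut ctr = AndCounter { ands: 0 };
        let sum = ripple_add(40_000, 30_000, n, &mut ctr);
        println!("{n}-bit: sum = {sum}, AND gates = {}", ctr.ands);
    }
}
```

With 16 bits the adder performs 16 ANDs and wraps 40000 + 30000 to 4464; with 32 bits it performs 32 ANDs and returns 70000. Because each AND waits on the previous carry, the number of dependent rounds doubles too, which is the kind of per-bit scaling the polls/context-switch hypothesis is about.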

akoshelev commented 3 months ago

20M run (16 bits): https://draft-mpc.vercel.app/query/view/tried-goth2024-06-18T2054 (11376.323407524s)
20M run (32 bits): https://draft-mpc.vercel.app/query/view/brute-joint2024-06-19T0032 (11196.288208378s)

akoshelev commented 3 months ago

OK, so for 20M events there is actually no difference in performance. Histogram values were hovering around 418896.

akoshelev commented 3 months ago

I am going to test 40M now

40M run (16 bits): https://draft-mpc.vercel.app/query/view/macro-voter2024-06-19T0559 (22629.633063063s)
40M run (32 bits): https://draft-mpc.vercel.app/query/view/sole-calm2024-06-19T1620 (22129.490560925s)

OK, it seems there is no impact on 40M either. Maybe I misinterpreted the results; I'm going to run 50M and then 100M.

akoshelev commented 3 months ago

50M run (16 bits): https://draft-mpc.vercel.app/query/view/dizzy-mop2024-06-20T1718 (28216.537024598s)
50M run (32 bits): https://draft-mpc.vercel.app/query/view/weeny-entry2024-06-21T1840 (28787.122625792s)
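To put the runs so far side by side, this small Rust sketch computes the relative latency change from 16-bit to 32-bit histogram values, using the timings reported in this thread (1M converted from minutes to seconds). It assumes the 32-bit run listed next to the 40M run is its 40M counterpart; `Run` and `pct_change` are illustrative names.

```rust
/// Wall-clock latencies in seconds, copied from the runs linked in this thread.
struct Run {
    events: &'static str,
    secs_16: f64,
    secs_32: f64,
}

/// Relative change going from 16-bit to 32-bit values, in percent.
fn pct_change(r: &Run) -> f64 {
    (r.secs_32 - r.secs_16) / r.secs_16 * 100.0
}

const RUNS: [Run; 4] = [
    Run { events: "1M", secs_16: 628.0, secs_32: 623.0 }, // 10m 28s vs 10m 23s
    Run { events: "20M", secs_16: 11376.323407524, secs_32: 11196.288208378 },
    Run { events: "40M", secs_16: 22629.633063063, secs_32: 22129.490560925 },
    Run { events: "50M", secs_16: 28216.537024598, secs_32: 28787.122625792 },
];

fn main() {
    for r in &RUNS {
        println!("{:>4}: {:+.2}%", r.events, pct_change(r));
    }
}
```

With these numbers every difference stays under 2.5% in magnitude, and the sign flips between sizes (the 32-bit run is faster at 20M and 40M, slower at 50M), which looks like run-to-run noise rather than a 50% regression.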

akoshelev commented 3 months ago

100M run (16 bits): https://draft-mpc.vercel.app/query/view/vast-rink2024-06-22T0528 (fails because of #1141)

akoshelev commented 3 months ago

I want to push fixes for #1141 first and then validate that there is no impact on 100M rows before closing this.

akoshelev commented 2 months ago

#1141 is fixed now, doing another 100M run: https://draft-mpc.vercel.app/query/view/black-baton2024-07-10T1650

akoshelev commented 2 months ago

100M run (16 bits): https://draft-mpc.vercel.app/query/view/black-baton2024-07-10T1650 (50305.268764835s)
100M run (32 bits): https://draft-mpc.vercel.app/query/view/exact-malt2024-07-27T0702 (47668s)

akoshelev commented 2 months ago

No regression detected; closing this issue.

private-attribution / ipa

Increasing number of bits in histogram values from 16 to 32 increases IPA latency by as much as 50% #1151