Open dsernst opened 2 years ago
A temp fix, for recording this video, is to disable the shuffle proof code...
But we definitely need a better solution for real world elections.
The ~10s or so that it's taking isn't too bad... the problem is just that it's exceeding Vercel's serverless function timeout, which makes it impossible to unlock an election. My estimate, given current profiling and that the shuffle proof code scales linearly with the number of votes, is that any election with > 300 ciphertexts (e.g. 50 people voting on 6 things each) is in danger of running into this issue.
Temporarily disabling the shuffle proofs to be able to record this demo video... https://github.com/dsernst/siv/commit/424bb37bdd8bc557158eb4d7e6f2c30db03bd11d
Reverted the hotfix that was disabling the shuffle proofs: https://github.com/dsernst/siv/commit/f3cc6b15e2e4fd0c31ec78145d1c370db1380e22
This will unblock Verifying Observers proof confirmations for small elections.
This remains an open issue for elections w/ 300+ ciphertexts.
I think a good solution is to outsource all this longer-running cryptography to a dedicated server. We can set up a Heroku box to handle it, that can automatically go to sleep whenever it's not needed (the vast majority of the time), so we're not paying for unused hardware.
Here's that WIP branch: https://github.com/dsernst/siv/tree/demo-vid-script
Utah sample election timing out. Manually running it locally: unlocked 176 votes in 29364ms.
It has 5 items on the ballot. So that's 176 * 5 = 880 total votes being unlocked.
Firebase functions are another option? They have a 1 hour limit, 16 GB memory, and can run npm libraries.
https://github.com/dsernst/siv/commit/416ba4897522353ea71469b2c2def9d5cad85836 now skips generating shuffle proofs if there are no other verifying observers.
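A minimal sketch of that check, for illustration only. The `Trustee` type, the `ADMIN_EMAIL` constant, and the function name are hypothetical and not taken from the commit; the assumption is that a shuffle proof is only useful when someone other than the admin server will verify it.

```typescript
// Hypothetical shape of a trustee record; the real schema may differ.
type Trustee = { email: string }

// Assumed identifier for the server-side keyholder.
const ADMIN_EMAIL = 'admin@siv'

/** True if at least one trustee other than the admin could verify a shuffle proof. */
function needsShuffleProof(trustees: Trustee[]): boolean {
  return trustees.some((t) => t.email !== ADMIN_EMAIL)
}
```

With only `admin@siv` as keyholder this returns `false`, so the expensive proof generation can be skipped.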
Unlocked 255 votes in 7485ms.
Old: 176 * 5 = 880 vote items, over 29364ms, or 29364 / 880 = 33.37ms/item
New: 255 * 5 = 1,275 vote items, over 7485ms, or 7485 / 1275 = 5.87ms/item
5.68x faster
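As a rough cross-check of these numbers against the 10s timeout mentioned in this issue, here is a small sketch. It assumes purely linear scaling, and the `fixedOverheadMs` parameter is an assumption added for illustration (the thread doesn't give a fixed-overhead figure), so treat the results as ballpark estimates.

```typescript
// Per-item costs computed from the two timings reported in this thread.
const OLD_MS_PER_ITEM = 29364 / 880 // about 33.37 ms, with shuffle proofs
const NEW_MS_PER_ITEM = 7485 / 1275 // about 5.87 ms, with proofs skipped

/** Largest number of ciphertexts that fits inside a timeout, assuming linear scaling. */
function maxCiphertexts(timeoutMs: number, msPerItem: number, fixedOverheadMs = 0): number {
  return Math.max(0, Math.floor((timeoutMs - fixedOverheadMs) / msPerItem))
}

const speedup = OLD_MS_PER_ITEM / NEW_MS_PER_ITEM
const oldLimit = maxCiphertexts(10000, OLD_MS_PER_ITEM)
const newLimit = maxCiphertexts(10000, NEW_MS_PER_ITEM)
```

Under these assumptions `oldLimit` comes out just under 300 ciphertexts, which lines up with the earlier estimate, and `newLimit` is roughly 1,700.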
Initial stress tests (using new npx ts-node db-data/2023-04-22-simulate-rand-votes.ts script):
Now with parallelized decryption, per column:
Test 2 is not faster as we were hoping. BUT! In retrospect that makes some sense: this is not actually parallelizing, since it's all running on my laptop, where it's still just a single next dev server process.
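To illustrate why a single process doesn't help: `Promise.all` only overlaps waiting, not CPU work, so per-column decryption gets faster only if each call lands on a separate process or serverless invocation. The sketch below is not the project's code. `decryptColumn` is an injected, hypothetical worker that in a deployed setup might be an HTTP request to a per-column endpoint.

```typescript
type Ciphertext = string
type Plaintext = string

// Hypothetical worker: decrypts one ballot column, ideally in its own process/invocation.
type DecryptColumn = (column: Ciphertext[]) => Promise<Plaintext[]>

/** Fan out one decrypt call per column and collect the results by column name. */
async function decryptColumnsInParallel(
  columns: Record<string, Ciphertext[]>,
  decryptColumn: DecryptColumn
): Promise<Record<string, Plaintext[]>> {
  const names = Object.keys(columns)
  // All calls start immediately; real speedup depends on where decryptColumn runs.
  const results = await Promise.all(names.map((name) => decryptColumn(columns[name])))
  const out: Record<string, Plaintext[]> = {}
  names.forEach((name, i) => {
    out[name] = results[i]
  })
  return out
}
```

If `decryptColumn` does the math inline in the same Node process, the calls still run one after another on the single JS thread, which matches what the local test showed.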
Still on my local machine, but with more precise profiling:
init 0ms
check jwt 2467ms
preload db 0ms
election exists? 2505ms
election data 2ms
load votes, filter esig 8656ms
remove auth tokens 15ms
split 8ms
fastShuffle 5416ms
decrypt parallel 62387ms
store decrypted 4865ms
🔑 Unlocked 5104 votes with 4 columns (20416 ciphertexts) in 87,000ms. (4.26 ms/ciphertext)
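Per-step timings like the ones above could come from a helper along these lines. This is a sketch, not the profiling code from the linked commit: `createProfiler`, `mark`, and `report` are hypothetical names, and the clock is injectable only to make the helper easy to test.

```typescript
/** Records how long each step took, measured from the previous mark. */
function createProfiler(now: () => number = Date.now) {
  const steps: { label: string; ms: number }[] = []
  let last = now()
  return {
    // Call right after a step finishes; stores the time elapsed since the last mark.
    mark(label: string): void {
      const t = now()
      steps.push({ label, ms: t - last })
      last = t
    },
    // One line per step, in the same "label 123ms" shape as the logs in this thread.
    report(): string[] {
      return steps.map((s) => `${s.label} ${s.ms}ms`)
    },
    // Sum of all recorded steps, handy for comparing against a timeout.
    total(): number {
      return steps.reduce((sum, s) => sum + s.ms, 0)
    },
  }
}
```

Printing `report()` before the final response is sent means a partial log still shows up when the function is killed by a timeout, as long as each line is logged as soon as its step completes.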
First attempt on Vercel, unlocking failed, but it did print this out first:
init 0ms
check jwt 519ms
preload db 3ms
election exists? 1321ms
election data 2ms
load votes, filter esig 977ms
remove auth tokens 40ms
split 18ms
fastShuffle 17148ms
Best guess is it's failing because of the 60 second timeout.
Success on Vercel with 2000 votes, parallelized:
init 0ms
check jwt 128ms
preload db 2ms
election exists? 280ms
election data 0ms
load votes, filter esig 735ms
remove auth tokens 21ms
split 14ms
fastShuffle 6906ms
decrypt parallel 24118ms
store decrypted 1540ms
🔑 Unlocked 2000 votes with 4 columns (8000 ciphertexts) in 33,839ms. (4.23 ms/ciphertext)
Non parallelized code, deployed on Vercel:
Parallelized code, deployed on Vercel:
init 0ms
check jwt 97ms
preload db 2ms
election exists? 277ms
election data 1ms
load votes, filter esig 21ms
remove auth tokens 1ms
split 0ms
fastShuffle 385ms
decrypt parallel 1407ms
store decrypted 174ms
🔑 Unlocked 100 votes with 4 columns (400 ciphertexts) in 2,460ms. (6.15 ms/ciphertext)
So parallelized was:
On Vercel: 3000 votes x 4 columns, parallelized:
init 0ms
check jwt 81ms
preload db 0ms
election exists? 315ms
election data 1ms
load votes, filter esig 663ms
remove auth tokens 15ms
split 3ms
fastShuffle 10004ms
decrypt parallel 36045ms
store decrypted 2198ms
🔑 Unlocked 3000 votes with 4 columns (12000 ciphertexts) in 49,397ms. (4.12 ms/ciphertext)
Ok, deployed parallelization code to main branch. Reran 3000 x 4 test to be sure:
init 0ms
check jwt 378ms
preload db 1ms
election exists? 759ms
election data 1ms
load votes, filter esig 641ms
remove auth tokens 8ms
split 3ms
fastShuffle 10149ms
decrypt parallel 35300ms
store decrypted 2056ms
🔑 Unlocked 3000 votes with 4 columns (12000 ciphertexts) in 49,398ms. (4.12 ms/ciphertext)
init 0ms
check jwt 79ms
preload db 1ms
election exists? 282ms
election data 1ms
load votes, filter esig 0ms
remove auth tokens 0ms
split 0ms
fastShuffle 150ms
decrypt parallel 1170ms
store decrypted 117ms
🔑 Unlocked 20 votes with 6 columns (120 ciphertexts) in 1,894ms. (15.78 ms/ciphertext)
init 0ms
check jwt 80ms
preload db 1ms
election exists? 271ms
election data 0ms
load votes, filter esig 753ms
remove auth tokens 34ms
split 2ms
fastShuffle 5208ms
decrypt parallel 13934ms
store decrypted 937ms
🔑 Unlocked 1000 votes with 6 columns (6000 ciphertexts) in 21,317ms. (3.55 ms/ciphertext)
init 0ms
check jwt 292ms
preload db 1ms
election exists? 543ms
election data 0ms
load votes, filter esig 1064ms
remove auth tokens 33ms
split 5ms
fastShuffle 14973ms
decrypt parallel 40585ms
Timed out at 60s, after decrypting but before reporting that the decryptions were successfully stored. But on refresh, all 3k votes were indeed successfully unlocked.
The steps logged above add up to 57496ms. So it's not surprising it would time out during the next one, which was ~3s for our last 3k vote trial.
If (an election's only keyholder is admin@siv)
&& (admin@siv is already storing the decryption key in the db)
then:
admin@siv can "pre-decrypt" votes as they come in, with no adverse privacy implications
not publishing them anywhere, just keeping them in a private part of the db
then when election_admin hits "Unlock" btn, all the decryption is already done, greatly speeding things up
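The pre-decrypt idea could look roughly like this. Everything here is hypothetical: the `EncryptedVote` shape, the `PreDecryptCache` class, and the injected `decrypt` function stand in for the real schema and crypto, and an in-memory `Map` stands in for the private part of the db. Shuffling would presumably still run at unlock time before anything is published.

```typescript
// Hypothetical vote record; the real one has one ciphertext per ballot column.
type EncryptedVote = { id: string; ciphertext: string }

class PreDecryptCache {
  // Stand-in for a private db collection that is never published.
  private plaintexts = new Map<string, string>()
  private decrypt: (ciphertext: string) => string

  constructor(decrypt: (ciphertext: string) => string) {
    this.decrypt = decrypt
  }

  /** Called as each vote arrives; only valid when admin@siv is the sole keyholder. */
  onVoteSubmitted(vote: EncryptedVote): void {
    this.plaintexts.set(vote.id, this.decrypt(vote.ciphertext))
  }

  /** Called from the Unlock endpoint: reuse cached results, decrypt only what is missing. */
  unlock(votes: EncryptedVote[]): string[] {
    return votes.map((v) => {
      const cached = this.plaintexts.get(v.id)
      if (cached !== undefined) return cached
      const plaintext = this.decrypt(v.ciphertext)
      this.plaintexts.set(v.id, plaintext)
      return plaintext
    })
  }
}
```

The fallback inside `unlock` keeps the endpoint correct even if some votes were submitted before pre-decryption was switched on.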
While making a demo video of election https://siv.org/admin/1645223145915/voters, we simulated an election w/:
Attempting to use the Unlock 176 Votes button was failing every time. The window would show an alert() with the message [object Object].
Update: Added clearer error msg for Timeouts, and notify admin: https://github.com/dsernst/siv/commit/421933f228b1bcf8eb7f7bd242e62909bcf11045
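For context on the unhelpful alert: passing a plain object to alert() coerces it to a string, which gives "[object Object]". A small helper like the one below turns whatever was thrown or returned into readable text. It is a generic sketch with a hypothetical name (`describeError`), not the change made in the commit above.

```typescript
/** Best-effort conversion of an unknown error value into a readable message. */
function describeError(err: unknown): string {
  if (err instanceof Error) return err.message
  if (typeof err === 'string') return err
  try {
    // JSON.stringify returns undefined for some inputs (e.g. undefined), so fall back.
    const json: string | undefined = JSON.stringify(err)
    return json !== undefined ? json : String(err)
  } catch {
    // Circular structures and similar cases end up here.
    return String(err)
  }
}
```

Calling `alert(describeError(response))` instead of `alert(response)` would surface something like `{"error":"timeout"}` rather than the opaque default.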
Tracking down the error in Vercel > Functions > Error Logs showed that the problem was the api/${election_id}/admin/unlock endpoint timing out at the 10s mark.
So I added some profiling code (https://github.com/dsernst/siv/commit/2205849fcd94af2578d91934b77bc62ec4018be5) to this endpoint to see what was taking so long:
It looks like ~90% of the time is being spent generating & uploading the shuffle proofs.