securepollingsystem / securepollingoutline

outline of how the system works, the moving parts

feedback #2

Open hackergrrl opened 8 years ago

hackergrrl commented 8 years ago

Really cool idea! Some comments below on the overview doc. I also may be looking at this from a more peer-to-peer angle, though I think(?) you're aiming to have this be a centralized service? In any case, I'm erring on the side of minimizing trust between machines. Disclaimer: I am most certainly not a real crypto authority. :D

opinion = an exact text statement of up to 140 characters, expressing a belief, desire or opinion in the context of a voters' authority as a member of a democracy, for example:
screed = a list of opinions a voter has chosen to express, with each opinion on its own line

Are opinions signed, or are screeds? My understanding is that users publish opinions as they wish (each signed with their private key), and then screeds are aggregated by the spiders mentioned later. Signing at the opinion level has the nice property that any subset of a user's screed can be verified independently.

signed screed = a voter's screed, signed using their private key, and posted publicly along with their public key (for verification of their screed) and the registrar's (likely detached) signature of their public key

What if "screed" simply meant "signed screed"? i.e. opinions only exist if they are signed. Otherwise they're just trustless data that has no value.

tally list = a list of opinions found by a tally spider, sorted in order of popularity, and presented with a percentage indicating how many signed screeds included it

How could the list be trusted? What if users instead ran a "tally list viewer" that just downloaded the list of signed opinions, verified all the signatures locally, and then displayed the computed results (desired percentages)? Since the crypto verification step would be happening on each user's local machine, the data would be more trustworthy than some blob coming down from an external server.

screed editor = a computer program used by voters to edit their screed, and to sign it and upload it to their web page so it can be found by tally spiders

As per above: if signing happened on the opinion level, users wouldn't even need to have a full list of their opinions present. i.e. I can plug in my GNUK into a stranger's computer, write up an opinion, sign it, publish it, and walk away without needing the rest of my screed.

gives each voter total control over what they say and when they say it

How does it provide the "when"? Signing certainly lets them verifiably say things, but not when the signing happened.

allows voters to change their opinions and endorsements as often as they like

Nefarious tally spiders could misrepresent by using old user screeds that contain opinions that the user has since revoked! :)

anyone can count the number of people agreeing with each opinion, at any time and with independent verification
anyone can use a tally spider (written in Golang) to curl all the signed screeds they can find on the net, and collate them by pubkey
tally spider lists produced by spiders are shared publicly for people who don't want to run a tally spider for themselves.

Awesome properties. :D

voters use a program (in javascript, immobilized by hyperboot to prevent changes)

woo hyperboot!

signing causes numeric counter to increment, preventing fraud

How will you secure the counter bits from being spoofed/modified?

each screed record includes the time it was accessed

Why? Someone could modify the timestamp in the file -- it can't be trusted.

an indication of whether both its signatures were verified

Again, you couldn't trust this: someone could just modify the file to say "yep, sure, all verified". Best to actually have anything that reads the tally file verify the screeds each time.

and an MD5 or SHA1 checksum of the record (so that pages which have not changed don’t have to be verified again if the MD5 matches).

Scary. You can't skip the verification step: I could modify the tally file to include false screeds and then generate each one's MD5/SHA1. Since you don't verify the signatures, I can just use bogus ones.

The record also includes the fingerprint of the voter’s public key

This step is unnecessary: a signed PGP message implicitly includes the fingerprint.

since this is how we watch for uniqueness of screed authors and merge screeds appearing with the same pubkey.

Overall, it's much more secure / simpler to just keep a list of all verbatim signed screeds, and verify each screed+signature on-demand each time you want to work with the data.


Awesome project! Can't wait to see more. :D

jerkey commented 8 years ago

Really cool idea! Some comments below on the overview doc. I also may be looking at this from a more peer-to-peer angle, though I think(?) you're aiming to have this be a centralized service? In any case, I'm erring on the side of minimizing trust between machines.

This is based on a simple trust model. You trust the registrar to properly vet participants, and you trust participants to keep their keys safe. But there are plenty of ways to monitor whether the registrar is behaving properly - for example if their (public) list of participants has false names on there, or if there are more valid-signed keys by them than their list accounts for. Similarly, if someone complains that they were denied a signature while eligible, it's cause for investigation.
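As a rough illustration of that second check (a sketch only; `audit_registrar` and its inputs are hypothetical names, and it assumes an observer has already collected the fingerprints of keys carrying a valid registrar signature):

```python
def audit_registrar(roster_size: int, signed_fingerprints: list[str]) -> int:
    """Return how many registrar-signed keys exceed the public roster (0 if none).

    Assumption: every fingerprint passed in was already checked to carry a
    valid registrar signature; this function only does the counting.
    """
    distinct = set(signed_fingerprints)  # the same key seen twice counts once
    return max(0, len(distinct) - roster_size)
```

Any non-zero result means more signed keys are in circulation than the roster accounts for, which would be the cause for investigation.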

As for the content, anyone can run a tally spider and verify all the signatures themselves, but most people will trust a tally spider run by an organization or by a friend with a server. Again the trust is easy to verify, since if any two tally spiders disagree there is cause for investigation, and any shenanigans will be newsworthy.

Another concern is DOS or partial DOS, where the server hosting the signed screeds is selectively censoring certain opinions. This also would be easy to detect, either by participants running a watchdog client on the availability of their screed, or by servers noticing screeds disappearing (they keep a cache of course). In such an event it would be easy to figure out what the censored screeds had in common.

opinion = an exact text statement of up to 140 characters, expressing a belief, desire or opinion in the context of a voters' authority as a member of a democracy, for example:
screed = a list of opinions a voter has chosen to express, with each opinion on its own line

Are opinions signed, or are screeds? My understanding is that users publish opinions as they wish (each signed with their private key), and then screeds are aggregated by the spiders mentioned later. Signing at the opinion level has the nice property that any subset of a user's screed can be verified independently.

Opinions are single-line statements which make up a screed. Then the screed itself is signed. If a user wants to add or remove an opinion from their screed, they edit the screed and issue a new signature. This seems to be the most reasonable model considering the CPU load that tally spiders will face. Signing (or verifying) a large text list costs only insignificantly more than signing a single line, so one signature per screed is the most efficient path available.
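To make the screed format concrete, here is a minimal sketch of splitting a screed into opinions. The names `parse_screed` and `MAX_OPINION_CHARS` are made up for illustration, and ignoring blank lines is an assumption rather than something the outline specifies:

```python
MAX_OPINION_CHARS = 140  # limit taken from the "opinion" definition


def parse_screed(text: str) -> list[str]:
    """Split a screed into opinions, one per line, enforcing the length limit."""
    opinions = []
    for line in text.splitlines():
        opinion = line.strip()
        if not opinion:
            continue  # assumption: blank lines carry no opinion
        if len(opinion) > MAX_OPINION_CHARS:
            raise ValueError(
                f"opinion exceeds {MAX_OPINION_CHARS} characters: {opinion[:20]!r}..."
            )
        opinions.append(opinion)
    return opinions
```

This only covers reading the text; signing and verifying the screed as a whole would happen separately.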

signed screed = a voter's screed, signed using their private key, and posted publicly along with their public key (for verification of their screed) and the registrar's (likely detached) signature of their public key

What if "screed" simply meant "signed screed"? i.e. opinions only exist if they are signed. Otherwise they're just trustless data that has no value.

You're correct, a screed is useless without a signature. But it can exist, so I call it a screed. Once it's signed, I call it a signed screed.

tally list = a list of opinions found by a tally spider, sorted in order of popularity, and presented with a percentage indicating how many signed screeds included it

How could the list be trusted? What if users instead ran a "tally list viewer" that just downloaded the list of signed opinions, verified all the signatures locally, and then displayed the computed results (desired percentages)? Since the crypto verification step would be happening on each user's local machine, the data would be more trustworthy than some blob coming down from an external server.

I am expecting that the work of verifying all these signatures will be more significant than what's appropriate for a personal computer such as a laptop or a phone, and it's a real advantage to have a 24/7 internet connection so you can watch for trends and stay updated. A tally-spider is really a job for a server. You can certainly run your own but most people will trust the one from the local newspaper, or the one run by Google, or by their geek friend. Of course you can compare the results from multiple servers, and many people will in order to keep an eye out for mischief.

One advantage to this split is that simple clients can run on phones or laptops and simply query the server's database anytime. And the queries can be complex, such as "how many voters have one or more of the following statements" which is not an operation you could perform on your phone.
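For illustration, the tally and the "one or more of the following statements" query could look roughly like this. It is a sketch, not the spider's actual design: `tally` and `voters_with_any` are hypothetical names, and the input is assumed to hold only screeds whose signatures were already verified, keyed by pubkey fingerprint:

```python
from collections import Counter


def tally(screeds: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Return (opinion, percent of screeds containing it), most popular first."""
    counts = Counter()
    for opinions in screeds.values():
        counts.update(set(opinions))  # count each opinion once per voter
    total = len(screeds)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(opinion, 100.0 * n / total) for opinion, n in ranked]


def voters_with_any(screeds: dict[str, list[str]], wanted: list[str]) -> int:
    """Count voters whose screed contains at least one of the wanted opinions."""
    wanted_set = set(wanted)
    return sum(1 for opinions in screeds.values() if wanted_set & set(opinions))
```

The logic itself is small; the cost being discussed is in fetching and verifying the screeds that feed it.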

screed editor = a computer program used by voters to edit their screed, and to sign it and upload it to their web page so it can be found by tally spiders

As per above: if signing happened on the opinion level, users wouldn't even need to have a full list of their opinions present. i.e. I can plug in my GNUK into a stranger's computer, write up an opinion, sign it, publish it, and walk away without needing the rest of my screed.

You still need a place to upload your screed to, so you might as well download it and add the new opinion and sign it again.

gives each voter total control over what they say and when they say it

How does it provide the "when"? Signing certainly lets them verifiably say things, but not when the signing happened.

This is a good question. What I meant to say was that if a voter feels like participating in the democratic process at 3am on a Sunday, in their pyjamas with a bowl of popcorn, they can do just that. The fact of their scheduling has no bearing on whether their opinion as a Voter is taken seriously.

Compare this with the present reality that you must find your polling place on a Tuesday in early November every two or four years, and be in the mood to decide which district judge you like best - feel free to use your phone to get hints and good luck.

allows voters to change their opinions and endorsements as often as they like

Nefarious tally spiders could misrepresent by using old user screeds that contain opinions that the user has since revoked! :)

A nefarious tally-spider can straight-up lie. Its job is to report the results of tallying signed screeds, and you're trusting its data based on its reputation, as well as its host keys or however you identify it.

voters use a program (in javascript, immobilized by hyperboot to prevent changes)

woo hyperboot!

It's important to make this platform-independent so that nobody gets blocked from using it because they're running a weird computer, or a public computer, or a friend's phone.

signing causes numeric counter to increment, preventing fraud

How will you secure the counter bits from being spoofed/modified?

Physical security! But this is only a backup to the fact that people are counting how many signed pubkeys are found in the wild, versus how many names the Registrar's (public) roster has.

each screed record includes the time it was accessed

Why? Someone could modify the timestamp in the file -- it can't be trusted.

an indication of whether both its signatures were verified

Again, you couldn't trust this: someone could just modify the file to say "yep, sure, all verified". Best to actually have anything that reads the tally file verify the screeds each time.

I'm not sure where this is quoted from, but I believe it's referring to records internal to the tally-spider. If it's lying to itself, you have a different problem. The point of this is to reduce unnecessary repeat-verification operations by the tally-spider computer on unchanged screeds that it has already verified.

and an MD5 or SHA1 checksum of the record (so that pages which have not changed don’t have to be verified again if the MD5 matches).

Scary. You can't skip the verification step: I could modify the tally file to include false screeds and then generate each one's MD5/SHA1. Since you don't verify the signatures, I can just use bogus ones.

This is the tally-spider informing itself that it's already been there. If the owner of the tally-spider wants it to lie to itself there may be nothing you can do about that :)

The record also includes the fingerprint of the voter’s public key

This step is unnecessary: a signed PGP message implicitly includes the fingerprint.

I am so glad you'll be helping bring this project to fruition! Are you any good at grant writing?

since this is how we watch for uniqueness of screed authors and merge screeds appearing with the same pubkey.

Overall, it's much more secure / simpler to just keep a list of all verbatim signed screeds, and verify each screed+signature on-demand each time you want to work with the data.

I think the computation of two signature verifications (the signature of the screed, and the signature of that pubkey by the registrar) is significant, and it's much easier to just check against one's own cache to see if nothing has changed since the last time we checked.
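A minimal sketch of that cache, under some loud assumptions: `VerificationCache` is a hypothetical name, the `verify` callable stands in for the real double signature check, and SHA-256 is swapped in here for the MD5/SHA1 mentioned in the outline. The digest is computed by the spider over the bytes it fetched, never read from the fetched record:

```python
import hashlib
from typing import Callable


class VerificationCache:
    """Remember which exact screed bytes have already passed verification."""

    def __init__(self, verify: Callable[[bytes], bool]):
        self._verify = verify  # hypothetical: checks screed and registrar signatures
        self._verified: set[str] = set()
        self.verify_calls = 0  # counts how often the expensive check actually ran

    def is_valid(self, signed_screed: bytes) -> bool:
        digest = hashlib.sha256(signed_screed).hexdigest()
        if digest in self._verified:
            return True  # identical bytes already passed; skip re-verification
        self.verify_calls += 1
        ok = self._verify(signed_screed)
        if ok:
            self._verified.add(digest)  # only successful checks are cached
        return ok
```

Because the cache lives inside the spider and is keyed on content it hashed itself, it saves work without anything in the published output needing to be taken on faith.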

Thank you so much for your interest! Let's keep this going.

hackergrrl commented 8 years ago

As for the content, anyone can run a tally spider and verify all the signatures themselves, but most people will trust a tally spider run by an organization or by a friend with a server. Again the trust is easy to verify, since if any two tally spiders disagree there is cause for investigation, and any shenanigans will be newsworthy.

This isn't true: comparing two tally spiders' output isn't enough, since opinions are uploaded over time, and a tally spider run captures only an instant of that. The output of a tally spider from two months ago will differ from one run today. And it's very easy to forge the date that a tally spider was run.

You could trust your friend's tally spider output because you're friends, but since you have pubkey crypto already, you can skip making humans do the legwork. If tally spider output were cryptographically verifiable, anyone could run a spider and anyone else could verify the authenticity of the results.

Another concern is DOS or partial DOS, where the server hosting the signed screeds is selectively censoring certain opinions. This also would be easy to detect, either by participants running a watchdog client on the availability of their screed, or by servers noticing screeds disappearing (they keep a cache of course). In such an event it would be easy to figure out what the censored screeds had in common.

Better yet: store user opinions in a blockchain! Each opinion the user adds includes a hash of their previous opinion. This has the very attractive property that if someone censors one person's opinion it'll mean they have to censor all of their future opinions too. This makes censorship even easier to detect.

Users could also hash-link to other people's opinion blockchains, making it even harder to censor opinions: the attacker would need to stop publishing many users in their tally output.
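Roughly what I mean by hash-linking, as a sketch only (the names `GENESIS`, `entry_hash`, `append_opinion` and `chain_is_intact` are invented for this example, and per-entry signatures are left out to keep it short):

```python
import hashlib

GENESIS = "0" * 64  # assumed placeholder "previous hash" for a voter's first opinion


def entry_hash(prev_hash: str, opinion: str) -> str:
    """Hash an opinion together with the hash of the entry before it."""
    return hashlib.sha256(f"{prev_hash}\n{opinion}".encode("utf-8")).hexdigest()


def append_opinion(chain: list[dict], opinion: str) -> None:
    """Add an opinion that commits to the current tip of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "opinion": opinion, "hash": entry_hash(prev, opinion)})


def chain_is_intact(chain: list[dict]) -> bool:
    """Check that every entry links to its predecessor and matches its own hash."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["opinion"]):
            return False
        prev = entry["hash"]
    return True
```

Dropping or altering an entry in the middle breaks every later link. Dropping only the newest entries is not caught by this check alone, which is where cross-links from other users' chains would help.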

How could the list be trusted? What if users instead ran a "tally list viewer" that just downloaded the list of signed opinions, verified all the signatures locally, and then displayed the computed results (desired percentages)? Since the crypto verification step would be happening on each user's local machine, the data would be more trustworthy than some blob coming down from an external server.

I am expecting that the work of verifying all these signatures will be more significant than what's appropriate for a personal computer such as a laptop or a phone

This might be worth looking into before dismissing. This is how scuttlebot works.

Of course you can compare the results from multiple servers, and many people will in order to keep an eye out for mischief.

This seems like time-consuming human legwork that software + crypto could do for you.

One advantage to this split is that simple clients can run on phones or laptops and simply query the server's database anytime. And the queries can be complex, such as "how many voters have one or more of the following statements" which is not an operation you could perform on your phone.

You could have both. Users who want to verify the output can run it themselves using purely local clients, and others may wish to trust the external server. As long as your code is open source and your primitives (screeds, tally output) permit it, people will eventually write both flavours.

You still need a place to upload your screed to, so you might as well download it and add the new opinion and sign it again.

I think I'm fuzzy on where users will be uploading their screeds to. A single central trusted source? Users' own servers?

A nefarious tally-spider can straight-up lie. Its job is to report the results of tallying signed screeds, and you're trusting its data based on its reputation, as well as its host keys or however you identify it.

You could do trust based on human reputation, but the crypto primitives to avoid this are available.

I'm not sure where this is quoted from, but I believe it's referring to records internal to the tally-spider. If it's lying to itself, you have a different problem.

These are the cases where the tally spider is fabricating results to lie to users. If users are to trust tally spider output without verifying it, then it becomes very easy to fabricate spider output to reflect whatever data I'd like.

This is the tally-spider informing itself that it's already been there. If the owner of the tally-spider wants it to lie to itself there may be nothing you can do about that :)

It seems like something the tally spider could cache in memory while it works, but it doesn't seem like something that belongs in its output format.


And finally, a plug: I'm biased, but IPFS might be a good storage model for this. It stores Merkle DAGs of data over a distributed network. Users could publish opinions without needing to secure hosting, and their published opinions are immutable and cryptographically signed.

jerkey commented 8 years ago

you're making it too complicated.