hellais opened this issue 8 years ago
One approach to getting around this issue would be to ask the user to supply additional information for search queries such that the level of entropy present in the search is sufficient to allow for authorship to be measured.
I feel like you are touching on multiple issues here, it looks like you're talking about proof of authorship, as well as proof that the collectors are passing valid data.
After discussing the design I had in mind for this with @bassosimone I came up with the following scheme:
report_id + nonce
(where + is string concatenation).
This signature is appended to a report and is proof of the fact that the probe possesses the public key with the given fingerprint. @hellais I guess here you meant the private key of the given fingerprint?
Regarding the probe identifier: are we going to have a list of known good public key fingerprints, or how are we going to handle the "legit" public keys?
Yeah I meant proof of ownership of the private key of course.
I am thinking that what can be done is twofold:
1) We can store a list of known good public keys, for example those of the partners.
2) We can look at historical data and assume that keys that have been contributing good, stable measurements for a long time are OK and not rogue, while keys that appear for the first time should be trusted less.
Possibly we could also do this by means of a "channel" or "account" that is mapped to a group or organisation that runs a campaign.
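The two heuristics above could be combined into something like the following. Everything here is hypothetical: the trust-level names, the whitelist lookup, and the report-count threshold are all illustrative choices, not part of any OONI design.

```go
package main

import "fmt"

// TrustLevel is a hypothetical score for a probe key; the names are
// illustrative only.
type TrustLevel int

const (
	TrustUnknown    TrustLevel = iota // new key: trust "less"
	TrustHistorical                   // long history of good measurements
	TrustPartner                      // on the known-good (partner) list
)

// classifyKey applies the two heuristics from the comment above: first a
// whitelist of partner key fingerprints, then a minimum number of past
// good reports.
func classifyKey(fp string, partnerKeys map[string]bool, goodReports int) TrustLevel {
	if partnerKeys[fp] {
		return TrustPartner
	}
	if goodReports >= 100 { // assumed threshold, purely illustrative
		return TrustHistorical
	}
	return TrustUnknown
}

func main() {
	partners := map[string]bool{"ab12cd34": true}
	fmt.Println(classifyKey("ab12cd34", partners, 0))   // partner key
	fmt.Println(classifyKey("ffeeddcc", partners, 500)) // long good history
	fmt.Println(classifyKey("00112233", partners, 3))   // new key
}
```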
Currently, control baseline measurements are distinguished only by ASN. This is very weak: even without malicious intent, users of ooni-probe could mistakenly have their reports classified as controls, while they are not controls run by us.
It seems like we could go about doing this in various different ways.
To avoid increasing the overall CPU load, it would be good to implement, at or around the same time, submission of reports to the collectors as JSON, avoiding the serialisation/deserialisation step currently required by the data pipeline.