merklecounty / rget

download URLs and verify the contents against a publicly recorded cryptographic log
https://merklecounty.com
Apache License 2.0

Allow a local database to check against #26

Open c33s opened 5 years ago

c33s commented 5 years ago

The first thought I had when I read the article about this nice tool on heise.de was that I don't want to contact a central database. Then I read "Certificate-Transparency-Logs von Google" (Certificate Transparency logs from Google), which is a no-go for me. I don't want to feed Google with download pings; they are already collecting enough data. So a self-hosted Certificate Transparency log would be nice.

philips commented 5 years ago

If there were a flag to provide a blacklist and/or whitelist of certificate log servers, would that address your concerns?

https://ct.cloudflare.com/ has a list of all of the available log servers.
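
For illustration, such a flag might be used something like this (the `--ct-log-allowlist` flag and both URLs below are hypothetical placeholders, not existing rget options):

```
rget --ct-log-allowlist=https://ct.example.org/log https://example.com/downloads/tool-1.0.tar.gz
```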

c33s commented 5 years ago

I am not quite sure whether I have understood the principle of your tool correctly. If I download a file, the checksum of the file is checked against a checksum stored in a database, and the Certificate Transparency log is somehow used as that database. Is this correct?

Such a flag would help, but anything that is not decentralized or self-hosted is not that great. The movement from a decentralized internet to one where "if Google, Facebook, Amazon, Let's Encrypt and Cloudflare are offline, the internet is offline" is not a direction I like.

I would love to be able to easily maintain a self-hosted database/file where simply the URLs and the file hashes are stored. The database should be fully and partially updatable from online databases.
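
For example, I'm imagining something as simple as a plain text file mapping URLs to digests (a purely hypothetical format just to illustrate the idea; the URL and hash below are made up):

```
# url                                          sha256
https://example.com/downloads/tool-1.0.tar.gz  9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
```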

knisbet commented 5 years ago

@c33s Certificate Transparency is a system meant to address what some consider a structural flaw in the decentralized authentication of X.509 certificates. It's built to protect against a couple of scenarios, mainly a certificate authority being tricked into signing a certificate it shouldn't, or a certificate authority you trust acting in a hostile manner.

The way Certificate Transparency defends against this is by having all issued certificates published in public logs, which are run by major organizations. This means that organizations can monitor these logs for mis-issued certificates. It also means that, in an attack on an individual user, a mis-issued certificate can only be trusted by something like a browser if it has also been published publicly. In other words, I can't trick certificate authority A into signing a certificate for google.com and then present that certificate only to user B: if the browser checks the CT logs to validate the cert, I must publish the cert to a CT log, which means Google can find out that someone else has a cert for google.com. And a certificate authority that repeatedly has problems will get removed from the trusted lists in Chrome, Firefox, etc.
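
To make the "published publicly" part concrete, here is a rough Go sketch that checks whether a server's leaf certificate carries embedded SCTs (the X.509 extension with OID 1.3.6.1.4.1.11129.2.4.2), which is the usual evidence that the certificate was submitted to CT logs. The host is just an example:

```go
package main

import (
	"crypto/tls"
	"encoding/asn1"
	"fmt"
)

// OID of the "embedded SCT list" X.509 extension defined for Certificate Transparency.
var sctListOID = asn1.ObjectIdentifier{1, 3, 6, 1, 4, 1, 11129, 2, 4, 2}

func main() {
	// Connect to an example host and inspect the leaf certificate it presents.
	conn, err := tls.Dial("tcp", "example.com:443", &tls.Config{})
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	leaf := conn.ConnectionState().PeerCertificates[0]
	for _, ext := range leaf.Extensions {
		if ext.Id.Equal(sctListOID) {
			fmt.Println("leaf certificate carries embedded SCTs (logged in CT)")
			return
		}
	}
	fmt.Println("no embedded SCTs found in the leaf certificate")
}
```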

As I understand this project, the idea is to achieve the same properties, but for binary downloads. For a GitHub release, information about the binary files is encoded into an X.509 certificate, Let's Encrypt signs it, and Let's Encrypt publishes it to the logs as it would for any new certificate. When someone uses rget to download files from GitHub, it can validate that the downloaded file is the same one that has been published publicly. Someone at GitHub wouldn't be able to trick you, and only you, into downloading a different version of the file that contains an exploit while no one else gets that malicious file.
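
The local half of that check is just hashing the downloaded bytes and comparing against the recorded digest. A minimal Go sketch of that step (the file name and expected digest are placeholders, not values rget actually produces):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

func main() {
	// Placeholder for the digest that was published via the certificate / CT log.
	const expected = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

	// The file that was just downloaded.
	f, err := os.Open("release-asset.tar.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Hash the file contents and compare against the publicly recorded value.
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		panic(err)
	}
	got := hex.EncodeToString(h.Sum(nil))

	if got != expected {
		fmt.Println("digest mismatch: the file is not the one recorded publicly")
		os.Exit(1)
	}
	fmt.Println("digest matches the publicly recorded value")
}
```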

With this in mind, I'm not sure how a self-hosted database/file with the file hashes stored would work. How would the database get the new URLs and hashes, and ensure they're authentic and the same as the ones published to the CT logs, without contacting the CT logs? The closest thing I can think of is to replicate the entire CT log locally and then have rget check against your own replica.
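
For what it's worth, a local replica along those lines would mostly be a loop over the RFC 6962 v1 HTTP API that CT logs expose. A rough Go sketch of a single batch fetch (the log base URL is a placeholder, and a real mirror would page through start/end ranges and verify the Merkle proofs):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Response shape of the RFC 6962 get-entries endpoint (base64-encoded fields).
type getEntriesResponse struct {
	Entries []struct {
		LeafInput string `json:"leaf_input"`
		ExtraData string `json:"extra_data"`
	} `json:"entries"`
}

func main() {
	logURL := "https://ct.example.org/log" // placeholder CT log base URL

	// Fetch one batch of entries; a real replica would loop until it reaches the tree size.
	resp, err := http.Get(fmt.Sprintf("%s/ct/v1/get-entries?start=%d&end=%d", logURL, 0, 31))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var batch getEntriesResponse
	if err := json.NewDecoder(resp.Body).Decode(&batch); err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d entries for the local replica\n", len(batch.Entries))
}
```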