tlambertz / seedvault_backup_parser

Decrypt, Modify and Reencrypt Seedvault Android Backups
Apache License 2.0

add support new backup format #14

Open khimaros opened 2 years ago

khimaros commented 2 years ago

with the update to android 12, seedvault has also bumped the metadata version.

it would be great to be able to use this tool with newer backups.

khimaros commented 2 years ago

@chirayudesai says:

There's a doc in the storage section which this format shares some things with; otherwise it should all be in https://github.com/seedvault-app/seedvault/pull/327

@grote says:

mostly this with a different key id: https://github.com/seedvault-app/seedvault/blob/android12/storage/doc/design.md I think there's a python version of tink to make your life easier. The k/v backup is a zipped DB now, I think there's no docs on this.

here's the tink library: https://pypi.org/project/tink/

khimaros commented 2 years ago

https://github.com/seedvault-app/seedvault/issues/383 will be helpful for testing this change

khimaros commented 1 year ago

i have access to a reference backup now and am working on a python-tink based decryption tool.

traveling for the next few weeks, so not a ton of time to allocate, but making slow steady progress.

ladar commented 1 year ago

@khimaros please add me to the list of people waiting for an updated tool. I accidentally erased several people from my contacts. I'm hoping I can extract just those contacts from the backup once it is decrypted.

The only other alternative I can think of would involve backing up my device, restoring the Seedvault backup I've preserved, extracting the missing contact records, and then restoring the backup I made at the beginning. But this doesn't seem like a great option given a) the time involved, and b) the likelihood that something will go wrong.

Adri-Fa commented 1 year ago

Did you ever manage to do this? If not, do you have anything that could help me?

nettnikl commented 1 year ago

> working on a python-tink based decryption tool

Hey @khimaros, could it be an option to share your current progress? I think there are many people interested in this, maybe we could help with testing.

jackwilsdon commented 1 year ago

I've just finished writing a tool to extract v1 backups - it's available here: https://github.com/jackwilsdon/seedvault-extractor

Feel free to open an issue or start a discussion if you need any help!

khimaros commented 1 year ago

@jackwilsdon this is excellent! thank you for your efforts! i'll test this out.

what are the main challenges for supporting KV extraction?

jackwilsdon commented 1 year ago

I think implementing KV support should be pretty straightforward. Looking at KVRestore.kt, it appears that it might just be an encrypted, gzip'd SQLite database. I'll see if I can find some time over the next few days to add it.
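A minimal sketch of the last step, assuming the K/V blob has already been decrypted and really is a gzip'd SQLite file as suggested above. Decryption is not shown, and `open_kv_database` and `list_tables` are hypothetical helper names, not part of any existing tool.

```python
import gzip
import sqlite3
from pathlib import Path


def open_kv_database(decrypted_blob: bytes, out_path: Path) -> sqlite3.Connection:
    """Gunzip an already-decrypted K/V blob, write it to out_path and open it.

    Assumption: the blob is a gzip-compressed SQLite database file.
    """
    out_path.write_bytes(gzip.decompress(decrypted_blob))
    return sqlite3.connect(out_path)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Writing the database to disk first keeps it inspectable with the regular `sqlite3` command line tool afterwards.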

jackwilsdon commented 1 year ago

It ended up being simple enough to dump the SQLite database to disk, so I've gone ahead and implemented it in https://github.com/jackwilsdon/seedvault-extractor/commit/e2875f747801b520a3ef7edb2979b8ffb7a0c9a6. Ideally it'd be exported in a more user-friendly format (JSON?), but this is at least a start.

khimaros commented 1 year ago

@jackwilsdon dumping the sqlite database seems adequate to me! it's easy enough to use other tools to export sqlite into a csv, json, or other format.

i tested that changelist against a reference backup provided by the seedvault team and it seemed to work! at least, i was able to select some rows from kv_entry and some of them were human readable.

the reference backup i'm referring to was originally uploaded to a git repo with restricted access. @chirayudesai -- is it okay if i upload a tarball and link it here? i think it would be generally useful to others.
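For the JSON export mentioned above, a rough sketch using only the Python standard library could look like this. The table name `kv_entry` comes from the dumped database; the function name `table_to_json` and the choice to wrap binary values in a `{"base64": ...}` object are assumptions made for illustration.

```python
import base64
import json
import sqlite3


def table_to_json(conn: sqlite3.Connection, table: str) -> str:
    """Serialize every row of `table` as a JSON array of objects."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    columns = [description[0] for description in cursor.description]

    rows = []
    for row in cursor:
        record = {}
        for column, value in zip(columns, row):
            # JSON has no bytes type, so BLOB values are base64-encoded.
            if isinstance(value, bytes):
                value = {"base64": base64.b64encode(value).decode("ascii")}
            record[column] = value
        rows.append(record)
    return json.dumps(rows, indent=2)
```

Usage would be something like `print(table_to_json(sqlite3.connect("kv.db"), "kv_entry"))`, where `kv.db` stands in for wherever the dumped database was written.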

chirayudesai commented 1 year ago

Thanks @jackwilsdon , glad to see this!

> the reference backup i'm referring to was originally uploaded to a git repo with restricted access. @chirayudesai -- is it okay if i upload a tarball and link it here? i think it would be generally useful to others.

I'd prefer it not be, it likely doesn't have any PII but I'd rather be safe than sorry.

What we can do is just create a backup from an emulator and that should be ok to share.

The git repo idea also didn't work because I tried to put some data on it and that quickly exceeded GitHub LFS limits, but maybe we can use releases.

jackwilsdon commented 1 year ago

I'm setting up a new device on LineageOS 20 with Seedvault to try and diagnose https://github.com/jackwilsdon/seedvault-extractor/issues/2 - I'm happy to upload a new backup here once I've confirmed it is valid.

khimaros commented 1 year ago

@jackwilsdon that sounds great! i'd be happy to help build a test harness that compares the golden data to the data extracted by your tool. so if you can grab both the full sdcard contents as well as the seedvault backup, that will be very helpful!
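One way such a harness could work is sketched below, assuming both the golden data and the extracted data are available as plain directory trees. `tree_digests` and `compare_trees` are hypothetical names, and comparing by SHA-256 of file contents is just one reasonable choice.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(golden: Path, extracted: Path) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or different after extraction."""
    g = tree_digests(golden)
    e = tree_digests(extracted)
    return {
        "missing": sorted(g.keys() - e.keys()),
        "unexpected": sorted(e.keys() - g.keys()),
        "changed": sorted(p for p in g.keys() & e.keys() if g[p] != e[p]),
    }
```

A test would then assert that all three lists are empty for a given backup and its golden data.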

jackwilsdon commented 1 year ago

Backup (including storage): SeedVaultAndroidBackup.zip

Recovery code: recipe bean exercise lift brother design front mystery convince physical country dust

The only item in storage is a photo at Pictures/jackwilsdon.png. I'm unable to test extracting this as https://github.com/jackwilsdon/seedvault-extractor does not yet support extracting storage backups.

khimaros commented 1 year ago

@jackwilsdon this is a great start, but it's hard to write tests against this unless we also have the golden sdcard data that the backup was generated from (everything in the emulated storage device). not urgent, but would be helpful for development i reckon.

jackwilsdon commented 1 year ago

Would an adb pull /storage/emulated/0 be good enough? I can reset the phone later and generate a new backup and pull the complete storage using that command if that's more useful? I guess I'll have to put some files in storage first, as otherwise I don't think Seedvault will actually back anything up.

chirayudesai commented 1 year ago

> Would an adb pull /storage/emulated/0 be good enough?

I actually did adb root and then adb pull /data

> I can reset the phone later and generate a new backup and pull the complete storage using that command if that's more useful? I guess I'll have to put some files in storage first, as otherwise I don't think Seedvault will actually back anything up.

Yes, I pushed a couple random images, took some screenshots (including of the recovery code :D), etc.

jackwilsdon commented 1 year ago

After a fresh reset, I took a few screenshots as you suggested and backed up apps and storage.

Here is /data (without dalvik-cache as it made the zip too large for GitHub): data.zip

And the recovery code: dove little pact broom inform cousin club stock remember debate hobby describe

The backup can be found at media/0/.SeedVaultAndroidBackup.

I've tested restoring this backup after another reset and it appears to restore settings and files just fine :+1:

crass commented 6 months ago

> @jackwilsdon this is a great start, but it's hard to write tests against this unless we also have the golden sdcard data that the backup was generated from (everything in the emulated storage device). not urgent, but would be helpful for development i reckon.

I don't think we really need the unencrypted data (if the decryption is verifying authentication tags, as it should). When using tink's streaming_aead, an exception will be thrown if the decrypted data does not match the original data. Of course, this assumes a correct implementation of this in tink, but I think that's a reasonable assumption. The current implementation for V0 backups should also be authenticating, because it calls decrypt_and_verify with the authentication tag and so should throw an exception if the decrypted data differs from the original data. The perhaps tricky part without the original unencrypted data is verifying that everything is being decrypted. This can be checked by subtracting the set of files used for decryption from the set of encrypted backup files; anything left over was never read.
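The set subtraction could be sketched roughly as follows. This assumes the decryption tool records the relative path of every backup file it opens; that bookkeeping, and the name `unread_backup_files`, are hypothetical and not something the existing code is known to do.

```python
from pathlib import Path


def unread_backup_files(backup_root: Path, files_read: set[str]) -> list[str]:
    """Return backup files (relative POSIX paths) that were never read.

    files_read is assumed to hold the relative paths the decryption
    tool opened while restoring; an empty result means full coverage.
    """
    all_files = {
        path.relative_to(backup_root).as_posix()
        for path in backup_root.rglob("*")
        if path.is_file()
    }
    return sorted(all_files - files_read)
```

Some leftovers may be legitimate (for example, files that do not belong to the backup at all), so a non-empty result is a prompt to look closer rather than proof of a bug.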

crass commented 6 months ago

> Backup (including storage): SeedVaultAndroidBackup.zip
>
> Recovery code: recipe bean exercise lift brother design front mystery convince physical country dust
>
> The only item in storage is a photo at Pictures/jackwilsdon.png. I'm unable to test extracting this as https://github.com/jackwilsdon/seedvault-extractor does not yet support extracting storage backups.

PR #16 is able to fully decrypt this backup. There are actually 3 files in the storage: Pictures/jackwilsdon.png (199385 bytes), Pictures/.thumbnails/.nomedia (0 bytes), and Pictures/.thumbnails/.database_uuid (36 bytes). This is a decent example of a backup because the files are not encrypted into 1 file each, but all three combined into 1 zip file. A better example backup would also have a larger file that is split into multiple chunks. But I suppose that might be more difficult to host here with github's file size limits.

@ladar, @Adri-Fa, @nettnikl: Check out this PR and see if it works for you.

nettnikl commented 6 months ago

Hey, I have not gone through it in full detail, as it's quite the bump in functionality, but what I've seen looks good. Sorry for being a bit paranoid, but can we maybe use known test images (used in public, with a known hashsum)? The recent xz security issue has shown again how risky any blobs are, even just in unit tests. Maybe just the Lenna image from Wikipedia or something similar? What do you say, @khimaros?
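Checking a test image against a publicly known hashsum could be as simple as the sketch below. The helper names are hypothetical, and the expected digest would have to come from a trusted public source; none is given here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixture(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's digest matches the pinned, published value."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A unit test could call `verify_fixture` on each fixture before using it and fail early on a mismatch.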

crass commented 6 months ago

> Hey, I have not gone through it in full detail, as it's quite the bump in functionality, but what I've seen looks good. Sorry for being a bit paranoid, but can we maybe use known test images (used in public, with a known hashsum)? The recent xz security issue has shown again how risky any blobs are, even just in unit tests. Maybe just the Lenna image from Wikipedia or something similar? What do you say, @khimaros?

Perhaps I misunderstand you or I'm misunderstood. I understand you to be worried about the test backup posted by @jackwilsdon. To be clear, I don't really care about people testing my code against that. I've already done it and am confident that it's fine. On the other hand, because any test backup will be encrypted, how do you know for sure what's inside before decrypting it? Even if someone else says it is good?

I'm more interested in people testing against their own backups to catch any missed corner cases (e.g. it looks like @khimaros may have found one, as mentioned in the PR). As far as my code is concerned, there are no binary blobs introduced and I never execute any code from the backup, so the xz-style attack doesn't really apply. All my code is Python source, so an exploit would have to take advantage of bugs in the Python interpreter or dependencies, which would likely require 0-days. Please correct me if you meant something else.

nettnikl commented 6 months ago

Hey @crass, sorry, I was a bit unclear. I meant the proposal to have proper unit testing. That would need blobs included in the repo, exactly like in the xz issue. I'm not talking about the manual testing, though; I'm completely with you on that.