xeals / signal-back

Decrypt Signal encrypted backups outside the app
Apache License 2.0

Issue with "extract": encoding `` not recognised #10

Closed shymu closed 6 years ago

shymu commented 6 years ago

I'm trying to extract a backup made on Signal 4.18.3 on a Samsung Galaxy Alpha.

The backup completed (seemingly) successfully on the phone and the backup file was then transferred to a MacBook Pro using Google's "Android File Transfer" app.

Running signal-back with the "analyse" option on this file seems to work just fine, which leads me to believe the backup file is good. However, when I try to run an extract I get this error:

./signal-back_darwin_amd64 extract -p [pass] signal-[date].backup 
error: failed to extract attachment: encoding `` not recognised. create a PR or issue if you think it should be

I can't figure out if this is an error related to the backup file or signal-back, or how to further diagnose this.

xeals commented 6 years ago

The backup file should be okay. If the backup or attachment was corrupted, it would've been a more spectacular failure.

extract uses MIME types to determine output file extensions. I'm not sure why it doesn't have a recognised MIME type, but I've pushed a fix that will just give debug output instead. Are you able to build the program from source to have another try?
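To illustrate why an empty MIME type ends in this error, here is a minimal sketch of a MIME-to-extension lookup. It is not signal-back's actual code: the table `knownExts` and the function `extensionFor` are hypothetical names invented for this example, and the real list of supported types may differ.

```go
package main

import "fmt"

// knownExts is a small, hypothetical lookup table (an assumption for this
// sketch); the table used by signal-back itself may look different.
var knownExts = map[string]string{
	"image/jpeg": "jpg",
	"image/png":  "png",
	"video/mp4":  "mp4",
}

// extensionFor returns the file extension for a MIME type and reports
// whether the type was recognised at all.
func extensionFor(mimeType string) (string, bool) {
	ext, ok := knownExts[mimeType]
	return ext, ok
}

func main() {
	for _, m := range []string{"image/jpeg", ""} {
		if ext, ok := extensionFor(m); ok {
			fmt.Printf("%q -> .%s\n", m, ext)
		} else {
			// An empty string is simply a key that is not in the table.
			fmt.Printf("encoding `%s` not recognised\n", m)
		}
	}
}
```

With a lookup like this, an attachment whose MIME type was never recorded falls through to the "not recognised" branch, which would match the empty backticks in the message above.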

shymu commented 6 years ago

I was able to build from source, but it doesn't give me much more info, just a timestamp:

$ ./signal-back extract -p [pass]  signal-[date].backup 
2018/04/30 07:53:30 encoding `` not recognised. create a PR or issue if you think it should be

xeals commented 6 years ago

I made the wrong change. In cmd/extract.go, can you change the two instances of log.Fatalf at lines 224 and 225 to log.Printf instead, then try again? Optimally you should still get a file spat out, but it just won't have an extension. If you can check what that file's supposed to be, I can figure out what the issue is.

shymu commented 6 years ago

Ok, now I get 504 lines of this, before it finally ends (here's just a snippet from the end):

2018/04/30 08:06:09 encoding `` not recognised. create a PR or issue if you think it should be
2018/04/30 08:06:09 if you can provide details on the file `1495031440126` as well, it would be appreciated
2018/04/30 08:06:09 encoding `` not recognised. create a PR or issue if you think it should be
2018/04/30 08:06:09 if you can provide details on the file `1495036527311` as well, it would be appreciated
2018/04/30 08:06:09 encoding `` not recognised. create a PR or issue if you think it should be
2018/04/30 08:06:09 if you can provide details on the file `1495064583717` as well, it would be appreciated
2018/04/30 08:06:09 encoding `` not recognised. create a PR or issue if you think it should be
2018/04/30 08:06:09 if you can provide details on the file `1495165660156` as well, it would be appreciated
error: failed to extract attachment: failed to open output file: open 1495165660156.: too many open files

shymu commented 6 years ago

How do I figure out what the file is supposed to be? Without an extension I'm not sure where to start...

xeals commented 6 years ago

macOS (iirc) determines file type by encoding rather than by extension (as Windows does), so you could just try opening them from Finder. If that doesn't work, try a text editor and see if they're at least plaintext. If that also doesn't work, I might need to check that I'm extracting from the right place in the backup.

Another alternative might be to change around line 72 to be the following:

if len(ps) == 25 {
  aEncs[*ps[19].IntegerParameter] = *ps[3].StringParamter
  if *ps[19].IntegerParameter == 1495165660156 {
    fmt.Printf("%v", ps)
  }
}

or change 1495165660156 to one of the other numbers it spat out.

Note that that might contain sensitive information, so censor at will if you need to.

shymu commented 6 years ago

Unless I’m missing something (I’m assuming the files are supposed to extract to the same folder as the backup file?), nothing actually extracts, so there is no file to check. All I seem to have to go on is this debug output.

xeals commented 6 years ago

Ah, so it hasn't. Missed that last line.

The fix is in master, or just add file.Close() above line 86.

neurolit commented 6 years ago

Hi!

Same warnings for me on MacOS. I compiled your updated code and extracted every file of my backup. Attachment files are extracted without any extension. They are JPG, PNG and MPEG files.

xeals commented 6 years ago

If you can build and run the code in the devel branch, that might give me some insight. It should print out the whole entry if there's a missing encoding type.

shymu commented 6 years ago

Hey, sorry, there was some degree of user error on my part: these files were actually extracting to my $GOPATH/bin/ (since that's apparently where I was running the command) and not to where my backup was located.

Similar to @neurolit's findings, the files seem to be a mix of JPG, PNG and MPEG, all without extensions.

Anyway, I pulled the latest master, built, set ulimit -n 1024, and tried again.

This time it "completed" but every file still dumped something like this to the console:

2018/04/30 20:57:01 encoding `` not recognised. create a PR or issue if you think it should be
2018/04/30 20:57:01 if you can provide details on the file `1522622403410` as well, it would be appreciated

The referenced file, 1522622403410, can be previewed in Finder (though it seems to lack any MIME data), and if I add .jpg to the filename I can open it just fine.

shymu commented 6 years ago

There are some files that Finder doesn't seem to know what to do with, and I can't figure out what they are either. For example:

$ file -I 1522811988909 
1522811988909: application/octet-stream; charset=binary
$ hexdump -C 1522811988909 
00000000  a4 12 b0 10 24 30 53 d6  0e a4 a9 d4 30 9f 23 99  |....$0S.....0.#.|
00000010  1a 20 a4 bd eb d2 71 73  cd 2b b4 3c f8 cd 6a 40  |. ....qs.+.<..j@|
00000020  f7 8d 62 c2 d7 05 0e 38  22 3a b3 b8 f3 91 b8 4f  |..b....8":.....O|
00000030  5c 10 5b 5a d4 ea a0 8c  c3 c6 cd 7b 4b c8 33 87  |\.[Z.......{K.3.|
00000040  0e 46 e2 e4 a9 02 0a 63  f9 b6 bd c6 72 52 04 9c  |.F.....c....rR..|
00000050  e5 08 cb 7e 47 65 93 8c  36 10 a5 74 bd 5c c6 9d  |...~Ge..6..t.\..|
00000060  81 58 e0 d0 1c 18 96 2e  68 2b 9c bb d3 d9 12 17  |.X......h+......|
00000070  8c 65 c8 9d 20 b7 ce 69  34 ef 33 42 bf b7 37 b7  |.e.. ..i4.3B..7.|
00000080  13 b4 36 1a 40 c3 32 55  f5 1f 7b 25 6a 8c 1f e1  |..6.@.2U..{%j...|
00000090  6d 14 39 a3 d3 ad 78 e6  73 9f 86 fb 61 40 c5 74  |m.9...x.s...a@.t|
000000a0  e1 31 92 14 c0 e5 44 63  40 d9 de a6 82 94 05 4a  |.1....Dc@......J|
000000b0  e2 b0 76 42 65 47 cf b0  97 a0 5d a7 91 6a 41 21  |..vBeG....]..jA!|
000000c0  09 1c 15 5d 30 7c a5 41  41 97 14 91 c6 8e d9 d1  |...]0|.AA.......|
000000d0  35 df 03 45 23 20 8d ef  60 7d d8 17 ff 9e 5c 16  |5..E# ..`}....\.|
000000e0  63 23 e0 e0 be 99 bd 18  24 c1 87 72 a4 c2 76 df  |c#......$..r..v.|
000000f0  18 1b d8 1d 0e 46 7d c9  2c b4 3f e4 46 1d 14 b9  |.....F}.,.?.F...|
00000100  08 da 24 54 d2 c1 d7 ae  f7 46 30 2c b7 9a 14 e9  |..$T.....F0,....|
<snip>

xeals commented 6 years ago

Can you run the same on the devel branch? It should give a little more debugging output.

shymu commented 6 years ago

Unfortunately I've only been using git since I filed this bug, so forgive me for having such a basic issue, but I'm not sure how to pull the devel branch? So far I've tried:

$ git checkout devel
error: pathspec 'devel' did not match any file(s) known to git.

to no avail :(

xeals commented 6 years ago

All good.

$ git pull origin devel # or git clone https://github.com/xeals/signal-back --branch devel
$ git checkout devel

shymu commented 6 years ago

I don't seem to be getting any additional output w/ the devel branch:

2018/04/30 21:18:11 encoding `` not recognised. create a PR or issue if you think it should be
2018/04/30 21:18:11 if you can provide details on the file `1522811988909` as well, it would be appreciated

I did confirm those 3 line changes to extract.go were in the new file after pulling devel and building, so I'm fairly certain I'm using the right branch.

xeals commented 6 years ago

That's even more interesting. I've just hard-coded that file number, so pull and try again.

shymu commented 6 years ago

Nothing new

2018/04/30 21:24:34 encoding `` not recognised. create a PR or issue if you think it should be
2018/04/30 21:24:34 if you can provide details on the file `1522811988909` as well, it would be appreciated

xeals commented 6 years ago

Oh, that's more interesting.

I was under the impression that every attachment (stored as a binary blob in the backup) came with a matching parts SQL query containing metadata. If it's not running the part I've been changing, there's no metadata entry (or it's out of order). The next change will just spit out everything it finds. If you can put the entire thing into a gist/pastebin/etc., that'd be great. It shouldn't contain anything sensitive.

Also, could you provide the second line of the output of signal-back analyze? Should start with map and have a bunch of key/value pairs.

neurolit commented 6 years ago

I tried with the devel branch:

With extract:

found attachment binary 1524953561841

2018/05/02 14:41:28 encoding `` not recognised. create a PR or issue if you think it should be
2018/05/02 14:41:28 if you can provide details on the file `1524953561841` as well, it would be appreciated

And with analyze:

map[insert_into_part:689 attachment:688 insert_into_identities:4 pref:2 create_index:19 drop_index:19 insert_into_thread:9 drop_table:13 create_table:13 insert_into_sms:32039 insert_into_mms:691 insert_into_recipient_preferences:538 avatar:1 version:1]
part: 27 statement:"INSERT INTO part VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)" parameters:<integerParameter:1 > parameters:<integerParameter:2 > parameters:<integerParameter:0 > parameters:<stringParamter:"video/mp4" > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<integerParameter:0 > parameters:<stringParamter:"/data/user/150/org.thoughtcrime.securesms/app_parts/part1361163667.mms" > parameters:<integerParameter:143930 > parameters:<stringParamter:"20170408_161034.mp4" > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<integerParameter:1491660667638 > parameters:<nullparameter:true > parameters:<nullparameter:true > parameters:<integerParameter:0 > parameters:<blobParameter:"-\351\276?\021\253\025\\\324\344E5q\300f\262\340\344v\022|\225:\244k\\\374\266\031\026G\211" > parameters:<nullparameter:true > parameters:<integerParameter:0 > parameters:<integerParameter:0 > 
xeals commented 6 years ago

Is that the entire output of extract?

edit: I guess it's only the errored bit. For some reason one attachment doesn't have a matching metadata entry in yours. I might be able to rig up some sort of detection method if it's missing, but it might not work.
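One possible shape for such a detection step is sketched below, assuming metadata is kept in a map from attachment ID to MIME type (as the `aEncs` map in the earlier snippet suggests). The helper `missingMetadata` is a hypothetical name made up for this example.

```go
package main

import "fmt"

// missingMetadata returns, in input order, the attachment IDs that have no
// entry in encs (a map from attachment ID to MIME type).
func missingMetadata(attachmentIDs []uint64, encs map[uint64]string) []uint64 {
	var missing []uint64
	for _, id := range attachmentIDs {
		if _, ok := encs[id]; !ok {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	// The first ID and its MIME type come from the part row quoted above;
	// the second ID is the attachment that produced the warning.
	encs := map[uint64]string{1491660667638: "video/mp4"}
	ids := []uint64{1491660667638, 1524953561841}

	for _, id := range missingMetadata(ids, encs) {
		fmt.Printf("file `%d` has no associated metadata entry\n", id)
	}
}
```

Any ID reported this way would then need its type determined by other means, since there is no recorded MIME string to fall back on.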

neurolit commented 6 years ago

I have 688 attachments (approx. 2000 lines of logs). For each attachment, I've got these 4 lines, yes.

xeals commented 6 years ago

If it's not giving any lines starting with found attachment metadata, then I'm at a loss.

I've pushed a change that tries to guess encoding based on the file contents. You'll need to dep ensure before trying to run again.
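As an illustration of guessing a type from file contents, here is a small sketch using the standard library's `http.DetectContentType`. This is a swapped-in technique chosen for the example; the thread does not say which detection library the pushed change relies on, and `guessType` is a hypothetical wrapper.

```go
package main

import (
	"fmt"
	"net/http"
)

// guessType sniffs a MIME type from the leading bytes of data.
// http.DetectContentType considers at most the first 512 bytes and falls
// back to "application/octet-stream" when it finds no known signature.
func guessType(data []byte) string {
	return http.DetectContentType(data)
}

func main() {
	png := []byte("\x89PNG\r\n\x1a\n")              // PNG file signature
	jpeg := []byte{0xff, 0xd8, 0xff, 0xe0}          // start of a JPEG file
	opaque := []byte{0xa4, 0x12, 0xb0, 0x10}        // first bytes of the hexdump posted earlier

	fmt.Println(guessType(png))
	fmt.Println(guessType(jpeg))
	fmt.Println(guessType(opaque))
}
```

Content sniffing only recognises formats with a known signature, so some files can still end up without a useful type.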

neurolit commented 6 years ago

I confirm I have no found attachment metadata line. I'll try your new code.

neurolit commented 6 years ago

It works!

Logs (for one attachment):

2018/05/02 15:24:28 found attachment binary 1524953561841

2018/05/02 15:24:28 file `1524953561841` has no associated SQL entry; going to have to guess at its encoding

Files now have the right extension, except for three of them (*.unknown files):

$ file *.unknown
1498568946944.unknown: MPEG ADTS, AAC, v4 LC, 44.1 kHz, monaural
1519327446092.unknown: MPEG ADTS, AAC, v4 LC, 44.1 kHz, monaural
1522171621751.unknown: ISO Media

xeals commented 6 years ago

Good. Thanks for your help with this. I don't think there's much I can do about those files myself, but it might be worth dropping an issue over at the upstream repo with the file types and initial X bytes if you're willing.