horrorho / InflatableDonkey

iOS9+ iCloud backup retrieval proof of concept
MIT License

horrorho-jirmi discussions #38


michaljirman commented 8 years ago

Follow-up from #30

Thanks for the info, I had overlooked the constructed types. I'll need to read the resources more carefully next time :wink:. Quite like the online decoders. :thumbsup:

... I do have a private tool I created to rip the ASN1 templates that Apple uses from its binaries. It's not polished and not easy to use; you need to provide it with the exact offset of the template. It's this tool that is responsible for the output you see in the comments. I can upload it to GitHub if you like.

If you can, then it would be great to share it with us. It could come in handy in the future. Although for now I'm more than happy with the tools, docs and comments you've provided.

horrorho commented 8 years ago

I'll give the tool a bit of a brush and polish before I upload it, hopefully by the end of the week.

horrorho commented 8 years ago

Ok! I've pushed the ASN1 template ripper. It goes by the unfortunate name of FuriousPotato.

I'll fill in the README.md hopefully this evening. The tool does have a --help option.

horrorho commented 8 years ago

README.md is done. I've made a few code alterations too.

michaljirman commented 8 years ago

Thanks a lot for FuriousPotato!!! I'm excited to see the source code and will surely have some questions. I'll have a look at it hopefully this weekend, as I have recently been stuck on modular arithmetic for the EC. I didn't anticipate how much work it is. I'm learning lots of new stuff by debugging your code.

horrorho commented 8 years ago

Elliptic Curve Cryptography Tutorial I found useful.

horrorho commented 8 years ago

If you haven't already done so, grab Apple's corecrypto source. When it comes to reversing binaries a lot of code can be directly mapped onto it.

For example in PCS.dll, if you can identify the corecrypto/OpenSSH/heimdal-asn1/asn1-template.c _asn1_decode_top function, you can map its usages and locate asn1_templates along with their context.

michaljirman commented 8 years ago

Hi, I hope this is not a totally stupid question, but could you please help me understand how exactly you discovered the fingerprint value (0x66696e67... => "fingerprint") in PZDataUnwrap.java? Is the value standardized in any document, or was it reverse engineered from the secd.exe binary?

horrorho commented 8 years ago

Not a stupid question at all. It was reversed. In this case the fingerprint is actually "fingerprint" zero padded to 20 octets. I'm not aware that this is standard practice.
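For concreteness, the zero padding described above can be sketched like this (the class and method names here are mine, not InflatableDonkey's):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the reversed constant: ASCII "fingerprint" zero padded to
// 20 octets, giving 0x66696e6765727072696e74 followed by nine 0x00 bytes.
public class FingerprintPad {
    static byte[] paddedFingerprint() {
        byte[] ascii = "fingerprint".getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(ascii, 20); // copyOf zero fills the tail
    }
}
```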

There are a few of these hex strings/keys littered about. Some of them, like "Anonymous Sender    ", are specified in RFCs. At other times they appear to be random phrases, like "wrapping key wrap me", that Apple developers just made up. Almost all of them have been pulled out of binaries.

michaljirman commented 8 years ago

Could you please try to describe how it was reversed? I know this is a whole new topic and probably way too extensive and complex, but it would help me just to know how to start. I suppose if you open the binary (secd.exe) in IDA Pro then you are capable of identifying a method where the wrapping or unwrapping is done, and from there you would look for parameters. Unfortunately, I'm not capable of identifying it on my own, but I believe I could manage if you guided me through the beginnings. I suppose I will need this knowledge later for iCloud iOS 10 compatibility anyway.

horrorho commented 8 years ago

Reverse engineering is a black art and I'm no expert. Reversing crypto code is horrible and I hate doing it. One of the reasons InflatableDonkey was delayed is because I hated it so much.

But to try and answer your question: yes, if you identify keys, hex strings, etc., you can trace back to the functions that use them.

My Windows tool set is IDA Pro + Hex-Rays Decompiler, Ollydbg, Winapioverride32, Fiddler, Proxifier, Microsoft Visual C++ and probably one or two others I missed. I don't have a Mac or an iOS device, otherwise I think I would have used that to reverse with.

Windows Apple binaries are a mixture of compiled C and C++. C being significantly easier to reverse.

If you can, find someone who knows how to reverse well to assist. I had never touched IDA Pro in my life until I started working on InflatableDonkey, so my experience is limited.

I started by setting up iCloud to run through Fiddler; you'll need to add an exception for the certificate or it will fail. I experimented with different tools, so at times I'd run the Fiddler session with Winapioverride32 to see which functions in which Apple binaries were being called in relation to various client-server calls. At other times I used OllyDbg. This gave me an idea of which binaries and which exported functions I needed to concentrate on.

At times I used Visual C++ to call DLLs directly so I could study them in isolation without having to rely on debugging iCloud.

The other thing you need to become familiar with is how Apple CoreFoundation structures are laid out in memory. Thankfully they are fairly easy to google.

If you can, identify corecrypto functions. They are well documented and you have the source code available. Any functions calling into them will be easier to figure.

All of this being said, I think if you are unsure I would google for reversing guides and start with simple C binaries first. Don't pressurise yourself with learning the basics on a complex code base. I was silly enough to start with PCS.dll and, to be honest, for the first week I just learned basic techniques that would have been much simpler to grasp using a smaller binary.

What you'll need first is the skill to reverse engineer. It will probably be difficult for a week or two, but it will start to fall into place.

I often refer to cryptography and reversing as twin black arts. Each on its own is more than capable of inducing a headache; together they are horrible. I'm no genius and I'm an amateur programmer; I work in a non-IT field. If I can manage the feat then it's certainly doable.

Useful plug-in for crypto: http://www.openrce.org/downloads/details/189/FindCrypt2

First in a series of C++ articles. http://www.openrce.org/articles/full_view/21

horrorho commented 8 years ago

Ok, I have managed to grab a tea break and a few minutes. I do realise my previous answer was a bit vague, so I'll give you a more specific example. It is rather contrived and probably oversimplified, but I think you'll understand what I'm getting at.

In the same way I couldn't teach someone to program in Java in a few short posts, I can't do the same with reversing. I'm also no expert on reversing, so please take that into consideration. The bag of tricks that experienced reverse engineers have is huge.

The example considers mapping binaries to code as opposed to locating the InflatableDonkey Java structures in binaries.

I assume that at this point you have a basic grasp of how C binaries work and that you've practised using IDA Pro. You don't need to know assembly language inside out but you must be aware of calling conventions and how they work. Hex-Rays decompiler can struggle with them at times and you'll have to help it out.

Let's begin.

OllyDbg: we are tracing a DLL function that is being used in a particular client-server call. Soon we find a binary section that references "Anonymous Sender". We have no idea what this function does. At this point we have no idea that RFC 6637 exists. I don't have Xcode, so we make do with old-fashioned commands. Let's try our luck with the source code we have access to.

corecrypto $ grep -r "Anonymous Sender"
ccec/src/ccec_rfc6637_kdf.c:    ccdigest_update(di, dictx, 20, "Anonymous Sender    ");

Ok! RFC6637, what is this? Oh it's RFC 6637.

The function that references "Anonymous Sender" has the header:

void
ccec_rfc6637_kdf(const struct ccdigest_info *di,
                 const struct ccec_rfc6637_curve *curve,
                 const struct ccec_rfc6637 *wrap,
                 size_t skey_size, const void *skey,
                 size_t fingerprint_size, const void *fingerprint,
                 void *hash)

We disassemble the corresponding binary section in IDA Pro with the brilliant Hex-Rays decompiler which saves us a lot of work.

Decompiled header:

void *__usercall sub_1002E500@<eax>(_DWORD *a1@<edx>, int a2@<ecx>, int a3, int a4, int a5, int a6, int a7, int a8)

Decompiled code snippet:

  sub_1002B920(2u, v13, &v17, &unk_100F4FB0);
  sub_1002B920(1u, v13, &v17, (a3 + 4));
  sub_1002B920(1u, v13, &v17, (a3 + 12));
  sub_1002B920(0x14u, v13, &v17, "Anonymous Sender    ");
  sub_1002B920(0x14u, v13, &v17, a7);
  (*(v15 + 28))(v15, &v17, a8);
  return memset(&v17, 0, *(v15 + 4) + 12 + *(v15 + 8));

This maps almost perfectly with:

    ccdigest_update(di, dictx, 2, "\x03\x01");
    ccdigest_update(di, dictx, 1, &wrap->kdfhash_id);
    ccdigest_update(di, dictx, 1, &wrap->kek_id);
    ccdigest_update(di, dictx, 20, "Anonymous Sender    ");
    ccdigest_update(di, dictx, fingerprint_size, fingerprint);
    ccdigest_final(di, dictx, hash);
    ccdigest_di_clear(di, dictx);

Given the subtle variations in libraries, compiler versions/ optimisations/ in-lining/ etc it's remarkable how often things still work out so well for us.

Now it's Sudoku time. The decompiled header has the wrong return type; let's fix that. We add in our structs. It's clear that sub_1002B920 is ccdigest_update, and that helps us enormously. We slowly fill in the gaps and work out our Sudoku puzzle.

We eventually get something like this (it's from a different DLL version than the previous examples, so it's not a perfect match).

void __usercall ccec_rfc6637_kdf(const ccec_rfc6637_curve *curve@<edx>, const ccdigest_info *di@<ecx>, ccec_rfc6637 *wrap, int a4, size_t skey_size, const void *skey, size_t fingerprint_size, const void *fingerprint, void *hash)
{
  const ccec_rfc6637_curve *curve_edi; // edi@1
  const ccdigest_info *_di_ebx; // ebx@1
  ccdigest_ctx_t buf; // esi@1

  curve_edi = curve;
  _di_ebx = di;
  buf.hdr = operator new[](8 * ((di->state_size + 19 + di->block_size) >> 3));
  memcpy(buf.hdr + 8, _di_ebx->initial_state, _di_ebx->state_size);
  *buf.hdr = 0;
  *(buf.hdr + 1) = 0;
  *(buf.hdr + _di_ebx->state_size + _di_ebx->block_size + 8) = 0;
  if ( a4 )
    ccdigest_update(5u, _di_ebx, buf, &q_unk_10079578_header00000001);
  else
    ccdigest_update(4u, _di_ebx, buf, &q_byte_10079580_header00000001);
  ccdigest_update(skey_size, _di_ebx, buf, skey);
  ccdigest_update(1u, _di_ebx, buf, curve_edi->curve_oid);
  ccdigest_update(*curve_edi->curve_oid, _di_ebx, buf, curve_edi->curve_oid + 1);
  ccdigest_update(1u, _di_ebx, buf, &curve_edi->public_key_alg);
  ccdigest_update(2u, _di_ebx, buf, &q_byte_10079588_header_0301);
  ccdigest_update(1u, _di_ebx, buf, &wrap->kdfhash_id);
  ccdigest_update(1u, _di_ebx, buf, &wrap->kek_id);
  ccdigest_update(0x14u, _di_ebx, buf, "Anonymous Sender    ");
  ccdigest_update(0x14u, _di_ebx, buf, fingerprint);
  (_di_ebx->final)(_di_ebx, buf.hdr, hash);
  operator delete[](buf.hdr);
}

There is an a4 that's not named. Although this is quite a complete decompilation, it doesn't always need to be so. You're not going to reverse all of Apple's code base; it's massive. Do the absolute minimum you can get away with. This is not being lazy, it's being efficient. There are times you have functions where you figure out the headers but very little of the code, because there is no need to; you already know what's going on.

We have at this point also found the identity of ccdigest_update. We can search for its usages using IDA Pro and attempt to map them onto corecrypto. This will no doubt reveal further cryptographic code. Just be careful that you don't waste time mapping code paths that are irrelevant to us.

So we now have:

DLL call > unknown code > more unknown code > ccec_rfc6637_kdf > ccdigest_update > unknown code > etc.

Let's find ccec_rfc6637_kdf uses (I've filtered out the fluff):

corecrypto $ grep -r "ccec_rfc6637_kdf"
ccec/src/ccec_rfc6637_wrap.c:    ccec_rfc6637_kdf(di, curve, wrap->pgp, skey_size, skey, 20, fingerprint, hash);
ccec/src/ccec_rfc6637_unwrap.c:    ccec_rfc6637_kdf(di, curve, wrap->pgp, skey_size, skey, 20, fingerprint, hash);

We then perform the corresponding search in IDA Pro. So it continues.

At some point you'll come out of corecrypto and into libraries that are unpublished. You'll just have to figure out what they are doing. You can often examine the inputs and outputs of the functions via dynamic analysis/OllyDbg, and often it becomes clear what's happening.

Other times your code base has nothing to do with corecrypto or any other source that Apple have published. You'll just have to manage and use other tricks. If you suspect that some sort of crypto code is still being used you can look for crypto constants and work from there. If you think Base64 is involved, maybe there is a look up table somewhere. Other times you find logging messages which give away function names.

You'll also bump into CoreFoundation structures in their binary forms. These can be mapped into their various structs which assists considerably. Apple have published the source code for them.

C++ is more difficult and Apple does use it. There are tutorials out there and IDA Pro does have the C++ ClassInformer plug-in available.

There'll come a point where you'll get inside the heads of Apple coding teams and you'll notice the distinct styles they have.

Just to conclude. It is a black art and it's not always easy but it can be done and it has been done.

michaljirman commented 8 years ago

I was already glad for the first reply and was planning on going through the C++ articles etc., but the second reply is simply wow. Thanks a lot for taking the time to write it! I hope I can absorb it all. You have quite impressive knowledge considering you work in a non-IT field 👍.

horrorho commented 8 years ago

@Jirmi Thank you! But no, it just seems that way because it's all new. It will come together at some point.

horrorho commented 8 years ago

@Jirmi I'm reluctant to discuss in too much detail a field that I don't know well, but I think there is a slightly different mentality involved with reversing that throws some developers, even the highly experienced. I tend to dislike the phrase "out of the box" thinking, because often they are just thinking out of other boxes, but it does apply here.

In my contrived example, I can see how some would put down the finding of "Anonymous Sender" in corecrypto as a lucky break. It's not quite that simple; we stacked the odds heavily in our favour, and that search had reasoning behind it. We know the binary we are working on is crypto heavy. It's also likely that Apple will use its internal corecrypto library for it. This could be considered obvious, particularly in retrospect.

michaljirman commented 8 years ago

hi @horrorho, could you pls try to give me some explanation of the code below, originally from ChunkClient.java:

Line:110 StorageHostChunkList compact(StorageHostChunkList container) { ...

horrorho commented 8 years ago

Hi. InflatableDonkey's caching concurrent download engine is a bit messy; I never completed the replacement. As it is, the engine is hacks on top of more hacks.

The chunk server basics are largely unchanged from iOS8 iLoot. If I can explain the process along with the alterations in iOS9 it might be simpler for you to port iLoot's download engine instead of using InflatableDonkey's engine.

In short, files are broken up into chunks which are placed into containers and uploaded to servers. To reconstruct a file we download the relevant containers, then assemble the required chunks in order.

This mechanism corresponds to FileChecksumStorageHostChunkLists, which breaks down into two basic components: StorageHostChunkLists (SHCL) and FileChecksumChunkReferences (FCCR).

SHCL in more detail. The relevant protobufs are:

message StorageHostChunkLists {
  repeated StorageHostChunkList storage_host_chunk_list = 1;
  repeated ChunkError chunk_error = 2;
}
message StorageHostChunkList {
  required HostInfo host_info = 1;
  repeated ChunkInfo chunk_info = 2;
  required string storage_container_key = 3;
  required string storage_container_authorization_token = 4;
}
message ChunkInfo {
  required bytes chunk_checksum = 1;
  optional bytes chunk_encryption_key = 2;
  required uint32 chunk_length = 3;
  optional uint32 chunk_offset = 4;
}

Each SHCL corresponds to a container and its associated chunks. The containers are ordered sequentially beginning from index 0 as they appear in StorageHostChunkLists. The chunks are ordered in a similar fashion within each SHCL container.

The SHCL.HostInfo gives us the information we need to download the encrypted container data. Each chunk is then pulled out using ChunkInfo.chunk_offset and ChunkInfo.chunk_length. We then decrypt the chunk using the ChunkInfo.chunk_encryption_key (see below) and verify the data against ChunkInfo.chunk_checksum. We then store this decrypted chunk data along with its container index and its chunk index.

FCCR in more detail. The relevant protobufs are:

message FileChecksumChunkReferences {
 required bytes file_checksum = 1;
 repeated ChunkReference chunk_references = 2;
 optional bytes file_signature = 3;
}
message ChunkReference {
  required uint64 container_index = 1;
  required uint64 chunk_index = 2;
}

Each file is reconstructed by piecing together the required chunks as referenced by their container/chunk indices. At this point we have a raw file referenced by its file checksum/signature. To reconstruct the file in full we pass this data on to other code that ties the file checksum/signature to its directory/file name (Asset). If data protection is present, the file will need additional decryption.
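The reassembly step described above can be pictured with a toy sketch; the Ref record and the map-based chunk store are stand-ins of my own, not InflatableDonkey's actual types:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.Map;

// Toy model of file reconstruction: each ChunkReference points at a
// decrypted chunk by (container index, chunk index); concatenating the
// referenced chunks in order yields the raw file data.
public class Reassembly {
    record Ref(long containerIndex, long chunkIndex) {}

    static byte[] assemble(List<Ref> refs, Map<Ref, byte[]> chunkStore) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Ref ref : refs) {
            out.write(chunkStore.get(ref)); // assumes every referenced chunk was fetched
        }
        return out.toByteArray();
    }
}
```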

Chunk decryption: in iOS 8 the ChunkInfo.chunk_encryption_key is of type 1. It is applied with AES-128 in CFB mode to decrypt chunks. In iOS 9 we have type 2 keys, which are RFC 3394 wrapped keys. We unwrap them to type 1 keys using the key encryption key present in the corresponding Asset.

InflatableDonkey takes iOS9 SHCLs and converts them into iOS8 SHCLs by converting type 2 ChunkInfo.chunk_encryption_keys to type 1 via the Voodoo classes. We basically go through all of the keys and unwrap them.
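A rough sketch of those two steps using plain JCE primitives: RFC 3394 unwrap ("AESWrap") followed by AES-128 CFB. The zero IV and the raw key layout (ignoring the leading type byte visible in the key dumps later in this thread) are my assumptions, not verified against InflatableDonkey:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.Key;

// Sketch only: key values and IV handling here are illustrative;
// check ChunkListDecrypter and the Voodoo classes for the real details.
public class ChunkKeyUnwrap {
    // Unwrap an RFC 3394 wrapped (type 2) key to a raw type 1 key.
    static byte[] unwrapType2(byte[] kek, byte[] wrappedKey) throws Exception {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.UNWRAP_MODE, new SecretKeySpec(kek, "AES"));
        Key key = c.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);
        return key.getEncoded();
    }

    // Decrypt chunk data with AES-128 CFB; a zero IV is assumed here.
    static byte[] decryptChunk(byte[] type1Key, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/CFB/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(type1Key, "AES"),
                new IvParameterSpec(new byte[16]));
        return c.doFinal(data);
    }
}
```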

An additional complexity peculiar to InflatableDonkey is that we use a streaming decoder that doesn't backtrack on data. This requires us to use the StorageHostChunkList#compact method. It's probably easier explained with an example:

A raw container of 100 bytes of data. It contains only a single chunk of 100 bytes. The chunk offset is 0 and its length is 100.

Let's say we have two assets that require this chunk, one.txt and two.txt. Each asset will have its own key encryption key, which is used to unwrap the type 2 key to type 1. So we have two ChunkInfos.

ChunkInfo one.txt {
  chunk_checksum = CHECKSUM;
  chunk_encryption_key = KEY_A; // type 2 wrapped key unwraps to KEY_C
  chunk_length = 100;
  chunk_offset = 0;
}
ChunkInfo two.txt {
  chunk_checksum = CHECKSUM;
  chunk_encryption_key = KEY_B; // type 2 wrapped key unwraps to KEY_C
  chunk_length = 100;
  chunk_offset = 0;
}

The CHECKSUM will be identical as the data can only be decrypted to one outcome.

If we go ahead and convert to type 1 by unwrapping the type 2 keys:

ChunkInfo one.txt {
  chunk_checksum = CHECKSUM;
  chunk_encryption_key = KEY_C // type 1 key
  chunk_length = 100;
  chunk_offset = 0;
}
ChunkInfo two.txt {
  chunk_checksum = CHECKSUM;
  chunk_encryption_key = KEY_C // type 1 key
  chunk_length = 100;
  chunk_offset = 0;
}

We now have two identical ChunkInfos; KEY_C is identical. The issue with our decoder is that it cannot rewind back to offset 0 for the second ChunkInfo having already read the data for the first. Actually it doesn't even need to; we already have the decrypted chunk placed in our cache store. To solve this we use the StorageHostChunkList#compact method to remove identical ChunkInfos. Also, we pull out chunks using checksums, so the loss of ordering is not an issue. Again, depending on how you implement your decoder, this may not be an issue you need to solve.
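A minimal sketch of what the compact step achieves after unwrapping; ChunkInfo here is a stand-in record, not the protobuf class or InflatableDonkey's own:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// After type 2 keys are unwrapped, ChunkInfos referencing the same chunk
// become identical; dropping duplicates while preserving first-seen order
// mirrors the intent of StorageHostChunkList#compact described above.
public class CompactSketch {
    record ChunkInfo(String checksum, String key, int offset, int length) {}

    static List<ChunkInfo> compact(List<ChunkInfo> infos) {
        // records supply value-based equals/hashCode, so the set deduplicates
        return new ArrayList<>(new LinkedHashSet<>(infos));
    }
}
```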

Let me know if you have any further questions. This part of the code base is quite messy so if you find yourself a bit lost then it's probably my fault.

Additional information from Elcomsoft: https://www.elcomsoft.com/PR/recon_2013.pdf. Page 18+.

horrorho commented 8 years ago

On further reflection. I do feel that it would be easier for you to work on patching iLoot's download engine to work with iOS 9 than it would be to port InflatableDonkey's engine.

The main reasons are that InflatableDonkey's engine is multi-threaded and also employs disk caching. These two features, whilst wonderful, also significantly increase complexity. By removing the requirements for concurrency and caching, large chunks of code can be discarded.

If at a later date you decide that caching or concurrent downloads would be nice, you can add them in. At this point you'll also have a more complete understanding of the underlying mechanics.

Again it's entirely up to you how you approach this.

michaljirman commented 7 years ago

@horrorho just wondering pls ... how did you manage to test the AES XTS? Did you get hands on any testing backup from an A8 device running iOS 9.3?

horrorho commented 7 years ago

@Jirmi I had access to logs/ data via a third party but no physical device/s to play with. The iOS Security Guide mentioned AES-XTS but no details. Windows iCloud had no AES-XTS code that I could find and I was initially concerned that the task would be beyond me.

I coded an AES-XTS engine and then sat down and figured out the protocol using logs and backup data. It actually went surprisingly smoothly with the only bump being an idiosyncratic tweak function.

michaljirman commented 7 years ago

@horrorho thanks, I will try to get access to this iCloud backup and test it more.

If anyone has an iCloud backup of iOS 9.3 from an iPhone 6/6+ and could provide me with some testing backup, it would be very appreciated ;-)

@horrorho could you also please tell me why chunks' offsets are sometimes not equal to the expected value? (see code ChunkListDecrypter.java:Line80)

horrorho commented 7 years ago

@Jirmi Even though I have had limited sets of data to sample, I'm fairly sure the chunk data used to be contiguous. These days it can contain small gaps. I'm assuming that this is an optimisation on Apple's end: it's more efficient to service a single request with some redundant data than multiple small requests. The redundant data cannot be decrypted without keys, which are not supplied.

Contrived example:

Data as it exists on server. Retrieved via HTTP 206 Partial Content requests.

0000 - 0100 A
0100 - 0250 B
0250 - 0300 C
0300 - 0400 D
0400 - 0500 E

We only need chunks A B and D.

The old process would require two server requests.

Http request 1: 0000 - 0250 giving us A and B
Http request 2: 0300 - 0400 giving us D

Newer process with a single request.

Http request 1: 0000 - 0400 giving us A B and D with a small redundant block

Again, this is an assumption on my part and I'm happy to be corrected. It could be that it's always been like this and I've just not encountered it.
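The single-request idea above amounts to computing one covering HTTP Range header over the needed chunks. A toy sketch (the helper and its offsets are mine, using the numbers from the contrived example):

```java
// Given the (offset, length) pairs of the chunks we actually need, one
// Range request from the lowest offset to the highest end covers them
// all, at the cost of downloading the redundant bytes in the gaps.
public class CoveringRange {
    static String rangeHeader(int[][] chunks) { // each entry: {offset, length}
        int lo = Integer.MAX_VALUE;
        int hi = 0;
        for (int[] c : chunks) {
            lo = Math.min(lo, c[0]);
            hi = Math.max(hi, c[0] + c[1]);
        }
        return "bytes=" + lo + "-" + (hi - 1); // HTTP byte ranges are inclusive
    }
}
```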

iLoot loads all the server chunk data into memory before pulling out the chunks so isn't affected by this. InflatableDonkey's streaming decoders are more memory efficient and smoother on disk/ CPU utilisation at the expense of some coding complexity. iLoot has a tendency to stutter with CPU spikes and can struggle to saturate available bandwidth even on powerful machines.

horrorho commented 7 years ago

@Jirmi Correction/clarification: the ChunkInfo.chunk_offset chunk server protobuf field was not present prior to iOS 9, so iLoot would not have encountered non-contiguous chunks. However, the basic premise stands: if you load all the data into memory first, you can just use the chunk offset and length to retrieve chunks in a simple manner. Sorry for the inconsistency; it's 3am here and my brain is in low battery mode. :sleeping:

michaljirman commented 7 years ago

@horrorho no problem, 3am is indeed too late for anything... thanks for the explanation. Frankly, I'm glad I have anyone to discuss it with ;-)

In my case below, a container contains two chunks, A [0-398] and B [535-896], with a gap between them. ChunkListDecrypter.java on line 80 throws an exception while checking chunk B's offset, which results in throwing away the whole container and its chunks. I'm not sure if that's what we want, or if we shouldn't try to merge these chunks together somehow anyway.

chunk_info {
  chunk_checksum: "\201v\240qch\020E\ay\377\344o\b\252/\244\a\300)\260"
  chunk_encryption_key: "\001\342\227$]((@%@\001~\331\346\260V@"
  chunk_length: 398
  chunk_offset: 0
}
chunk_info {
  chunk_checksum: "\201 \337\005Iju\031\034\316P\361B\227\313d\276;O>\021"
  chunk_encryption_key: "\001\321W\225\211>B\373*\035\230\034FV\245\206\250"
  chunk_length: 896
  chunk_offset: 535
}

But the comment section of your ChunkListDecrypter states:

Limited to contiguous chunk blocks and type 0x01 chunk decryption

so merging chunks which are not contiguous is probably not possible?

However the basic premise stands that if you load all the data into memory first you can just use the chunk offset and length to retrieve chunks in a simple manner.

would this method allow us to retrieve the chunks as in the example above?

I am also getting access to an iOS 9.3 iPhone 6+ iCloud backup, so I'm going to test and research the AES XTS later.

horrorho commented 7 years ago

@Jirmi I think you have an old version of the ChunkListDecrypter. The latest version has been patched with a skip method to handle non-contiguous chunk blocks.

If the actual chunk offset is ahead of the running expected offset, it skips forward past the redundant data. The opposite cannot be true, as chunk blocks do not overlap each other. The chunk info list needs to be ordered with respect to chunk offsets for this particular method to work:

    @Override
    public Set<Chunk> apply(InputStream inputStream) throws IOException {
        try {
            logger.trace("<< apply() - InputStream: {}", inputStream);
            Set<Chunk> chunks = new HashSet<>();
            List<ChunkInfo> list = container.getChunkInfoList();
            for (int i = 0, offset = 0, n = list.size(); i < n; i++) {
                ChunkInfo chunkInfo = list.get(i);
                skip(inputStream, chunkInfo.getChunkOffset() - offset);
                chunk(inputStream, chunkInfo).ifPresent(chunks::add);
                offset += chunkInfo.getChunkLength();
            }
            logger.trace(">> apply() - chunks: {}", chunks);
            return chunks;
        } finally {
            IOUtils.closeQuietly(inputStream);
        }
    }

    void skip(InputStream inputStream, int bytes) throws IOException {
        if (bytes == 0) {
            return;
        }
        if (bytes < 0) {
            throw new IOException("bad chunk offset data");
        }
        logger.debug("-- debug() - skip: 0x:{}", Integer.toHexString(bytes));
        inputStream.skip(bytes);
    }

Just ask if you need a more detailed explanation/ example.

michaljirman commented 7 years ago

@horrorho yeah, you are right! I was on a non-updated repo for a while, so the latest patch makes perfect sense, thx 👍

michaljirman commented 7 years ago

Hi @horrorho, pls I am doing some research on changes for iOS 10 backups and wondering how you usually get network logs if you don't have any device. Is it possible to get these logs by initiating restoration/backup through iTunes (or other desktop services/libs)? I believe the original project (iLoot) was created using collected network traffic (MITM) between an iOS device and Apple servers while the device was restoring, which won't be possible without a jailbreak anymore due to certificate pinning. Pls correct me if I'm wrong.

horrorho commented 7 years ago

@Jirmi Hi. I was given logs way back here. I also relied heavily on iCloud for Windows for both logs and real-time debugging. I use Fiddler to intercept traffic. At times I have used my housemate's iPhone to download backups with InflatableDonkey directly, which can generate logging information. Unfortunately at no point have I had a jailbroken iOS device to play with, which would have made things easier.

As for pulling logs directly off an iPhone doing its business, I've not actually done it so I can't help very much. Sorry.

michaljirman commented 7 years ago

@horrorho np, thank you for the reply anyway! I have found some relevant information in binaries from IPSWs.

horrorho commented 7 years ago

@Jirmi Hi. I've patched my local copy of FuriousPotato to handle 32 bit Mach-O binaries. I'll tidy it up and push over the weekend. Just bug me if I forget to do it.

michaljirman commented 7 years ago

@horrorho Cool! Thank you!

michaljirman commented 7 years ago

Hi @horrorho, I have made some progress on reversing 2FA (not the old 2SV). Let me know when you are about to work on it and I will try my best to assist you.

horrorho commented 7 years ago

@Jirmi Heya. I'm back but I'm still tight on time.

michaljirman commented 7 years ago

Hi @horrorho, I was revisiting (just out of curiosity) the EncryptedAttributes protobuf structure. I also noticed your comments regarding the naming of some fields in AssetEncryptedAttributesFactory.java. Not sure if it is any help, as it doesn't make any difference to the overall functionality, but I'll share the message description below anyway.

message EncryptedAttributes {
  optional string relativePath = 1;
  optional string domainName = 2;
  optional uint64 birth = 3;            
  optional uint64 modified = 4;        
  optional uint64 statusChanged = 5;  
  optional uint64 size = 6;
  optional int32 groupID = 7;
  optional int32 userID = 8;
  optional uint32 mode = 9;
  optional uint64 sizeBeforeCopy = 10;          
  optional bytes linkTarget = 11;
  optional bytes encryptionKey = 12;
  optional bytes sha256Signature = 13;
  optional int32 domainOrdinal = 14;
  optional int32 flags = 15;
  optional int32 contentEncodingMethod = 16;
  optional int32 contentCompressionMethod = 17;
}

horrorho commented 7 years ago

@Jirmi Thank you! Actually this is very helpful and assists me greatly. I was going to set aside some time to revisit this definition to look for the #67 switches, but it looks like you're one step ahead of the game already.

michaljirman commented 7 years ago

Np, glad it helps. I was also trying to find the correct values for the compression and encoding. In iOS 11 they are usually set for sqlite databases as: encoding = 1 and compression = 2.

I believe I found some related labels/values, which you can see below, but I'm not sure if they are correct; maybe you could verify later.

contentEncodingMethod = [ 
   MBFileEncodingMethodCompressedSQLiteText = 1 = ???, 
   MBFileEncodingMethodCompressedSQLiteBinary= 2 = ???,
]
contentCompressionMethod = [ 
   MBFileCompressionMethodUnspecified =1 = ?None?, 
   MBFileCompressionMethodDefault = 2 = ?bvx2?
]

horrorho commented 7 years ago

@Jirmi That's interesting. I don't have access to any binaries at present, so you're ahead of the game once again.

I've been patching compression awareness into InflatableDonkey, and the only files that are compressed are sqlite database files; they are consistently compressed. They have either the .db or .sqlite file extension. The contentEncodingMethod and contentCompressionMethod are a consistent 1 and 2.

As usual most of the published information is largely marketing hype and doesn't really give much away. We can look forward to new photo and video compression methods but they'll likely upload/ download files verbatim.

Again, without the benefit of reversing binaries or firm evidence, I suspect Apple have a whitelist of compressible file types such as sqlite. When these are encountered they are compressed and uploaded; other file types are not touched. There is no compression trial, as it were.

File types such as plist are compressible but are not compressed. I guess they are generally small and comprise little of the backup volume, so are not worth bothering with.

Are there other files on this whitelist? Actually does it matter? If the contentCompressionMethod is consistent that may be all we need.
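A minimal sketch of the suspected whitelist logic, assuming extension-based matching (the extension set and method name here are guesses for illustration, not reversed behaviour):

```java
import java.util.Set;

public class CompressionWhitelist {
    // Hypothetical whitelist: only sqlite-type files have been observed compressed.
    private static final Set<String> COMPRESSIBLE = Set.of("db", "sqlite");

    // Returns true if the filename's extension is on the whitelist.
    public static boolean shouldCompress(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return false;
        }
        return COMPRESSIBLE.contains(filename.substring(dot + 1).toLowerCase());
    }
}
```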

More specific to lzfse: it has three block encoding modes, chosen based on the input size: uncompressed, lzvn and lzfse. It has four header types: bvx-, bvx1, bvx2 and bvxn. Does the contentCompressionMethod 2 refer to lzfse as a whole, where any mode can be used? Or does it refer to a specific mode (lzfse) or header type (bvx2)?
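For reference, the header types can be told apart by their 4-byte magic. A small classifier, with magic strings taken from Apple's open-source lzfse reference implementation:

```java
import java.nio.charset.StandardCharsets;

public class LzfseMagic {
    // Classify an lzfse stream block by its 4-byte magic header.
    // Magic values from Apple's open-source lzfse reference implementation.
    public static String blockType(byte[] magic) {
        String m = new String(magic, 0, 4, StandardCharsets.US_ASCII);
        switch (m) {
            case "bvx-": return "uncompressed";
            case "bvxn": return "lzvn";
            case "bvx1": return "lzfse (uncompressed tables)";
            case "bvx2": return "lzfse (compressed tables)";
            case "bvx$": return "end of stream";
            default:     return "unknown";
        }
    }
}
```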

I now have a completed lzfse-mode-only decompressor written in Java that handles only the bvx1 and bvx2 types. The code is super messy and I'm kind of ashamed to upload it until I've tidied it. I'll probably add in lzvn decompression next weekend now.

:octocat:

michaljirman commented 7 years ago

Hi @horrorho, I've just noticed that the KeybagManager fetches a keybag for each asset's uuid in a separate request. Is that intentional? (maybe threading comes into play?) Wouldn't it be possible to group the uuids into one request? Sorry if it's a dumb question.

horrorho commented 7 years ago

Heya. Not a dumb question! Normally all assets share a single key bag. Occasionally there are more, but not very often. However, as you've suggested, it would make sense to group them together and fire off a single request for when this does occur.
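A sketch of the grouping idea: collect the assets by their distinct key bag IDs up front, so one request per key bag (usually just one) suffices instead of one per asset. The map shapes here are illustrative, not InflatableDonkey's actual types:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyBagGrouping {
    // Group asset uuids by their key bag ID so that a single grouped request
    // per distinct key bag can replace one request per asset.
    public static Map<String, List<String>> groupByKeyBag(Map<String, String> assetToKeyBag) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        assetToKeyBag.forEach((asset, keyBagID) ->
                grouped.computeIfAbsent(keyBagID, k -> new ArrayList<>()).add(asset));
        return grouped;
    }
}
```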

That class isn't well written, so I'm at fault there.

michaljirman commented 7 years ago

Thanks! Actually, I've just realised that you are populating a Set<KeyBagID>, which does not keep duplicates, so it wouldn't be such a problem since, as you said, assets usually share a single key bag.

michaljirman commented 7 years ago

Hey, I'm posting below a service list in case it would be useful:-

horrorho commented 7 years ago

Thank you! Although I do have a newer Service class that already has these with a bit more information including the service id number here.

UNSPECIFIED/ UNKNOWN are internal states. Services 22 onwards are placeholders, so there are no LOBSTER_ROMANCE or FROZEN_MOOSE services.

InflatableDonkey's Service class had silly placeholders too, I wonder if anyone ported them literally.

michaljirman commented 7 years ago

Oh alright, thanks. Looks like a newer Service class was hiding somewhere. How did you get the id btw, from logs or binary? LoL, I'm sure there are some people using your placeholders :-)

horrorho commented 7 years ago

The service id numbers were reversed from binaries.

michaljirman commented 7 years ago

Thanks! Now I finally see them.

michaljirman commented 7 years ago

Hi @horrorho, wondering if you could clear something up for me, please. I have been doing some testing recently. I expected that when a cloudKitToken expires it would prevent me from accessing any information requiring token authentication. But, as I noticed, I can still access backup information (such as listing device ids, backup ids, manifest ids, file ids) with that token. Is that actually normal behaviour, or am I missing something? The only request which was blocked (HTTP 401) was the one requesting specific files by their file ids (req /api/client/record/retrieve with F:xxxx). If that's the normal behaviour, I'm wondering if there is a way to refresh the token without going through the whole login process.

michaljirman commented 7 years ago

I guess this article describes this situation, although I would like to know your opinion anyway :-). Btw, an example of the new token format:- ATS1_xxxxxxxxxxxx-yyyyy~, ATS3_xxxxxxxxxxxx-yyyyy~ (ATS => Apple Time Stamp maybe :-) )

horrorho commented 7 years ago

Hi. Sorry for the delay, I'm somewhere far away with limited Internet connection.

Thank you for the link, it's nice to get the expert view from professionals on the matter.

Unless I'm mistaken, the primary lockdown was on authorizing assets for download. Certain other functions have also been disabled, although vanilla InflatableDonkey doesn't use them and I've only hinted at what they might be. So yes, you can still perform some operations and get an idea of what data is around, you just can't get at it.

I've not examined the new CloudKit tokens, so you probably know more about them than I do. That being said, a timestamp is also baked into the mmeAuth token. We can generate new CloudKit tokens with an old mmeAuth token and they will work for non-backup containers, just not for backup containers.

michaljirman commented 7 years ago

Heya, just a quick question please. Have you ever come across any of these headers: X-Apple-I-MD, X-Apple-I-MD-LU, X-Apple-I-MD-M? I saw them being used while going through some iCloud logs, but am struggling a little bit to find out their real meaning.

horrorho commented 7 years ago

@Jirmi I have seen them in old logs that were given to me. However, as Windows iCloud doesn't use them, I've just ignored them.

mwpcheung commented 6 years ago

@michaljirman the signature of your iDevice