tweaselORG / cyanoacrylate

Toolkit for large-scale automated traffic analysis of mobile apps on Android and iOS.
MIT License

Add additional metadata to HAR #39

Closed baltpeter closed 1 month ago

baltpeter commented 3 months ago

For our complaint generator, the traffic logs alone are not enough: we also need metadata like the OS version, details about the app, and the versions of the tools used (https://github.com/tweaselORG/complaint-generator/issues/4).

Especially if we consider that we won't be the only ones generating reports etc. based on analyses done with tweasel tools, we should definitely already compile all this metadata in CA, so that the user doesn't have to do it themselves.

And while we could have a separate metadata file output for this, I quite like the idea of just embedding it into our exported HAR files. Having a single file that contains all relevant data is a lot more convenient and less error-prone.

The HAR spec supports custom fields, so that isn't an issue, either (http://www.softwareishard.com/blog/har-12-spec/, section "Custom fields").
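To illustrate the idea, here is a minimal sketch of embedding metadata into a HAR object. HAR 1.2 requires custom field names to start with an underscore. The field name `_analysisMeta`, the `AnalysisMeta` shape, and the `embedMeta` helper are assumptions made up for this example, not what CA actually emits.

```typescript
// Minimal slice of a HAR 1.2 document; real HAR files carry more fields.
type Har = {
    log: {
        version: string;
        creator: { name: string; version: string };
        entries: unknown[];
    };
};

// Hypothetical metadata shape, loosely following the needs listed above.
type AnalysisMeta = {
    osVersion: string;
    app: { id: string; name?: string; version?: string };
    toolVersions: Record<string, string>;
};

type HarWithMeta = Har & { log: { _analysisMeta: AnalysisMeta } };

// Returns a copy of the HAR with the metadata stored in a custom field.
// The leading underscore is what the HAR spec requires for custom fields.
function embedMeta(har: Har, meta: AnalysisMeta): HarWithMeta {
    return { ...har, log: { ...har.log, _analysisMeta: meta } };
}

// Usage with illustrative values only.
const exported = embedMeta(
    { log: { version: '1.2', creator: { name: 'example', version: '0.0.0' }, entries: [] } },
    { osVersion: '13', app: { id: 'com.example.app' }, toolVersions: { exampleTool: '1.0.0' } }
);
console.log(JSON.stringify(exported, null, 2));
```

Since tools that do not know the custom field should simply ignore it, the file would stay usable in regular HAR viewers.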

baltpeter commented 3 months ago

Okay, let's go over what we want to include. We'll start with the app meta.

The complaints currently need:

type App = {
    id: string;
    name: string;
    version: string;
    url: string;
    store: 'Google Play Store' | 'Apple App Store';

    platform: 'Android' | 'iOS';
};

While in appstraction and CA, you get:

type AppMeta = {
    id: string;
    name?: string | undefined;
    version?: string | undefined;
    versionCode?: string | undefined;
    architectures: ("arm64" | "arm" | "x86" | "x86_64" | "mips" | "mips64")[];
}

Most information is already there. Notably, we're missing the app's source (URL and store). Maybe there's a way to extract that from the APK/IPA?
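The gap between the two types can be sketched as a mapping function. This is only an illustration: `toComplaintApp` and `AppSource` are hypothetical names, and the platform and source are passed in separately because `AppMeta` does not carry them.

```typescript
type App = {
    id: string;
    name: string;
    version: string;
    url: string;
    store: 'Google Play Store' | 'Apple App Store';
    platform: 'Android' | 'iOS';
};

type AppMeta = {
    id: string;
    name?: string | undefined;
    version?: string | undefined;
    versionCode?: string | undefined;
    architectures: ('arm64' | 'arm' | 'x86' | 'x86_64' | 'mips' | 'mips64')[];
};

// Hypothetical: the source information that AppMeta cannot provide today.
type AppSource = Pick<App, 'url' | 'store'>;

// Builds the App the complaints need, or reports which fields are missing.
function toComplaintApp(
    meta: AppMeta,
    platform: App['platform'],
    source?: AppSource
): App | { missing: (keyof App)[] } {
    const missing: (keyof App)[] = [];
    if (!meta.name) missing.push('name');
    if (!meta.version) missing.push('version');
    if (!source) missing.push('url', 'store');
    if (!meta.name || !meta.version || !source) return { missing };

    return {
        id: meta.id,
        name: meta.name,
        version: meta.version,
        url: source.url,
        store: source.store,
        platform,
    };
}
```

Without a source, a fully populated `AppMeta` still yields `{ missing: ['url', 'store'] }`, which is exactly the gap described above.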

baltpeter commented 3 months ago

Oh, appstraction currently only supports getting app meta for app files, not for apps already on the device. For our use case, I don't think that'll matter much, since we'll always be freshly installing the apps. But we may want to support others' use cases, as well.

Tracked in https://github.com/tweaselORG/appstraction/issues/127.

baltpeter commented 3 months ago

Investigating whether we can get the app source in https://github.com/tweaselORG/appstraction/issues/128.

mal-tee commented 3 months ago
baltpeter commented 3 months ago

> sha256 hash of the app file

I had considered that, but does it actually make sense? What purpose would this information serve? The only use case I can think of would be to verify that an "unmodified" app was analyzed.

But as far as I understand it, the actual app files that are delivered by the stores are customized to the device and what's already installed on it. At least on iOS, IPAs from the App Store even contain information about the account that downloaded the app.

And doesn't that break this use case? If someone else will most likely end up with a different file for the same app (+version), wouldn't this just lead to confusion?

Or am I missing another use case/misunderstanding app delivery?

baltpeter commented 3 months ago

Oh, and also: We support split APKs and OBBs. How would we determine the hash in these cases?
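One conceivable answer, sketched here purely for illustration, would be to hash every file of the app separately and store the hashes keyed by file name. The `AppFile` type and `hashAppFiles` helper are hypothetical; only `createHash` from Node's `crypto` module is a real API.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical representation of one file belonging to an app
// (a base APK, a split APK, an OBB, ...), already read into memory.
type AppFile = { name: string; data: Uint8Array };

// Computes one SHA-256 hex digest per file instead of a single combined hash.
function hashAppFiles(files: AppFile[]): Record<string, string> {
    const hashes: Record<string, string> = {};
    for (const file of files) {
        hashes[file.name] = createHash('sha256').update(file.data).digest('hex');
    }
    return hashes;
}

// Usage with made-up file names and contents.
console.log(
    hashAppFiles([
        { name: 'base.apk', data: new Uint8Array([97, 98, 99]) },
        { name: 'split_config.arm64_v8a.apk', data: new Uint8Array([]) },
    ])
);
```

This sidesteps the question of how to combine several files into one hash, but it does not address the concern that different users may receive different files for the same app version.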

baltpeter commented 3 months ago

If the use case is to make sure that the analysis was performed on an unmodified app, I've been looking into extracting the signature (by the developer or store): Wouldn't that also suffice/be even better?

https://github.com/tweaselORG/appstraction/issues/128#issuecomment-2053554877

mal-tee commented 3 months ago

I'd just store both. This way you have a chance to check which APK was analyzed when the signature check fails.

zner0L commented 2 months ago

The command to get comprehensive device information including the processor architecture on iOS is:

pymobiledevice3 lockdown info

This returns a JSON object that contains a CPUArchitecture property.
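Reading that property could look roughly like the following sketch. The `parseCpuArchitecture` helper is a hypothetical name, and the sample JSON is illustrative rather than real device output; it assumes the command's output is a single JSON object as described above.

```typescript
// Extracts CPUArchitecture from the JSON printed by the lockdown info command.
// Returns undefined if the output is not an object or the property is absent.
function parseCpuArchitecture(lockdownInfoJson: string): string | undefined {
    const info: unknown = JSON.parse(lockdownInfoJson);
    if (typeof info !== 'object' || info === null) return undefined;

    const arch = (info as Record<string, unknown>)['CPUArchitecture'];
    return typeof arch === 'string' ? arch : undefined;
}

// Usage with an illustrative, heavily shortened sample.
const sample = '{"CPUArchitecture": "arm64e", "ProductVersion": "16.0"}';
console.log(parseCpuArchitecture(sample));
```

In practice, the input string would be the captured stdout of the command, for example obtained through Node's `child_process.execFile`.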