wader / fq

jq for binary formats - tool, language and decoders for working with binary and text formats
bplist: NSKeyedArchiver jq function #502

dgmcdona commented 1 year ago

NSKeyedArchiver stores objects in a bplist format by flattening the object into a set of keys and values, which reference each other by index. A common example is the sfl2 files located in ~/Library/Application Support/com.apple.sharedfilelist. @wader proposed the following function for reconstructing these objects into a more meaningful JSON representation:

def from_ns_keyed_archiver:
  (  . as {"$objects": $objs, "$top": {root: $root_uid}}
  | def _f($id):
      ( .
      | $objs[$id]
      | if type == "string" then .
        elif type == "number" then .
        else
          ( . as {"$class": $class}
          | if $class == 13 then # NSDictionary?
              ( . as {"NS.keys": $ns_keys, "NS.objects": $ns_objects}
              | [$ns_keys, $ns_objects]
              | transpose
              | map(
                  ( . as [$k, $o]
                  | {key: _f($k), value: _f($o)}
                  )
                )
              | from_entries
              )
            elif $class == 58 then #?
              ( . as {"NS.objects": $ns_objects}
              | $ns_objects
              | map(_f(.))
              )
            else "class-\($class)"
            end
          )
        end
      );
    _f($root_uid)
  );

However, it was found that the class numbers are not consistent across multiple files, so relying on them for interpreting underlying types is not a general solution. The following seems to work:

def from_ns_keyed_archiver:
  (  . as {"$objects": $objs, "$top": {root: $root_uid}}
  | def _f($id):
      ( .
      | $objs[$id]
      | if type == "string" then .
        elif type == "number" then .
        else
          ( . as {"$class": $class}
          # decide by which keys are present instead of by class number
          | if ."NS.keys" != null and ."NS.objects" != null then
              ( . as {"NS.keys": $ns_keys, "NS.objects": $ns_objects}
              | [$ns_keys, $ns_objects]
              | transpose
              | map(
                  ( . as [$k, $o]
                  | {key: _f($k), value: _f($o)}
                  )
                )
              | from_entries
              )
            elif ."NS.objects" != null then
              ( . as {"NS.objects": $ns_objects}
              | $ns_objects
              | map(_f(.))
              )
            else "class-\($class)"
            end
          )
        end
      );
    _f($root_uid)
  );

However, we are not yet sure that this is a best practice, since it was created from a heuristic approach that is not based on any known reference documentation. More work is needed to find the best way of identifying arrays and objects within NSKeyedArchiver representations.

wader commented 1 year ago

Good summary. Let's collect info here and figure out what to do

dgmcdona commented 1 year ago

Relevant encoding implementation: https://fuchsia.googlesource.com/third_party/swift-corelibs-foundation/+/refs/tags/swift-DEVELOPMENT-SNAPSHOT-2017-09-27-a/Foundation/NSKeyedArchiver.swift#587

dgmcdona commented 1 year ago

The class number seems to just be an index back into the object array, where the classname can be found, which is why those numbers were varying. This code seems to work:

def from_ns_keyed_archiver:
  (  . as {"$objects": $objs, "$top": {root: $root_uid}}
  | def _f($id):
      ( .
      | $objs[$id]
      | if type == "string" then .
        elif type == "number" then .
        elif type == "boolean" then .
        elif type == "null" then .
        else
          ( . as {"$class": $class}
          # $class is an index into $objects; the entry there holds the class name
          | if $objs[$class]."$classname" == "NSDictionary" then
              ( . as {"NS.keys": $ns_keys, "NS.objects": $ns_objects}
              | [$ns_keys, $ns_objects]
              | transpose
              | map(
                  ( . as [$k, $o]
                  | {key: _f($k), value: _f($o)}
                  )
                )
              | from_entries
              )
            elif $objs[$class]."$classname" == "NSArray" then
              ( . as {"NS.objects": $ns_objects}
              | $ns_objects
              | map(_f(.))
              )
            else "class-\($class)"
            end
          )
        end
      );
    _f($root_uid)
  );

wader commented 1 year ago

👍 Nice! That makes sense and makes things much easier.

Do you think there are more NS* or other classes to support? Is it possible to do more for the fallback case? I also wonder how robust you think this needs to be. It could possibly check that keys and objects exist etc. Should it throw an error or do something else?

wader commented 1 year ago

btw github markdown supports jq :)

dgmcdona commented 1 year ago

Did some more digging through as many plist files as I could, and found a few more NS class types that are covered here:

def from_ns_keyed_archiver:
  (  . as {"$objects": $objs, "$top": {root: $root_uid}}
  | def _f($id):
      ( .
      | $objs[$id]
      | if type == "string" then .
        elif type == "number" then .
        elif type == "boolean" then .
        elif type == "null" then .
        elif type == "array" then .
        else
          ( . as {"$class": $class}
          | if $class == null then . else
            $objs[$class]."$classname" as $cname
            | if $cname == "NSDictionary" or $cname == "NSMutableDictionary" then
                ( . as {"NS.keys": $ns_keys, "NS.objects": $ns_objects}
                | [$ns_keys, $ns_objects]
                | transpose
                | map(
                    ( . as [$k, $o]
                    | {key: _f($k), value: _f($o)}
                    )
                  )
                | from_entries
                )
              elif $cname == "NSArray"
                or $cname == "NSMutableArray"
                or $cname == "NSSet"
                or $cname == "NSMutableSet" then
                ( . as {"NS.objects": $ns_objects}
                | $ns_objects
                | map(_f(.))
                )
              elif $cname == "NSData" or $cname == "NSMutableData" then ."NS.Data"
              elif $cname == "NSUUID" then ."NS.uuidbytes"
              else ."$class" = $cname # replace class ID with classname, while returning the rest of the data as-is
              end
            end
          )
        end
      );
    _f($root_uid)
  );

However, I ran into a problem with an NSKeyedArchiver file, /Library/Preferences/com.apple.networkextension.plist (it contains VPN configurations, from tailscale in my case). This particular file does not have a root value in the $top object, and none of the items in the $objects array seem to be the root. Very confusing.

dgmcdona commented 1 year ago

I'm thinking it might be a good idea to name it something like from_ns_keyed_archiver_root, although that's getting to be a bit long. But we're not going to be able to reliably decode anything that doesn't have a root value.

wader commented 1 year ago

Nice progress. Are you able to share com.apple.networkextension.plist, or is it maybe sensitive?

Will have a deeper look later today

wader commented 1 year ago

Cleaned up and fixed the style a bit to match the one used in fq. There were some destructuring bindings that were only used once anyway, so I removed those. Also added some TODOs for cases to maybe clarify.

def from_ns_keyed_archiver:
  (  . as {
      "$objects": $objects,
      "$top": {root: $root}
    }
  | def _f($id):
      ( $objects[$id]
      | type as $type
      | if $type |
          . == "string"
          or . == "number"
          or . == "boolean"
          or . == "null" then .
        elif $type == "array" then . # TODO: does this happen?
        else
          ( ."$class" as $class
          | if $class == null then . # TODO: what case is this?
            else
              ( $objects[$class]."$classname" as $cname
              | if $cname == "NSDictionary"
                  or $cname == "NSMutableDictionary" then
                  # transform arrays [key_id1, key_id2,...] and [obj_id1, obj_id2,..] into {key: obj, ...}
                  ( [."NS.keys", ."NS.objects"]
                  | transpose
                  | map({key: _f(.[0]), value: _f(.[1])})
                  | from_entries
                  )
                elif $cname == "NSArray"
                  or $cname == "NSMutableArray"
                  or $cname == "NSSet"
                  or $cname == "NSMutableSet" then
                  ( ."NS.objects"
                  | map(_f(.))
                  )
                elif $cname == "NSData" or $cname == "NSMutableData" then ."NS.Data" # TODO: will be a json string?
                elif $cname == "NSUUID" then ."NS.uuidbytes" # TODO: will be a json string?
                else
                  # replace class ID with classname, while returning the rest of the data as-is
                  ."$class" = $cname
                end
              )
            end
          )
        end
      );
    _f($root)
  );

For transformation code that is hard to follow, like this, I sometimes add a snippet above it showing how the input looks. Maybe a good idea?

# {
#   "$archiver": "NSKeyedArchiver",
#   "$objects": [
#     "$null",
#     {
#       "$class": 12,
#       "NS.keys": [
#         2,
#         3
#       ],
#       "NS.objects": [
#         4,
#         32
#       ]
#     },
# ...
#     {
#       "$classes": [
#         "NSDictionary",
#         "NSObject"
#       ],
#       "$classname": "NSDictionary"
#     },
# ...
#   ],
#   "$top": {
#     "root": 1
#   },
#   "$version": 100000
# }
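As a cross-check of the layout above, here is a minimal Python sketch (not part of fq) of the same decoding idea: look up the class name through the "$class" index, then zip "NS.keys" with "NS.objects". The `archive` value and the names `decode` and `resolve` are made up for illustration, and the string objects are hypothetical fillers for the entries the snippet elides. It has no cycle protection.

```python
# Hypothetical miniature archive shaped like the snippet above.
# Index 0 is the "$null" placeholder; index 1 is the root dictionary.
archive = {
    "$archiver": "NSKeyedArchiver",
    "$objects": [
        "$null",
        {"$class": 6, "NS.keys": [2, 3], "NS.objects": [4, 5]},
        "name",
        "items",
        "example",
        {"$class": 7, "NS.objects": [2, 4]},
        {"$classes": ["NSDictionary", "NSObject"], "$classname": "NSDictionary"},
        {"$classes": ["NSArray", "NSObject"], "$classname": "NSArray"},
    ],
    "$top": {"root": 1},
    "$version": 100000,
}

DICT_CLASSES = {"NSDictionary", "NSMutableDictionary"}
LIST_CLASSES = {"NSArray", "NSMutableArray", "NSSet", "NSMutableSet"}


def decode(archive):
    objects = archive["$objects"]

    def resolve(index):
        obj = objects[index]
        if not isinstance(obj, dict) or "$class" not in obj:
            return obj  # strings, numbers, booleans and class-less objects
        # "$class" is an index into "$objects", not a fixed type code.
        cname = objects[obj["$class"]]["$classname"]
        if cname in DICT_CLASSES:
            return {
                resolve(k): resolve(v)
                for k, v in zip(obj["NS.keys"], obj["NS.objects"])
            }
        if cname in LIST_CLASSES:
            return [resolve(i) for i in obj["NS.objects"]]
        # Fallback: keep the fields, swap the class index for its name.
        return {**obj, "$class": cname}

    return resolve(archive["$top"]["root"])


print(decode(archive))  # {'name': 'example', 'items': ['name', 'example']}
```

Because the class name is read from the object table on every lookup, the sketch does not care which index a class definition happens to land on in a given file.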

Also this might be a good snippet to expand bookmarks:

$ fq -L . 'include "ns_keyed_archiver"; torepr | from_ns_keyed_archiver | (.. | .Bookmark? // empty) |= apple_bookmark' ...

(.. | .Bookmark? // empty) recurses and outputs every value it manages to index into. That produces nulls where the key is missing; the // takes care of that, since it evaluates its right side when the left side is empty or false-ish (null or false).
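For readers less used to jq path expressions, a rough Python analogue of that update may help. This is only a sketch of the intent (apply a function to every "Bookmark" value that is neither null nor false, anywhere in the tree); `update_key` is a hypothetical helper, and it does not reproduce jq's exact path semantics for keys nested inside an already replaced value.

```python
def update_key(value, key, fn):
    """Apply fn to every value stored under `key`, skipping None and False."""
    if isinstance(value, dict):
        out = {}
        for k, v in value.items():
            if k == key and v is not None and v is not False:
                out[k] = fn(v)  # matched: replace, do not descend further
            else:
                out[k] = update_key(v, key, fn)
        return out
    if isinstance(value, list):
        return [update_key(v, key, fn) for v in value]
    return value  # scalars are returned unchanged


data = {"a": {"Bookmark": "b1", "x": [{"Bookmark": None}, {"Bookmark": "b2"}]}}
print(update_key(data, "Bookmark", str.upper))
# {'a': {'Bookmark': 'B1', 'x': [{'Bookmark': None}, {'Bookmark': 'B2'}]}}
```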

Some things to figure out:

dgmcdona commented 1 year ago

plist.zip Here's the file in question, I sanitized the data

dgmcdona commented 1 year ago

One more thing to deal with in this one: It looks like every dictionary value that is a number is a reference to an object from the original array, if I'm reading things correctly.

dgmcdona commented 1 year ago

I think com.apple.networkextension.plist is an encoding of 3 objects. Possible strategy:

wader commented 1 year ago

Thanks, that is a bit strange. I wonder if it could be that network extension has classes that use their own custom serializers somehow? I found this https://github.com/Chr0nicT/macOS-Headers-10.14.6-Mojave/blob/master/Frameworks/NetworkExtension/1/NEConfiguration.h which seems to indicate, as you say, that the numbers are classes, but sometimes they are also just numbers? It seems hard to have a generic heuristic for that.

Here is a version that treats the UUID in $top as root and also recurses and stops at cycles:

def from_ns_keyed_archiver:
  (  . as {
      "$objects": $objects,
      # "$top": {root: $root}
      "$top": {"796BFF22-6712-4486-A32C-A1C5DB3273BA": $root}
    }
  | def _f($id; $seen_ids):
      def _r($id):
        if $seen_ids | has("\($id)") then "cycle-\($id)"
        else _f($id; $seen_ids | ."\($id)" = true)
        end;
      ( $objects[$id]
      | type as $type
      | if $type |
          . == "string"
          or . == "number"
          or . == "boolean"
          or . == "null" then .
        elif $type == "array" then . # TODO: does this happen?
        else
          ( ."$class" as $class
          | if $class == null then . # TODO: what case is this?
            else
              ( $objects[$class]."$classname" as $cname
              | if $cname == "NSDictionary"
                  or $cname == "NSMutableDictionary" then
                  # transform arrays [key_id1, key_id2,...] and [obj_id1, obj_id2,..] into {key: obj, ...}
                  ( [."NS.keys", ."NS.objects"]
                  | transpose
                  | map({key: _r(.[0]), value: _r(.[1])})
                  | from_entries
                  )
                elif $cname == "NSArray"
                  or $cname == "NSMutableArray"
                  or $cname == "NSSet"
                  or $cname == "NSMutableSet" then
                  ( ."NS.objects"
                  | map(_r(.))
                  )
                elif $cname == "NSData" or $cname == "NSMutableData" then ."NS.Data" # TODO: will be a json string?
                elif $cname == "NSUUID" then ."NS.uuidbytes" # TODO: will be a json string?
                elif $cname == "NEConfiguration" then
                  # treat every value as a reference (this is what produces the bogus cycle below)
                  with_entries(
                    .value |= _r(.)
                  )
                else
                  # replace class ID with classname, while returning the rest of the data as-is
                  ."$class" = $cname
                end
              )
            end
          )
        end
      );
    def _f($id): _f($id; {"\($id)": true});
    _f($root)
  );

Then I get this:

{
  "$class": {
    "$classes": [...],
    "$classname": "NEConfiguration"
  },
  "AlwaysOnVPN": "$null",
  "AppPush": "$null",
  "AppVPN": "$null",
  "Application": "io.tailscale.ipn.macsys",
  "ApplicationName": "Tailscale",
  "ContentFilter": "$null",
  "DNSProxy": "$null",
  "DNSSettings": "$null",
  "ExternalIdentifierString": "$null",
  "Grade": "cycle-1",
  "Identifier": "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd,\ufffd\ufffd\ufffd2s\ufffd",
  "Name": "Tailscale Tunnel",
  "PathController": "$null",
  "ProfileInfo": "$null",
  "VPN": {
    "$class": "NEVPN",
    "DisconnectOnDemandEnabled": false,
    "Enabled": true,
    "ExceptionApps": 0,
    "OnDemandEnabled": false,
    "OnDemandRules": 0,
    "OnDemandUserOverrideDisabled": false,
    "Protocol": 6,
    "TunnelType": 1
  }
}

"Grade" is a long long, so that cycle is bogus I guess.

(the reason $seen_ids uses strings as keys is just that JSON only allows string keys)
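The cycle guard can be sketched in Python as well. This is a minimal illustration rather than the fq implementation: `resolve` and the small `objects` table are hypothetical, and references are modelled simply as lists of indexes. As with $seen_ids above, the set only tracks ids on the current path, so two siblings pointing at the same object are not reported as a cycle. A Python set can hold integers directly, so the stringified keys are not needed here.

```python
def resolve(objects, index, seen=frozenset()):
    """Follow integer references, replacing revisits on the same path with a marker."""
    if index in seen:
        return f"cycle-{index}"
    seen = seen | {index}  # new set per branch: siblings may share a target
    obj = objects[index]
    if isinstance(obj, list):
        return [resolve(objects, i, seen) for i in obj]
    return obj


# Hypothetical object table: 0 -> [1, 2], 1 -> [0] (points back), 2 -> "leaf"
objects = [[1, 2], [0], "leaf"]
print(resolve(objects, 0))  # [['cycle-0'], 'leaf']

# Shared (non-cyclic) targets are resolved twice rather than flagged.
shared = [[1, 1], "x"]
print(resolve(shared, 0))  # ['x', 'x']
```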

dgmcdona commented 1 year ago

I don't think we're going to be able to create a general enough function for NSKeyedArchiver objects that aren't of the standard $top.root type because of the reference vs. integer problem. In your output above, "Protocol": 6, is pretty clearly a reference, but "TunnelType": 1, if treated as a reference, would point to the top level object which would create infinite recursion, and we don't really have a way to make that decision accurately right now.

wader commented 1 year ago

Yep, I think you're right, and you know more about how it will be used in practice. The only other idea I have is an optional lambda argument that would be called in the fallback case, but maybe it's not worth it?

So I guess what's left is to clean it up a bit, decide on a name and whether to include it in fq or not? Have you made any progress on the forensic fq idea?

BTW, are XML plists of interest also? Are they used for NSKeyedArchiver as well? There is the start of an XML plist to JSON function in the fq wiki.

dgmcdona commented 1 year ago

I'm not sure if there are XML NSKeyedArchiver files, but I'll keep an eye out next time I get to digging around.

I think I found a solution to the problem we were facing: we had lost useful type information in the bplist torepr function: uid types are getting converted to integers, and they can help us identify references since that type seems to be used explicitly for that purpose. I made some changes to the bplist implementation:

diff --git a/format/bplist/bplist.jq b/format/bplist/bplist.jq
index 22551d77..0656dddf 100644
--- a/format/bplist/bplist.jq
+++ b/format/bplist/bplist.jq
@@ -7,7 +7,7 @@ def _bplist_torepr:
       elif .type == "data" then .value | tovalue
       elif .type == "ascii_string" then .value | tovalue
       elif .type == "unicode_string" then .value | tovalue
-      elif .type == "uid" then .value | tovalue
+      elif .type == "uid" then .value | tovalue | tostring | ["cfuid-", .] | join("")
       elif .type == "array" then
         ( .entries
         | map(_f)

And changed your function above to account for this (I'm sure it needs some cleanup but it seems to be working):

def from_ns_keyed_archiver:
  (  . as {
      "$objects": $objects,
      # "$top": {root: $root}
      "$top": {"796BFF22-6712-4486-A32C-A1C5DB3273BA": $root}
    }
  | def _try_parse_uid($uidstr):
      if $uidstr | startswith("cfuid-") then
        $uidstr | match("[0-9]+", "l") | .string | tonumber else null end;
    def _f($id; $seen_ids):
      def _r($id):
        if $seen_ids | has("\($id)") then "cycle-\($id)"
        else _f($id; $seen_ids | ."\($id)" = true)
        end;
      ( $objects[_try_parse_uid($id)]
      | type as $type
      | if $type == "string" and . == "$null" then null
        # pass the "cfuid-N" string on, since _f parses it again
        elif $type == "string" and _try_parse_uid(.) then _r(.)
        elif $type |
          . == "number"
          or . == "boolean"
          or . == "null" then .
        elif $type == "array" then . # TODO: does this happen?
        elif $type == "object" then
          ( ."$class" as $class
          | if $class == null then # TODO: what case is this?
              # only values that are "cfuid-N" strings are followed
              with_entries(
                if (.value | type) == "string" and _try_parse_uid(.value) then .value |= _r(.) end
              )
            else
              _try_parse_uid($class) as $uid |
              ( $objects[$uid]."$classname" as $cname
              | if $cname == "NSDictionary"
                  or $cname == "NSMutableDictionary" then
                  # transform arrays [key_id1, key_id2,...] and [obj_id1, obj_id2,..] into {key: obj, ...}
                  ( [."NS.keys", ."NS.objects"]
                  | transpose
                  | map({key: _r(.[0]), value: _r(.[1])})
                  | from_entries
                  )
                elif $cname == "NSArray"
                  or $cname == "NSMutableArray"
                  or $cname == "NSSet"
                  or $cname == "NSMutableSet" then
                  ( ."NS.objects"
                  | map(_r(.))
                  )
                elif $cname == "NSData" or $cname == "NSMutableData" then ."NS.Data" # TODO: will be a json string?
                elif $cname == "NSUUID" then ."NS.uuidbytes" # TODO: will be a json string?
                else
                  # replace class ID with classname, while returning the rest of the data as-is
                  ."$class" = $cname |
                  with_entries(
                    if (.value | type) == "string" and _try_parse_uid(.value) then .value |= _r(.) end
                  )
                end
              )
            end
          )
        else . # plain strings pass through unchanged
        end
      );
    def _f($id): _f($id; {"\($id)": true});
    _f($root)
  );

Which produces the following output for com.apple.networkextension.plist:

{
  "$class": "NEConfiguration",
  "AlwaysOnVPN": null,
  "AppPush": null,
  "AppVPN": null,
  "Application": "io.tailscale.ipn.macsys",
  "ApplicationName": "Tailscale",
  "ContentFilter": null,
  "DNSProxy": null,
  "DNSSettings": null,
  "ExternalIdentifierString": null,
  "Grade": 1,
  "Identifier": "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd,\ufffd\ufffd\ufffd2s\ufffd",
  "Name": "Tailscale Tunnel",
  "PathController": null,
  "ProfileInfo": null,
  "VPN": {
    "$class": "NEVPN",
    "DisconnectOnDemandEnabled": false,
    "Enabled": true,
    "ExceptionApps": null,
    "OnDemandEnabled": false,
    "OnDemandRules": null,
    "OnDemandUserOverrideDisabled": false,
    "Protocol": {
      "$class": "NETunnelProviderProtocol",
      "AuthenticationMethod": 0,
      "AuthenticationPluginType": null,
      "DNSSettings": null,
      "DesignatedRequirement": "anchor apple generic and identifier \"io.tailscale.ipn.macsys.network-extension\" and (certificate leaf[field.1.2.2222222222.] /* exists */ or certificate 1[field.1.2.2222222222.] /* exists */ and certificate leaf[field.1.2.2222222222.] /* exists */ and certificate leaf[subject.OU] = 2222222222)",
      "DisconnectOnIdle": false,
      "DisconnectOnIdleTimeout": 0,
      "DisconnectOnLogoutKey": false,
      "DisconnectOnSleep": false,
      "DisconnectOnUserSwitch": false,
      "DisconnectOnWake": false,
      "DisconnectOnWakeTimeout": 0,
      "EnforceRoutes": false,
      "ExcludeLocalNetworks": false,
      "Identifier": "꽦\ufffd\ufffd\ufffdL\ufffd\ufffd\u000f\ufffd\u0005\ufffd\ufffd\u001aq",
      "Identity": null,
      "IdentityData": null,
      "IdentityDataHash": null,
      "IdentityDataImported": false,
      "IdentityDataPassword": null,
      "IdentityDataPasswordKeychainItem": null,
      "IncludeAllNetworks": false,
      "NEProviderBundleIdentifier": "io.tailscale.ipn.macsys.network-extension",
      "Password": null,
      "PasswordEncryption": null,
      "PasswordReference": null,
      "PluginType": "io.tailscale.ipn.macsys",
      "ProxySettings": null,
      "ReassertTimeout": 0,
      "ServerAddress": "Tailscale Mesh",
      "Type": 4,
      "Username": null,
      "VendorConfiguration": null,
      "VendorInfo": null
    },
    "TunnelType": 1
  }
}

dgmcdona commented 1 year ago

It would be better to create an object than doing the funky string concatenation and parsing, I’ll fix that up later.
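To illustrate what an object representation could look like, here is a small Python sketch using the standard library's plistlib, which keeps binary plist UIDs as a distinct plistlib.UID type. The {"cfuid": n} shape, the key name "cfuid" and the helpers `to_json_value` and `is_reference` are assumptions made for this example, not what fq emits.

```python
import json
import plistlib

# Round-trip a tiny binary plist that mixes a UID reference with a plain integer.
raw = plistlib.dumps(
    {"Protocol": plistlib.UID(6), "TunnelType": 1},
    fmt=plistlib.FMT_BINARY,
)
loaded = plistlib.loads(raw)


def to_json_value(value):
    """Convert plistlib output to JSON-friendly data, tagging UIDs as objects."""
    if isinstance(value, plistlib.UID):
        return {"cfuid": value.data}  # "cfuid" is an illustrative key name
    if isinstance(value, dict):
        return {k: to_json_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_json_value(v) for v in value]
    return value


def is_reference(value):
    """A reference is exactly an object with the single key "cfuid"."""
    return isinstance(value, dict) and set(value) == {"cfuid"}


converted = to_json_value(loaded)
print(json.dumps(converted, sort_keys=True))
# {"Protocol": {"cfuid": 6}, "TunnelType": 1}
```

With this shape the reference check is a structural test instead of string parsing. One trade-off is that a genuine dictionary whose only key is "cfuid" would be indistinguishable from a reference.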

wader commented 1 year ago

> I'm not sure if there are XML NSKeyedArchiver files, but I'll keep an eye out next time I get to digging around.

> I think I found a solution to the problem we were facing: we had lost useful type information in the bplist torepr function: uid types are getting converted to integers, and they can help us identify references since that type seems to be used explicitly for that purpose. I made some changes to the bplist implementation:

Oh good catch! nice. String interpolation can be nice for this ... | tovalue | "cfuid-\(.)" but i agree an object is probably better.

dgmcdona commented 1 year ago

I'd be down to keep this in the fq repo if that's okay, don't really have a lot of other functions in mind off the top of my head. Where can we put it?

wader commented 1 year ago

Ok let's put in fq. Maybe a "macos" package could make sense? move bplist and apple_bookmark there? maybe even move the macho decoder? otherwise a "plist" package but would apple_bookmark fit? the structure under format/ is not very strict and should be no problem moving things around later. Any ideas?