helloSystem / Filer

A file manager that can also render the desktop
GNU General Public License v2.0

Interpret `.DS_Store` files #153

Open probonopd opened 1 year ago

probonopd commented 1 year ago

We would like to read .DS_Store files (as known from the Mac) in our C++ application, Filer.

Kaitai Struct (http://kaitai.io/) can generate code in various programming languages (including C++) to parse various file formats. The C++ runtime is at https://github.com/kaitai-io/kaitai_struct_cpp_stl_runtime, and https://formats.kaitai.io/ds_store/ has a format specification for the .DS_Store file format.

Can we turn this into working C++ code that we could use in Filer to read .DS_Store files?

Unfortunately, the C++ code is missing on the page for this particular file format, and trying to compile the specification with https://github.com/kaitai-io/kaitai_struct_compiler throws an error:

FreeBSD% curl -LO https://github.com/kaitai-io/kaitai_struct_compiler/releases/download/0.10/kaitai-struct-compiler_0.10_all.deb
FreeBSD% mkdir -p kaitai
FreeBSD% dpkg-deb -x kaitai-struct-compiler_0.10_all.deb kaitai  

FreeBSD% ./kaitai/usr/share/kaitai-struct-compiler/bin/kaitai-struct-compiler -t cpp_stl dsinfo.ksy
ds_store: /:
        error: AnyType (of class io.kaitai.struct.datatype.DataType$AnyType$)

dsinfo.ksy: /types/buddy_allocator_body/seq/3/id:
        warning: use `num_directory_entries` instead of `num_directories`, given that it's only used as repeat count of `directory_entries` (see https://doc.kaitai.io/ksy_style_guide.html#attr-id)

dsinfo.ksy: /types/buddy_allocator_body/types/free_list/seq/0/id:
        warning: use `num_offsets` instead of `counter`, given that it's only used as repeat count of `offsets` (see https://doc.kaitai.io/ksy_style_guide.html#attr-id)

dsinfo.ksy: /types/block/seq/1/id:
        warning: use `num_data` instead of `counter`, given that it's only used as repeat count of `data` (see https://doc.kaitai.io/ksy_style_guide.html#attr-id)

dsinfo.ksy: /types/block/types/block_data/types/record/types/record_blob/seq/0/id:
        warning: use `len_value` instead of `length`, given that it's only used as a byte size of `value` (see https://doc.kaitai.io/ksy_style_guide.html#attr-id)

generalmimon commented 1 year ago

@probonopd:

Unfortunately, the C++ code is missing on the page for this particular file format, and trying to compile the specification with https://github.com/kaitai-io/kaitai_struct_compiler throws an error:

(...)
FreeBSD% ./kaitai/usr/share/kaitai-struct-compiler/bin/kaitai-struct-compiler -t cpp_stl dsinfo.ksy
ds_store: /:
        error: AnyType (of class io.kaitai.struct.datatype.DataType$AnyType$)

The AnyType error (sorry for this cryptic uncontrolled exception; it's actually scala.MatchError: AnyType, but the error reporting code filters the type out) refers to this part of ds_store.ksy (ds_store.ksy:185-196):

              - id: value
                type:
                  switch-on: data_type
                  cases:
                    '"long"': u4
                    '"shor"': u4
                    '"bool"': u1
                    '"blob"': record_blob
                    '"type"': four_char_code
                    '"ustr"': ustr
                    '"comp"': u8
                    '"dutc"': u8

kaitai-struct-compiler currently cannot compile this switch to C++. It has to derive a single combined type for the value field, one that can hold any of the listed variants. For C++, no known common type fits both primitive integer types (u4, u8) and user-defined types (i.e. classes derived from kaitai::kstruct), so the compiler cannot proceed.

This combined type is internally referred to as AnyType and some other strongly typed languages like Java, C# and Go can use Object/object/interface{} here. However, we currently target C++98 and C++11 (if you use the --cpp-standard 11 CLI option) using just the standard library (i.e. no Boost or anything like that), so std::any is not an option, since it's only available since C++17.
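To illustrate what such a common type would have to provide without std::any, here is a minimal hand-written sketch that compiles as C++98. It is not what kaitai-struct-compiler emits or plans to emit. The names any_value_sketch and user_type_stub are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical stand-in for a generated user-defined type (e.g. record_blob).
struct user_type_stub {
    virtual ~user_type_stub() {}
};

// Hand-written holder that stores either a primitive integer or a pointer to
// a user-defined type, plus a tag that records which one is active.
class any_value_sketch {
public:
    enum kind_t { KIND_INT, KIND_OBJECT };

    static any_value_sketch from_int(uint64_t v) {
        any_value_sketch r;
        r.m_kind = KIND_INT;
        r.m_int = v;
        return r;
    }

    static any_value_sketch from_object(user_type_stub* o) {
        any_value_sketch r;
        r.m_kind = KIND_OBJECT;
        r.m_obj = o;
        return r;
    }

    kind_t kind() const { return m_kind; }

    // Each accessor asserts that the requested variant is the active one.
    uint64_t as_int() const {
        assert(m_kind == KIND_INT);
        return m_int;
    }

    user_type_stub* as_object() const {
        assert(m_kind == KIND_OBJECT);
        return m_obj;
    }

private:
    any_value_sketch() : m_kind(KIND_INT), m_int(0), m_obj(0) {}

    kind_t m_kind;
    uint64_t m_int;         // valid when m_kind == KIND_INT
    user_type_stub* m_obj;  // valid when m_kind == KIND_OBJECT (not owned)
};
```

The caller has to check kind() before reading the value, which is the bookkeeping that Object or interface{} provides for free in the other target languages.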

A workaround for now is to ensure that primitive types (u4) and user types do not meet in the same switch; you can for example wrap all primitive types into custom user-defined types:

              - id: value
                type:
                  switch-on: data_type
                  cases:
                    '"long"': u4_wrapper
                    '"shor"': u4_wrapper
                    '"bool"': bool_wrapper
                    '"blob"': record_blob
                    '"type"': four_char_code
                    '"ustr"': ustr
                    '"comp"': u8_wrapper
                    '"dutc"': u8_wrapper
            types:
              u4_wrapper:
                seq:
                  - id: value
                    type: u4
              u8_wrapper:
                seq:
                  - id: value
                    type: u8
              bool_wrapper:
                seq:
                  - id: raw
                    type: u1
                instances:
                  value:
                    value: raw != 0

Then the compiler will be able to decide that value can be of type kaitai::kstruct (the common base class for all user-defined types in .ksy specifications), and the switch will compile without this error.
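On the consuming side, a field typed as the common base class has to be cast back to the concrete wrapper before its payload can be read. The sketch below shows one way to do that with dynamic_cast. It uses local stand-in classes (kstruct_stub, u4_wrapper_stub, bool_wrapper_stub) so that it compiles without the Kaitai runtime; the names and members of the real generated classes are an assumption here and should be checked against the compiler's actual output.

```cpp
#include <stdint.h>
#include <sstream>
#include <string>

// Hypothetical stand-in for kaitai::kstruct, the common base class of all
// generated user-defined types.
struct kstruct_stub {
    virtual ~kstruct_stub() {}
};

// Hypothetical stand-ins for the classes generated from the wrapper types.
struct u4_wrapper_stub : kstruct_stub {
    explicit u4_wrapper_stub(uint32_t v) : value(v) {}
    uint32_t value;
};

struct bool_wrapper_stub : kstruct_stub {
    explicit bool_wrapper_stub(uint8_t r) : raw(r) {}
    uint8_t raw;
    bool value() const { return raw != 0; }  // mirrors `value: raw != 0`
};

// Recover the concrete variant from a base-class pointer and render it.
// A null pointer or an unhandled type yields "unknown".
std::string describe(const kstruct_stub* v) {
    std::ostringstream out;
    if (const u4_wrapper_stub* n = dynamic_cast<const u4_wrapper_stub*>(v)) {
        out << "u4:" << n->value;
    } else if (const bool_wrapper_stub* b =
                   dynamic_cast<const bool_wrapper_stub*>(v)) {
        out << "bool:" << (b->value() ? "true" : "false");
    } else {
        out << "unknown";
    }
    return out.str();
}
```

Since data_type already identifies the variant, switching on it and using static_cast would probably work as well; dynamic_cast is used here only because it fails safely on an unexpected type.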