kaitai-io / kaitai_struct_formats

Kaitai Struct: library of binary file formats (.ksy)
http://formats.kaitai.io
712 stars 203 forks source link

zip: dates 00-00-1980 are (in)valid? #562

Open armijnhemel opened 3 years ago

armijnhemel commented 3 years ago

I am not sure whether or not to file this one against zip.ksy or /common/dos_datetime.ksy, so perhaps it needs to be moved.

I am currently looking at the contents of a ZIP-file inside a Chrome extension. In zip.ksy the standard MS-DOS format is used for the date in (amongst others) the local file header. In the example I am looking at ( https://github.com/GoogleChrome/chrome-extensions-samples/blob/main/mv2-archive/api/permissions/extension-questions.crx ) I can see the following:

  Length      Date    Time    Name
---------  ---------- -----   ----
        0  00-00-1980 00:00   images/
      476  00-00-1980 00:00   manifest.json
      312  00-00-1980 00:00   options.html
     1345  00-00-1980 00:00   options.js
      177  00-00-1980 00:00   popup.html
     1424  00-00-1980 00:00   popup.js
     7008  00-00-1980 00:00   images/icon.png
---------                     -------
    10742                     7 files

In /common/dos_datetime.ksy there are these sanity checks:

  date:
    -webide-representation: "{padded_year}-{padded_month}-{padded_day}"
    seq:
      - id: day
        type: b5
        valid:
          min: 1
      - id: month
        type: b4
        valid:
          min: 1
          max: 12

and that obviously fails.

There are three solutions:

  1. reject this file as invalid - I think this would be incorrect
  2. drop the sanity checks in /common/dos_datetime.ksy - perhaps this isn't correct either
  3. do not use dos_datetime in the ZIP specification - not ideal, but perhaps the best
KOLANICH commented 3 years ago

I guess

drop the sanity checks in /common/dos_datetime.ksy

and create dos_datetime_validated.ksy maybe?

dgelessus commented 3 years ago

By the way, DOS itself and Windows 95 "support" zeroed out datetimes on FAT disks, to some extent. If a file has an all-zero modification time, dir and Explorer show no time at all instead of an invalid one like "0/0/1980 00:00 AM".

All-zero modification time in Windows 95

When listing ZIP files or FAT disk images, 7-Zip behaves similarly and also displays all-zero DOS datetimes as blank fields.

Since there are modern ZIP files that use these zeroed-out datetimes, and at least some popular tools can handle this case, I think we shouldn't block this entirely. Maybe we should remove the valid checks and instead add something like an is_valid instance to check if the datetime values make sense? Or maybe we should add a special case to allow only all-zero datetimes and still block all other invalid date/time combinations.

KOLANICH commented 3 years ago

Or maybe we should add a special case to allow only all-zero datetimes and still block all other invalid date/time combinations.

Could you also test OS reaction on tampered fs with components not entirely zeroed out?

By the way, DOS itself and Windows 95

How about more modern OSes?

generalmimon commented 2 years ago

@KOLANICH:

Or maybe we should add a special case to allow only all-zero datetimes and still block all other invalid date/time combinations.

Could you also test OS reaction on tampered fs with components not entirely zeroed out?

I've created a few ZIP files to test this:

1981-02-28 01:01:02  valid (used as a basis for other datetimes)

1981-02-00 01:01:02 day too small (min 1)
1981-02-29 01:01:02 day too large (1981 is not a leap year)
1980-02-29 01:01:02 valid (1980 is a leap year)

1981-00-28 01:01:02 month too small (min 1)
1981-13-28 01:01:02 month too large (max 12)

1981-02-28 01:01:60 second too large (max 58)
1981-02-28 01:60:02 minute too large (max 59)
1981-02-28 24:01:02 hour too large (max 23)

See the attached file zip-datetime-samples.tar.gz.

Looking at the created archives in Windows Explorer and 7-Zip, apparently any invalid datetime behaves the same as the zeroed out. So the last modified date is only displayed if it's valid, otherwise it's hidden.

From this standpoint, I think that there isn't anything special about the "datetime" 00-00-1980 00:00:00 - it's just an invalid date (due to day and month being less than 1, otherwise it would be valid) like any other. Applications capable of reading ZIP archives only decided that they don't want to refuse to show an archive just because it contains an invalid date, and deal with it by hiding just the invalid date while the rest of the archive remains accessible.

So this should be an ideal behavior of the KS implementation as well. Failure to meet the constraint set by the valid key (https://github.com/kaitai-io/kaitai_struct/issues/435) is currently always fatal, so we unfortunately can't use it. So this option:

Maybe we should remove the valid checks and instead add something like an is_valid instance to check if the datetime values make sense?

is the next best thing, and I would totally go for this option for now.


However, I feel that it's not ideal - in comparison to valid, any information about "what field is invalid and how it is invalid" is lost (which is fine for most applications, but still - valid provides it), and it's easy to neglect checking the is_valid attribute (a user writing an application using the KS parser might not even know there is such attribute, let alone that they are supposed to check its value in their application code).

Therefore, I think that the valid proposal (https://github.com/kaitai-io/kaitai_struct/issues/435) should be extended to include an option to make some valid failure non-fatal but still requiring the user to handle it somehow. If they want to log it as a warning, it'll be simple. If they choose to ignore it, fine, but at least it will be their informed and deliberate decision. I'm not sure how the API for that should look like, but I think such feature would make sense.

Moreover, it wouldn't be useful only for valid failures, but for other failures as well - for example in https://github.com/kaitai-io/kaitai_struct/issues/183, there is a idea to produce a warning when a the runtime library detects a situation where a 64-bit integer reading will lead to a loss of precision (JavaScript stores all numbers as doubles, so any integer that would require more than 53 significant bits in order to be represented accurately (i.e. anything above 2^53 - 1 or below -(2^53 - 1)) will be approximated). This might be a non-fatal issue if the value does not have to be 100% precise, but perhaps there's no point in continuing the parsing when such situation happens and the issue is fatal - the user should be able to decide how to handle it.

armijnhemel commented 1 year ago

Just noting for reference: the ZIP specification also says in section 4.4.6 that there are certain situations where this field will only contain zeroes.