Visualize more timestamp formats

GreyCat commented 6 years ago

It turns out that there are many more timestamp formats in the wild besides the UNIX timestamp. It would be cool to show more of them in the "converter" to aid timestamp detection guesswork.

Timestamps

Continuous interval measurements

Unix timestamp
- Size: u4le/be
- Definition: seconds since 1970-01-01 00:00 UTC
Microsoft FILETIME
- AKA:
  - Windows NT time format
  - NTFS file time
  - LDAP time
  - Active Directory time
- Size: u8le
- Definition: 100-nanosecond intervals since 1601-01-01 00:00 UTC
WebKit/Chrome timestamp
- Size: u8 (?)
- Definition: microseconds since 1601-01-01 00:00 UTC
Apple Cocoa Core Data timestamp
- AKA:
  - Mac absolute time
- Size: ?
- Definition: seconds since 2001-01-01 00:00 UTC
Apple Mac OS X HFS+ timestamp
- AKA:
  - Mac timestamp
- Definition: seconds since 1904-01-01 00:00 UTC
Seconds since year 0
- AKA:
  - MySQL time
- Size: u8 (?)
- Definition: seconds since year 0
Excel timestamp
- AKA:
  - Microsoft timestamp
- Size: f8
- Definition: days since 1899-12-31
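
As a rough sketch of how a converter could aid the guesswork, the integer formats above can be kept in a table of epoch plus tick rate, and one raw value can be interpreted under all of them at once. The `FORMATS` table and the `guessTimestamps` name are illustrative assumptions, not existing WebIDE code. The float-based Excel format and the year-0 format are left out on purpose, since they need separate handling.

```javascript
// Epoch (in ms relative to the Unix epoch) and tick rate for the integer formats listed above.
// The table layout and all names here are hypothetical.
const FORMATS = {
  unix:     { epochMs: Date.UTC(1970, 0, 1), ticksPerSecond: 1n },
  filetime: { epochMs: Date.UTC(1601, 0, 1), ticksPerSecond: 10000000n },
  webkit:   { epochMs: Date.UTC(1601, 0, 1), ticksPerSecond: 1000000n },
  cocoa:    { epochMs: Date.UTC(2001, 0, 1), ticksPerSecond: 1n },
  hfs:      { epochMs: Date.UTC(1904, 0, 1), ticksPerSecond: 1n },
};

// Interpret one raw integer under every format.
// Values that fall outside the range of the built-in Date become null.
function guessTimestamps(raw) {
  const ticks = BigInt(raw);
  const result = {};
  for (const [name, { epochMs, ticksPerSecond }] of Object.entries(FORMATS)) {
    // BigInt arithmetic keeps 64-bit inputs exact until the final ms value.
    const ms = epochMs + Number((ticks * 1000n) / ticksPerSecond);
    const date = new Date(ms);
    result[name] = Number.isNaN(date.getTime()) ? null : date.toISOString();
  }
  return result;
}
```

A value that only yields a plausible date under one interpretation is a strong hint about which format a field uses.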

Complex structures

Microsoft FAT date time
- Size: 4 byte packed structure = 2 bytes for date + 2 bytes time
- Timezone dependent
- Resolution: 2 sec
Microsoft SYSTEMTIME
- Size: 16-byte structure
- Definition: individual fields for components
- Timezone resolution: ?
- Resolution: ?
ISO9660 decimal timestamp
- Size: 17-byte structure
- Definition: individual fields for components, mostly ASCII strings
- Resolution: 0.01 sec
- Timezone resolution: 15 minutes
ISO9660 binary timestamp
- Size: 7-byte structure
- Definition: individual fields for components
- Resolution: 1 sec
- Timezone resolution: 15 minutes
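
For the packed structures, decoding is a matter of bit slicing rather than epoch arithmetic. Below is a minimal sketch for the FAT date/time pair, assuming the 4 bytes are stored as a little-endian time word followed by a little-endian date word (as in a FAT directory entry) and that years count from 1980. The function name is hypothetical. It returns plain components instead of a `Date`, because the value is timezone dependent.

```javascript
// Decode a 4-byte FAT timestamp: u2le time word followed by u2le date word (assumed order).
function decodeFatDateTime(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, 4);
  const time = view.getUint16(0, true);
  const date = view.getUint16(2, true);
  return {
    year: 1980 + (date >> 9),       // 7 bits, offset from 1980
    month: (date >> 5) & 0x0f,      // 4 bits, 1..12
    day: date & 0x1f,               // 5 bits, 1..31
    hour: time >> 11,               // 5 bits
    minute: (time >> 5) & 0x3f,     // 6 bits
    second: (time & 0x1f) * 2,      // 5 bits, stored in 2-second units
  };
}
```

The factor of 2 on the seconds field is where the 2-second resolution comes from.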

sem-geologist commented 6 years ago

Please add Windows FILETIME (u64le count of 100-nanosecond intervals since 1601-01-01 00:00).

sem-geologist commented 6 years ago

there goes MS FILETIME:

...
var u64le = Converter.numConv(data, 8, false, false);
this.msfiletime = u64le ? dateFormat(new Date(parseInt(u64le)/10000 - 11644473600000), "yyyy-mm-dd HH:MM:ss") : "";
...

The number 11644473600000 above is the difference between the Unix and MS FILETIME epochs (1970-01-01T00:00:00 - 1601-01-01T00:00:00) in milliseconds. I think a similar trick can be used for Mac and Windows float-type timestamps: http://www.silisoftware.com/tools/date.php
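
A self-contained variant of the same idea, as a sketch only: it derives the epoch difference instead of hardcoding it and reads the value as a `BigInt`, since 64-bit tick counts exceed the range where a JavaScript number is exact. `filetimeToDate` and `FILETIME_EPOCH_DIFF_MS` are made-up names and do not use the WebIDE's `Converter` helper.

```javascript
// Milliseconds between 1601-01-01 and 1970-01-01 (both UTC), derived rather than hardcoded.
const FILETIME_EPOCH_DIFF_MS = BigInt(Date.UTC(1970, 0, 1) - Date.UTC(1601, 0, 1));

// Convert 8 little-endian bytes holding a FILETIME into a JavaScript Date.
function filetimeToDate(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, 8);
  const ticks = view.getBigUint64(0, true);            // 100 ns intervals since 1601-01-01
  const unixMs = ticks / 10000n - FILETIME_EPOCH_DIFF_MS; // 10000 ticks per millisecond
  return new Date(Number(unixMs));                     // sub-millisecond part is dropped
}
```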

KOLANICH commented 6 years ago

I guess converter should not have hardcoded formats, but should have a setting to try arbitrary ksy descriptions against the fields of suitable size.

sem-geologist commented 6 years ago

1) There is a finite number of timestamp formats and, fortunately or unfortunately, they are very commonly used in binary formats. 2) I see no possibility of converting a binary timestamp into a human-readable datetime in a language-agnostic way using ksy alone. While it is technically possible, it would not be efficient compared to using the language-specific built-in timestamp conversion methods. Our planet's revolution around the sun is quite messed up... unless timestamps get the same love as zlib in ksy. That needs some research and effort to work consistently across different languages. Unfortunately, languages treat datetime types with different precisions, which raises the question of whether the exact implementation should be left to the particular use case. In the end, the whole tab is very helpful when reverse engineering. The clutter could be managed by options or a pull-down selection, e.g. only one endianness (le or be) or only one timestamp format. Then there would be some room for eventual ksy-defined types too.

GreyCat commented 6 years ago

There might be some connection between this and https://github.com/kaitai-io/kaitai_struct/issues/188, but in general, there's little connection with ksy specs so far. Many timestamps are just integers that specify the number of (seconds|microseconds|nanoseconds|etc) that have passed since some arbitrary date.

GreyCat commented 6 years ago

I've posted a list of timestamps info I've collected, please take a look.

sem-geologist commented 6 years ago

wow, that's an extensive list!

arekbulski commented 6 years ago

It might be a good idea to have a flexible type (custom epoch and resolution), like:

type: timestamp
timestamp-type: u4
unit: 1 for seconds, 10**-6 for microseconds, a float value
epoch: 1970 for unix epoch, an integer value

KOLANICH commented 6 years ago

IMHO:
1) We should not spoil the syntax with everything possible.
2) Every machine-readable date/time format can be converted into an integer...
3) ...or parsed from an integer.
4) Instances and params are all we need.

So:
5) We don't need any additional syntax for timestamps. A timestamp is just a uint relative to a standardized offset, and I think we should stick to a single standard.
6) We don't even need additional types: we already have uints. I guess hints should be enough for display purposes, and for other cases not covered we can implement the computation right in KS and put the type into the stdlib.

GreyCat commented 6 years ago

I thought of trying to add a "timestamp" / "datetime" data type, either abstract or concrete, and so far it looks like the problems outweigh the potential benefits by a large margin.

For starters, quite a few languages lack definitive timestamp data types at all. Sometimes, there are several competing standards. Those that have such data types are quite often restricted by a particular implementation:

range
precision
with / without timezone
how it works with ancient dates (i.e. pre-Gregorian calendar approximations)
ability to represent stuff like 1970-00-01 or 1970-01-00 or 1970-12-32

So, typically, at best we can do some approximation, yet lots of applications require exact precision (for reproducibility, in-domain calculations, etc).
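
To make these restrictions concrete for one target, here is a small probe of the built-in ECMAScript `Date` type. It is only an illustration for a single language. `probeDateLimits` is a made-up name, and other targets will have different limits.

```javascript
// Probe a few limits of the built-in ECMAScript Date type.
function probeDateLimits() {
  const MAX_MS = 8.64e15; // largest time value a Date can hold, in ms from the Unix epoch
  return {
    // Range: one millisecond past the limit is already an invalid date.
    maxIsValid: !Number.isNaN(new Date(MAX_MS).getTime()),
    beyondMaxIsValid: !Number.isNaN(new Date(MAX_MS + 1).getTime()),
    // "Impossible" calendar dates cannot be stored; they roll over to real ones.
    dec32: new Date(Date.UTC(1970, 11, 32)).toISOString(),
    jan0: new Date(Date.UTC(1970, 0, 0)).toISOString(),
    // Precision: anything below one millisecond is truncated.
    fractionalMs: new Date(0.9).getTime(),
  };
}
```

So a field holding 1970-12-32 would silently display as 1971-01-01, and a 100 ns tick count cannot be round-tripped through this type.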

Probably this could be mitigated by creating something like a "Kaitai DateTime", which would offer the same exact interface, built-in support for lots of possible time formats, and an ability to export into the target language's timestamp formats, but this also has its cons and is obviously a huge undertaking.

arekbulski commented 6 years ago

I am more or less familiar with those issues. But that should not be a reason to not add it. After all, adding a type does not restrict the users but expands their capability. If they don't want to use the corresponding target type (for any of several reasons), they can just parse an Int32 and that's it. With multiple possible implementations, it's better to offer one than none. Languages that don't have any, like C, can just parse the int as well.

Construct has a flexible format, Timestamp works by parsing chosen subcon (Int32ul for example) then interprets it as a total seconds/microseconds/nanoseconds (chosen) since epoch (chosen). Of course the Arrow module might not support some values (subcon range outside of Arrow range), floating-point errors, etc. But it works with reasonable dates and reasonable precision.

KOLANICH commented 6 years ago

> timestamp-type: u4

is not a new type, it's new syntax

I expect using timestamps as

seq:
  - id: ts_
    type: s8
instances:
  ts:
    type: timestamp(ts_, ...other parameters needed)

note that we don't have typed instances for now

GreyCat commented 6 years ago

> Construct has a flexible format, Timestamp works by parsing chosen subcon (Int32ul for example) then interprets it as a total seconds/microseconds/nanoseconds (chosen) since epoch (chosen). Of course the Arrow module might not support some values (subcon range outside of Arrow range), floating-point errors, etc.

I don't think that Construct's Timestamp is a good idea to copy for several reasons:

- It supports only single integer-based timestamps, i.e. no per-component timestamps, no approximate timestamps, etc.
- It depends on one particular non-standard library; most languages lack something similar, and even for Python folks, I'm not 100% sure that everyone would be happy with adding Arrow to their project.
- Instead of decoupling time stamp definition into some kind of library, you're forced to repeat that "epoch = 1970, precision = 1s", or "epoch = 1601, precision = 100ns" every time. A single typo (like typing "1600" instead of "1601") or something, and, boom, you've got a weird parsing error that would be very hard to find, as only one of possible hundreds of definitions in a format would be slightly off in some cases.

> But it works with reasonable dates and reasonable precision.

The problem is that in the binary formats world, people tend to use bizarre magical values and stuff like that. For example, "all 0s" and "all 1s" are popular choices that you quite often need to be able to represent exactly, as they have some special meaning like "time unknown", "this entity is void" or something like that. For these, you can't go with approximations.
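
One way to keep such values exact, sketched here under the assumption of a 32-bit Unix timestamp field, is to check for sentinels before any date conversion and to always carry the raw integer along. The sentinel labels and the `decodeUnixU32` name are hypothetical. What "all 0s" or "all 1s" actually means depends on the format being parsed.

```javascript
// Hypothetical sentinel table for a u4 field; real meanings are format-specific.
const SENTINELS_U32 = new Map([
  [0x00000000, "all zeros"],
  [0xffffffff, "all ones"],
]);

// Decode a u4 Unix timestamp, reporting sentinels as-is instead of as dates.
function decodeUnixU32(raw) {
  if (SENTINELS_U32.has(raw)) {
    return { raw, kind: "sentinel", label: SENTINELS_U32.get(raw) };
  }
  // The raw value is kept next to the converted one, so nothing is lost.
  return { raw, kind: "date", iso: new Date(raw * 1000).toISOString() };
}
```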

speedstyle commented 1 year ago

Would it be possible to make this a -webide-representation feature? I see the issue with putting it in the parser – better to have user code handle it – but most of the above formats could be quite easily displayed as ISO or localized strings. Having the tag in the ksy could also help document the raw type. Something like

{unix:timestamp=s/1970-01-01} (or ms etc)
{msfiletime:timestamp=100ns/1601-01-01}
{webkit:timestamp=us/1601} (iso8601 doesn't require -MM-DD)
{cocoa:timestamp=s/2001}
{hfs:timestamp=s/1904}
{mysql:timestamp=s/0000}
{excel:timestamp=day/1899-12-31}

day could be given as 86400s, s as 1s or even 1us as 0.000001(s) to simplify implementation.
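
A rough sketch of how such a tag value might be parsed and applied. The `unit/epoch` grammar is taken from the examples above, but the parser itself, its names, and its restrictions are assumptions. It handles integer raw values and integer unit multipliers only, so fractional units like 0.000001 and float-based values such as Excel's are not covered.

```javascript
// Length of each supported unit in nanoseconds.
const UNIT_NS = { ns: 1n, us: 1000n, ms: 1000000n, s: 1000000000n, day: 86400000000000n };

// Parse a spec such as "100ns/1601-01-01" or "s/2001" into a unit length and an epoch.
function parseTimestampSpec(spec) {
  const match = /^(\d*)(ns|us|ms|s|day)\/(\d{4})(?:-(\d{2})-(\d{2}))?$/.exec(spec);
  if (!match) throw new Error(`Unsupported timestamp spec: ${spec}`);
  const [, count, unit, year, month = "01", day = "01"] = match;
  // setUTCFullYear is used because Date.UTC remaps years 0..99 to 1900..1999.
  const epoch = new Date(0);
  epoch.setUTCFullYear(Number(year), Number(month) - 1, Number(day));
  return { unitNs: BigInt(count || "1") * UNIT_NS[unit], epochMs: epoch.getTime() };
}

// Render an integer raw value as an ISO 8601 string according to the spec.
function renderTimestamp(raw, spec) {
  const { unitNs, epochMs } = parseTimestampSpec(spec);
  const offsetMs = (BigInt(raw) * unitNs) / 1000000n; // truncates below 1 ms
  return new Date(epochMs + Number(offsetMs)).toISOString();
}
```

For example, `renderTimestamp(field, "100ns/1601-01-01")` would display a FILETIME field, while the raw integer stays untouched in the parsed object.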

I don't see the point in it for fields of separate components, which can already be formatted and viewed easily.

FAT

```yaml
# size: 4
# bit-endian: le
- id: secs
  type: b5
- id: mins
  type: b6
- id: hours
  type: b5
- id: day
  type: b5
- id: month
  type: b4
- id: year
  type: b7
```

SYSTEMTIME

```yaml
# all with type: u2
- id: year
- id: month
- id: weekday
- id: day
- id: hour
- id: min
- id: sec
- id: milli
```

ISO9660

```yaml
# encoding: ASCII
- id: year
  size: 4
  type: str
- id: month
  size: 2
  type: str
- id: day
  size: 2
  type: str
- id: hour
  size: 2
  type: str
- id: min
  size: 2
  type: str
- id: sec
  size: 2
  type: str
- id: centi
  size: 2
  type: str
- id: utc_offset
  type: s1
```
