Open GreyCat opened 6 years ago
Please add Windows FILETIME (u64le count of 100-nanosecond intervals since 1601-01-01 00:00).
Here goes MS FILETIME:
...
var u64le = Converter.numConv(data, 8, false, false);
this.msfiletime = u64le ? dateFormat(new Date(parseInt(u64le)/10000 - 11644473600000), "yyyy-mm-dd HH:MM:ss") : "";
...
The number 11644473600000 above is the difference between the Unix and MS FILETIME epochs (1970-01-01T00:00:00 - 1601-01-01T00:00:00) in milliseconds.
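One caveat with the snippet above: current FILETIME values exceed 2^53, so `parseInt` cannot hold them exactly. A minimal sketch of the same conversion using BigInt follows; `filetimeToDate` and `EPOCH_DIFF_MS` are hypothetical names, not part of the WebIDE code, and the result is still limited to the millisecond resolution of `Date`.

```javascript
// Difference between the FILETIME epoch (1601-01-01) and the Unix epoch
// (1970-01-01) in milliseconds; the same constant as in the snippet above.
const EPOCH_DIFF_MS = 11644473600000n;

// Hypothetical helper: read a little-endian u64 FILETIME from a Uint8Array
// and return a JavaScript Date (millisecond resolution only).
function filetimeToDate(bytes, offset = 0) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const ticks = view.getBigUint64(offset, true); // 100 ns ticks since 1601
  const unixMs = ticks / 10000n - EPOCH_DIFF_MS; // BigInt division truncates
  return new Date(Number(unixMs));
}

// 0x019DB1DED53E8000 = 116444736000000000 ticks, i.e. exactly the Unix epoch.
const sample = Uint8Array.from([0x00, 0x80, 0x3e, 0xd5, 0xde, 0xb1, 0x9d, 0x01]);
console.log(filetimeToDate(sample).toISOString()); // 1970-01-01T00:00:00.000Z
```

The sub-millisecond part of the value is dropped here; a display that needs it would have to format the remaining ticks separately.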
I think a similar trick can be used for Mac and Windows Float type timestamps:
http://www.silisoftware.com/tools/date.php
I guess the converter should not have hardcoded formats, but should have a setting to try arbitrary ksy descriptions against the fields of suitable size.
1) There is a finite number of datestamps and, fortunately or unfortunately, they are very commonly used in binary formats. 2) I see no possibility of converting a binary datestamp into a human-readable datetime format in a language-agnostic way using ksy alone. While it is technically possible, it would be inefficient compared to using the built-in, language-specific datestamp conversion methods. Our planet's revolution around the sun is quite messy... unless timestamps get the same love as zlib in ksy. That needs some research and effort to work consistently across different languages. Unfortunately, languages treat datetime types with different precisions, which raises the question of whether the exact implementation should be left to the particular use case. In the end, the whole tab is very helpful when reverse engineering. The clutter could be managed by options or a pull-down selection, e.g. only one endianness (le or be), or only one timestamp format. Then there would be some room for eventual ksy-defined types too.
There might be some connection between this and https://github.com/kaitai-io/kaitai_struct/issues/188, but in general, there's little connection with ksy specs so far. Many timestamps are just integers that specify the number of (seconds|microseconds|nanoseconds|etc) that have passed since some arbitrary date.
I've posted a list of timestamps info I've collected, please take a look.
wow, that's an extensive list!
It might be a good idea to have a flexible type (custom epoch and resolution), like:
type: timestamp
timestamp-type: u4
unit: 1 for seconds, 10**-6 for microseconds, a float value
epoch: 1970 for unix epoch, an integer value
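To make the "custom epoch and resolution" idea concrete, here is a rough sketch of how such a descriptor could be evaluated. Everything in it is an assumption for illustration: `decodeTimestamp` is a hypothetical helper, `ticksPerSecond` stands in for the proposed `unit` (as an integer, to avoid float rounding), and `epoch` is a full ISO 8601 instant rather than a bare year.

```javascript
// Hypothetical evaluation of a { unit, epoch } style descriptor.
// `ticksPerSecond` replaces the float `unit` from the proposal and
// `epoch` is an ISO 8601 UTC instant.
function decodeTimestamp(raw, { ticksPerSecond, epoch }) {
  const epochMs = Date.parse(epoch);
  // Date has millisecond resolution, so finer ticks are truncated.
  // `raw * 1000` stays exact for 32-bit raw values; 64-bit values
  // would need BigInt arithmetic instead.
  return new Date(epochMs + Math.floor((raw * 1000) / ticksPerSecond));
}

// Example descriptors for three of the formats discussed in this thread.
const UNIX = { ticksPerSecond: 1, epoch: "1970-01-01T00:00:00Z" };
const HFS = { ticksPerSecond: 1, epoch: "1904-01-01T00:00:00Z" };
const COCOA = { ticksPerSecond: 1, epoch: "2001-01-01T00:00:00Z" };

// 2082844800 seconds after 1904-01-01 is the Unix epoch.
console.log(decodeTimestamp(2082844800, HFS).toISOString()); // 1970-01-01T00:00:00.000Z
```

The integer `ticksPerSecond` is a deliberate deviation from the float `unit`: a value such as 10**-6 is not exactly representable in binary floating point, which matters when exact round-tripping is required.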
IMHO:
1) We should not spoil the syntax with everything possible.
2) Every machine-readable date/time format can be converted into an integer ...
3) ... or parsed from an integer.
4) Instances and params are all we need.
So:
5) We don't need any additional syntax for timestamps. A timestamp is just a uint offset from a standardized epoch, and I think we should stick to a single standard.
6) We don't even need additional types: we already have uints. I guess hints should be enough for display purposes, and for other cases not covered we can implement the computation right in KS and put the type into the stdlib.
I thought of trying to add a "timestamp" / "datetime" data type, either abstract or concrete, and so far it looks like the problems outweigh the potential benefits by a large margin.
For starters, quite a few languages lack definitive timestamp data types at all. Sometimes, there are several competing standards. Those that have such data types are quite often restricted by a particular implementation:
1970-00-01 or 1970-01-00 or 1970-12-32
So, typically, at best we can do some approximation, yet lots of applications require exact precision (for reproducibility, in-domain calculations, etc).
Probably this could be remedied by creating something like "Kaitai DateTime", which would offer the same exact interface, built-in support for lots of possible time formats, and an ability to export into the target language's timestamp formats, but this also has its cons and is obviously a huge undertaking.
I am more or less familiar with those issues. But that should not be a reason not to add it. After all, adding a type does not restrict the users but expands their capability. If they don't want to use the corresponding target type (for any of several possible reasons), they can just parse Int32 and that's it. With multiple possible implementations, it's better to offer one than none. Languages that don't have any, like C, can just parse the int as well.
Construct has a flexible format, Timestamp works by parsing chosen subcon (Int32ul for example) then interprets it as a total seconds/microseconds/nanoseconds (chosen) since epoch (chosen). Of course the Arrow module might not support some values (subcon range outside of Arrow range), floating-point errors, etc. But it works with reasonable dates and reasonable precision.
timestamp-type: u4 is not a new type, it's new syntax
I expect timestamps to be used like this:
seq:
  - id: ts_
    type: s8
instances:
  ts:
    type: timestamp(ts_, ...other parameters needed)
note that we don't have typed instances for now
Construct has a flexible format, Timestamp works by parsing chosen subcon (Int32ul for example) then interprets it as a total seconds/microseconds/nanoseconds (chosen) since epoch (chosen). Of course the Arrow module might not support some values (subcon range outside of Arrow range), floating-point errors, etc.
I don't think that Construct's Timestamp is a good idea to copy for several reasons:
But it works with reasonable dates and reasonable precision.
The problem is that in the binary formats world, people tend to use bizarre magical values and stuff like that. For example, "all 0s" and "all 1s" are popular choices that you quite often need to be able to represent exactly, as they have some special meaning like "time unknown", "this entity is void" or something like that. For these, you can't go with approximations.
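One way to keep such magic values exact is to check for them before any date conversion happens. The sketch below assumes a 32-bit Unix-style field; the sentinel table and the name `describeUnixU32` are hypothetical, and the meanings attached to each value are format-specific guesses, not a rule.

```javascript
// Hypothetical sentinel table for a u32 timestamp field. What each value
// means depends on the format; the labels here are only examples.
const U32_SENTINELS = new Map([
  [0x00000000, "all 0s (e.g. time unknown)"],
  [0xffffffff, "all 1s (e.g. this entity is void)"],
]);

// Report the raw value in every case, and convert to a Date only when
// the value is not one of the known sentinels.
function describeUnixU32(raw) {
  if (U32_SENTINELS.has(raw)) {
    return { raw, special: U32_SENTINELS.get(raw), date: null };
  }
  return { raw, special: null, date: new Date(raw * 1000) };
}

console.log(describeUnixU32(0xffffffff).special);
console.log(describeUnixU32(86400).date.toISOString());
```

Keeping `raw` in the result means the exact bit pattern is never lost, whatever the display layer decides to show.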
Would it be possible to make this a -webide-representation feature? I see the issue with putting it in the parser (better to have user code handle it), but most of the above formats could be quite easily displayed as ISO or localized strings. Having the tag in the ksy could also help document the raw type. Something like:
{unix:timestamp=s/1970-01-01} (or ms etc)
{msfiletime:timestamp=100ns/1601-01-01}
{webkit:timestamp=us/1601} (iso8601 doesn't require -MM-DD)
{cocoa:timestamp=s/2001}
{hfs:timestamp=s/1904}
{mysql:timestamp=s/0000}
{excel:timestamp=day/1899-12-31}
day could be given as 86400s, s as 1s, or even 1us as 0.000001 (s), to simplify implementation.
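A rough sketch of how a display layer might interpret such a tag is shown below. It only handles the part after the field name (for example `timestamp=100ns/1601-01-01`), only integer raw values, and only the unit names used in the examples above. `parseTimestampSpec` and `formatTimestamp` are hypothetical names; nothing like this exists in the WebIDE as far as this thread indicates.

```javascript
// Length of each supported unit in nanoseconds (BigInt keeps u64 values exact).
const UNIT_NS = {
  ns: 1n,
  us: 1000n,
  ms: 1000000n,
  s: 1000000000n,
  day: 86400000000000n,
};

// Parse "timestamp=<count?><unit>/<YYYY>[-MM-DD]" into a tick size and epoch.
function parseTimestampSpec(spec) {
  const m = /^timestamp=(\d*)(ns|us|ms|s|day)\/(\d{4})(?:-(\d{2})-(\d{2}))?$/.exec(spec);
  if (!m) throw new Error("unrecognized timestamp spec: " + spec);
  const [, count, unit, year, month = "01", day = "01"] = m;
  const tickNs = BigInt(count || "1") * UNIT_NS[unit];
  // setUTCFullYear avoids the two-digit-year remapping of Date.UTC, so
  // year 0000 works as well.
  const epoch = new Date(0);
  epoch.setUTCFullYear(Number(year), Number(month) - 1, Number(day));
  return { tickNs, epochMs: BigInt(epoch.getTime()) };
}

// Format an integer raw value (Number or BigInt) as an ISO 8601 string.
function formatTimestamp(raw, spec) {
  const { tickNs, epochMs } = parseTimestampSpec(spec);
  const ms = epochMs + (BigInt(raw) * tickNs) / 1000000n;
  return new Date(Number(ms)).toISOString();
}

console.log(formatTimestamp(116444736000000000n, "timestamp=100ns/1601-01-01"));
```

Fractional raw values, such as Excel serial dates with a time-of-day part, are outside what this sketch covers and would need a separate float path.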
I don't see the point in it for fields of separate components, which can already be formatted and viewed easily.
It turns out that there are many more timestamp formats in the wild besides the UNIX timestamp. It would be cool to show more of them in the "converter" to aid timestamp detection guesswork.
Timestamps
Continuous interval measurements
Unix timestamp
Microsoft FILETIME
WebKit/Chrome timestamp
Apple Cocoa Core Data timestamp
Apple Mac OS X HFS+ timestamp
Seconds since year 0
Excel timestamp
Complex structures
Microsoft FAT date time
Microsoft SYSTEMTIME
ISO9660 decimal timestamp
ISO9660 binary timestamp
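As an illustration of the "complex structures" group, the FAT date and time words pack their fields into bit ranges rather than counting ticks from an epoch. The sketch below decodes the two 16-bit words into separate components; `decodeFatDateTime` is a hypothetical helper name, and the components are deliberately not turned into a `Date`, so out-of-range values such as month 0 stay visible.

```javascript
// Decode a FAT date word and time word into separate components.
// Date: bits 15-9 year since 1980, bits 8-5 month, bits 4-0 day.
// Time: bits 15-11 hour, bits 10-5 minute, bits 4-0 seconds divided by 2.
function decodeFatDateTime(date16, time16) {
  return {
    year: 1980 + (date16 >> 9),
    month: (date16 >> 5) & 0x0f,
    day: date16 & 0x1f,
    hour: time16 >> 11,
    minute: (time16 >> 5) & 0x3f,
    second: (time16 & 0x1f) * 2,
  };
}

// Build the words for 2024-03-15 13:45:30 and decode them again.
const fatDate = ((2024 - 1980) << 9) | (3 << 5) | 15;
const fatTime = (13 << 11) | (45 << 5) | (30 / 2);
console.log(decodeFatDateTime(fatDate, fatTime));
```

Because seconds are stored halved, the format only has a two-second resolution, which is one more reason a plain integer-plus-epoch model does not cover it.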