kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io
Add timestamp built-in type #793

Open Mingun opened 4 years ago

Mingun commented 4 years ago

Quite a few formats store time in some way, so it would be useful to have a built-in type for the most common representations. A simple grep through the formats repository turns up files that probably use a UNIX timestamp serialized as u4:

Generators should use the appropriate target-language type for timestamps.

KOLANICH commented 4 years ago

Do we always need to transform timestamps into calendar reprs?

Mingun commented 4 years ago

I don't understand your question. Can you elaborate?

KOLANICH commented 4 years ago

I don't understand what is wrong with numeric types for timestamps.

generalmimon commented 4 years ago

https://github.com/kaitai-io/kaitai_struct/issues/740#issuecomment-622343527 is related. As there are many different formats for storing timestamps, it would be really inconvenient to manage 20+ distinct methods only for converting different timestamp formats to native DateTimes in all runtime libraries we have (currently ~12 I think). Having something like schemas as proposed in https://github.com/kaitai-io/kaitai_struct/issues/188 would be much better: we could describe the timestamp formats in a language-agnostic way and then have a lightweight DateTime adapter in every target language, which would just create the final object.

Mingun commented 4 years ago

I don't understand what is wrong with numeric types for timestamps.

The main problem is that it's just a number. You cannot display it as a date in a visualizer (at least, not automatically).

As there are many different formats for storing timestamps, it would be really inconvenient to manage 20+ distinct methods only for converting different timestamp formats to native DateTimes in all runtime libraries we have (currently ~12 I think).

I do not suggest handling all formats (deal with the numeric ones first), but the most popular can be built in. The best known of them is the Unix timestamp, so why not start with it?

KOLANICH commented 4 years ago

No problem with numbers. Again, we don't want calendar times by default. Computing calendar times has large overhead. It is not trivial. It depends on human history a lot.

Mingun commented 4 years ago

It is not trivial

For a Unix timestamp I don't see any overhead. It is just a representation, and many languages already have functions for creating a time value from it and internally store time in the same manner.
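As a rough illustration of that point, here is a minimal Python sketch that reads a little-endian u4 and hands it to the standard library. The helper name `read_unix_u4le` is made up for this example; this is not code that the Kaitai Struct compiler generates.

```python
import struct
from datetime import datetime, timezone


def read_unix_u4le(buf: bytes, offset: int = 0) -> datetime:
    """Read a little-endian u4 at `offset` and interpret it as Unix seconds.

    Hypothetical helper, for illustration only.
    """
    (raw,) = struct.unpack_from("<I", buf, offset)
    # The standard library does the calendar conversion; UTC is requested
    # explicitly so the result does not depend on the local time zone.
    return datetime.fromtimestamp(raw, tz=timezone.utc)


sample = struct.pack("<I", 86400)  # one day after the epoch
print(read_unix_u4le(sample).isoformat())
```

Switching the format string to `">I"` or `"<Q"` would cover the big-endian and 8-byte variants in the same way.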

Finally, no one is forcing the code to be generated as

// generated field accessor
language_specific_datetime_library_type field() { ... }

If you are very concerned about performance or raw value access, just use a wrapper; visualizers will still know that this is a datetime and can handle it:

// generated field accessor
kaitai_struct_runtime_date_time field() { ... }

// in runtime library
class kaitai_struct_runtime_date_time {
  uint32_t raw;
public:
  language_specific_datetime_library_type to_language_specific_datetime_library_type() { ... }
};

KOLANICH commented 4 years ago

If you are very concerned about performance or raw value access, just use a wrapper

A wrapper would require a template mechanism, because different numeric types may be used to store timestamps. Something like timestamp<base=unix, u4, second>. Or maybe just an additional type to be used in instances as

pos: 0
type: timestamp<unix, second>(your_property_with_timestamp)

but it won't achieve your goal of having a timestamp field in seq that yields a calendar representation.
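For readers following along, the wrapper and the `timestamp<unix, second>` parameters discussed above can be sketched together in Python. Everything here is an assumption made for illustration: the class name `RawTimestamp` and its fields `epoch` and `ticks_per_second` do not exist in any Kaitai Struct runtime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class RawTimestamp:
    """Hypothetical runtime wrapper: keeps the parsed integer untouched."""

    raw: int
    epoch: datetime = UNIX_EPOCH   # the "base" parameter
    ticks_per_second: int = 1      # the "unit" parameter

    def to_datetime(self) -> datetime:
        # The calendar conversion runs only when a caller asks for it.
        # Integer arithmetic avoids float rounding; precision below one
        # microsecond is dropped because timedelta cannot represent it.
        micros = self.raw * 1_000_000 // self.ticks_per_second
        return self.epoch + timedelta(microseconds=micros)


ts = RawTimestamp(1500, ticks_per_second=1000)  # 1500 ms after the epoch
print(ts.raw, ts.to_datetime().isoformat())
```

Code that only needs the number reads `ts.raw` and pays nothing for the conversion, while a visualizer can still recognize the type and call `to_datetime()`.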

Mingun commented 4 years ago

I don't expect to use templates in KSY, at least not now. I suggest starting with only this KSY:

seq:
  - id: time
    type: timestamp # actually, unix timestamp. Maybe also timestamp[4/8][be/le]

Generated code as suggested above.

KOLANICH commented 4 years ago

This approach sucks. The reasons have been given in this issue.

Mingun commented 4 years ago

You could say this about almost any type, numbers first of all.

webbnh commented 4 years ago

type: timestamp # actually, unix timestamp. Maybe also timestamp[4/8][be/le]

Imagine that this is a successful, valued feature...now how do we extend it to support other timestamp formats? There's a DOS format which probably sees more use than the UNIX one. There are alternate UNIX formats (counting milliseconds, microseconds, or ten-nanosecond ticks, instead of seconds). The UNIX format is only good until 2038, at which point it runs out of bits, so there's a 64-bit version of it. And so on.
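To make the variety concrete, here is a small Python sketch of two of the cases mentioned: decoding the packed MS-DOS date and time words, and finding the last moment a signed 32-bit Unix timestamp can represent. The function name `decode_dos_datetime` is chosen for this example only.

```python
from datetime import datetime, timezone


def decode_dos_datetime(date_word: int, time_word: int) -> datetime:
    """Decode MS-DOS packed 16-bit date and time words.

    DOS timestamps carry no time zone, so a naive datetime is returned.
    """
    year = 1980 + ((date_word >> 9) & 0x7F)   # 7 bits, years since 1980
    month = (date_word >> 5) & 0x0F           # 4 bits
    day = date_word & 0x1F                    # 5 bits
    hour = (time_word >> 11) & 0x1F           # 5 bits
    minute = (time_word >> 5) & 0x3F          # 6 bits
    second = (time_word & 0x1F) * 2           # 5 bits, 2-second resolution
    return datetime(year, month, day, hour, minute, second)


# The largest value a signed 32-bit seconds counter can hold.
INT32_MAX = 2**31 - 1
last_s4_moment = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last_s4_moment.isoformat())
```

The two layouts share nothing beyond "a few bytes that mean a time", which is why a single `timestamp` type name does not stretch to cover both.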

Your effort to "get something" is a point solution, and it is not obviously extensible. (Adding timestamp8 and timestamp4be and timestamp8le and timestampdos is rather ugly.)

Since the principal thrust of your proposal here seems to be to allow the data to be visualized for human consumption, perhaps a better solution would be to offer some sort of support for custom formatting options. This could potentially be something of general use going well beyond timestamps.

Mingun commented 4 years ago

Imagine that this is a successful, valued feature...now how do we extend it to support other timestamp formats?

In the same way that different string encodings are supported, with a special case for C-strings and so on: use another attribute that specifies the format further. For example:

seq:
  - id: time
    type: timestamp
    format: unix # dos/windows/iso/whatever

Regarding

Adding timestamp8 and timestamp4be and timestamp8le and timestampdos is rather ugly.

I already suggested using an ordinary endian key in attributes in another issue, but it did not meet with much enthusiasm. Size could be handled in a similar way.

KOLANICH commented 4 years ago

type: timestamp
format: unix # dos/windows/iso/whatever

is complete shit. It requires context-dependent syntax extension. A bit too heavy for such a type.

Mingun commented 4 years ago

It requires context-dependent syntax extension.

Is it bad? How is it different from

type: str
encoding: utf-8

or

type: whatever
process: zip

?

dgelessus commented 4 years ago

Would it make sense to have a new feature similar to process, except that instead of doing bytes <=> bytes processing, it would convert already parsed data to/from another high-level type? I think something like this has been suggested before (as post-process): https://github.com/kaitai-io/kaitai_struct/issues/668#issuecomment-573398987

Then you could parse timestamps normally as integers of the appropriate size/endianness and afterwards let the parsed integer be converted to a timestamp:

- id: mtime
  type: u8
  post-process: unix_timestamp

If this mechanism is made extensible like process, we need to worry less about supporting all imaginable timestamp formats, because then you can write a custom post-process function when the timestamp format you need isn't natively supported. (Or alternatively put the timestamp post-process implementations in a separate repo right away to decouple them from the main KS release cycle.)

This would also be compatible with serialization, because post-process would (similar to process) support conversions in both directions, so the timestamp post-process functions would support converting back and forth between integer timestamps and high-level datetime values.
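A minimal Python sketch of what such an extensible, two-way mechanism could look like. The registry, the `PostProcess` tuple, and the `register` helper are all hypothetical; only the name `unix_timestamp` comes from the example above, and `filetime` stands in for a user-supplied converter.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, NamedTuple

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


class PostProcess(NamedTuple):
    decode: Callable[[int], datetime]  # parsed integer -> high-level value
    encode: Callable[[datetime], int]  # high-level value -> integer to write


POST_PROCESSORS: Dict[str, PostProcess] = {}


def register(name: str, decode, encode) -> None:
    """Hypothetical extension point: add a named converter pair."""
    POST_PROCESSORS[name] = PostProcess(decode, encode)


# Converter matching the `post-process: unix_timestamp` example.
register(
    "unix_timestamp",
    lambda raw: UNIX_EPOCH + timedelta(seconds=raw),
    lambda dt: (dt - UNIX_EPOCH) // timedelta(seconds=1),
)

# A custom converter: 100-nanosecond ticks since 1601-01-01 UTC.
# Sub-microsecond ticks are lost because datetime cannot hold them.
register(
    "filetime",
    lambda raw: FILETIME_EPOCH + timedelta(microseconds=raw // 10),
    lambda dt: ((dt - FILETIME_EPOCH) // timedelta(microseconds=1)) * 10,
)

mtime = POST_PROCESSORS["unix_timestamp"].decode(86400)
print(mtime.isoformat(), POST_PROCESSORS["unix_timestamp"].encode(mtime))
```

Because each entry carries both directions, a serializer could call `encode` on the same named converter that the parser used for `decode`.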

KOLANICH commented 4 years ago

Is it bad?

It is.

How is it different from str

str is blessed. It is blessed in the sense that it was introduced early, when no one had thought about templates, interfaces, and serialization. It should be redesigned as a template at some point. And we would need a mechanism for global parameters for templates.

How is it different from process

process processes raw bytes into raw bytes. Always. By its definition.

post-process

Don't we have instances?

dgelessus commented 4 years ago

post-process

Don't we have instances?

A dedicated mechanism for this would have a few advantages - see the discussion in the other issue: https://github.com/kaitai-io/kaitai_struct/issues/668#issuecomment-573364325