Open Mingun opened 4 years ago
Do we always need to transform timestamps into calendar reprs?
I don't understand your question. Can you elaborate?
I don't understand what is wrong with numeric types for timestamps.
https://github.com/kaitai-io/kaitai_struct/issues/740#issuecomment-622343527 is related. As there are many different formats for storing timestamps, it would be really inconvenient to manage 20+ distinct methods only for converting different timestamp formats to native DateTimes in all runtime libraries we have (currently ~12 I think). Having something like schemas as proposed in https://github.com/kaitai-io/kaitai_struct/issues/188 would be much better: we could describe the timestamp formats in a language-agnostic way and then have a lightweight DateTime adapter in every target language, which would just create the final object.
I don't understand what is wrong with numeric types for timestamps.
The main problem is that it's just a number. You cannot visualize it as a date in a visualizer (at least, not automatically).
As there are many different formats for storing timestamps, it would be really inconvenient to manage 20+ distinct methods only for converting different timestamp formats to native DateTimes in all runtime libraries we have (currently ~12 I think).
I do not suggest handling all formats (deal with the numbers first), but the most popular ones can be built in. The best known of them is the Unix timestamp, so why not start with it?
No problem with numbers. Again, we don't want calendar times by default. Computing calendar times has large overhead. It is not trivial. It depends on human history a lot.
It is not trivial
For a Unix timestamp I don't see any overhead. It is just a representation, and many languages already have functions for creating a time value from it and internally store time in the same manner.
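As a minimal sketch of that claim, assuming the raw field holds whole seconds since 1970-01-01 UTC, Python's standard library does the conversion in one call. The function names `unix_to_datetime` and `datetime_to_unix` are made up for illustration and are not part of any Kaitai Struct runtime.

```python
from datetime import datetime, timezone


def unix_to_datetime(raw: int) -> datetime:
    """Interpret `raw` as whole seconds since 1970-01-01T00:00:00Z."""
    return datetime.fromtimestamp(raw, tz=timezone.utc)


def datetime_to_unix(value: datetime) -> int:
    """Inverse conversion; sub-second precision is truncated."""
    return int(value.timestamp())
```

The raw integer stays available to the caller either way; the conversion only happens when it is asked for.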
Finally, no one forces the code to be generated as
// generated field accessor
language_specific_datetime_library_type field() { ... }
If you are super concerned about performance or raw value access, just use a wrapper; visualizers will still know that this is a datetime and can handle it:
// generated field accessor
kaitai_struct_runtime_date_time field() { ... }
// in runtime library
class kaitai_struct_runtime_date_time {
    uint32_t raw;
public:
    language_specific_datetime_library_type to_language_specific_datetime_library_type() { ... }
};
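The wrapper idea above could look roughly like this in a Python runtime. This is only a sketch: the class name `KaitaiTimestamp` is hypothetical, and it assumes the raw value is a Unix timestamp in seconds.

```python
from datetime import datetime, timezone


class KaitaiTimestamp:
    """Hypothetical runtime wrapper: stores the raw integer, converts on demand."""

    __slots__ = ("raw",)

    def __init__(self, raw: int) -> None:
        self.raw = raw

    def __int__(self) -> int:
        # Raw value access costs nothing beyond an attribute read.
        return self.raw

    def to_datetime(self) -> datetime:
        # Calendar conversion happens only when explicitly requested.
        return datetime.fromtimestamp(self.raw, tz=timezone.utc)

    def __repr__(self) -> str:
        return f"KaitaiTimestamp(raw={self.raw})"
```

A visualizer could check for this type and call `to_datetime()`, while performance-sensitive code would only ever touch `raw`.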
If you are super concerned about performance or raw value access, just use a wrapper
A wrapper would require a template mechanism, because there may be different numeric types storing timestamps. Something like timestamp<base=unix, u4, second>. Or maybe just an additional type to be used in instances as
pos: 0
type: timestamp<unix, second>(your_property_with_timestamp)
but it won't achieve your goal of having a timestamp field in seq that gives a calendar representation.
I don't expect to use templates in KSY, at least not now. I suggest starting with only this KSY:
seq:
  - id: time
    type: timestamp # actually, unix timestamp. Maybe also timestamp[4/8][be/le]
Generated code as suggested above.
This approach sucks. The reasons have been given in this issue.
You can say this about almost any type. Numbers come first.
type: timestamp # actually, unix timestamp. Maybe also timestamp[4/8][be/le]
Imagine that this is a successful, valued feature...now how do we extend it to support other timestamp formats? There's a DOS format which probably sees more use than the UNIX one. There are alternate UNIX formats (counting milliseconds, microseconds, or ten-nanosecond ticks, instead of seconds). The UNIX format is only good until 2038, at which point it runs out of bits, so there's a 64-bit version of it. And so on.
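To make the variety concrete, here is a rough sketch of two of the cases mentioned: the packed DOS date/time pair and the last instant a signed 32-bit UNIX timestamp can hold. The bit layout is the standard DOS/FAT one; the function name `dos_to_datetime` is made up for illustration, and the code does not validate out-of-range fields.

```python
from datetime import datetime, timezone


def dos_to_datetime(date_word: int, time_word: int) -> datetime:
    """Decode a 16-bit DOS date word and a 16-bit DOS time word.

    DOS timestamps carry no time zone, so a naive datetime is returned.
    """
    year = 1980 + ((date_word >> 9) & 0x7F)  # 7 bits, offset from 1980
    month = (date_word >> 5) & 0x0F          # 4 bits
    day = date_word & 0x1F                   # 5 bits
    hour = (time_word >> 11) & 0x1F          # 5 bits
    minute = (time_word >> 5) & 0x3F         # 6 bits
    second = (time_word & 0x1F) * 2          # 5 bits, 2-second resolution
    return datetime(year, month, day, hour, minute, second)


# Largest value of a signed 32-bit UNIX timestamp and the moment it denotes.
INT32_MAX = 2**31 - 1
LAST_32BIT_UNIX = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
```

The two decoders share nothing beyond the result type, which is the extensibility concern raised here.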
Your effort to "get something" is a point solution, and it is not obviously extensible. (Adding timestamp8 and timestamp4be and timestamp8le and timestampdos is rather ugly.)
Since the principal thrust of your proposal here seems to be to allow the data to be visualized for human consumption, perhaps a better solution would be to offer some sort of support for custom formatting options. This could potentially be something of general use going well beyond timestamps.
Imagine that this is a successful, valued feature...now how do we extend it to support other timestamp formats?
In the same way that different string encodings are supported, along with the special case for C strings and so on. Use another attribute that specifies the format further. For example:
seq:
  - id: time
    type: timestamp
    format: unix # dos/windows/iso/whatever
Regarding
Adding timestamp8 and timestamp4be and timestamp8le and timestampdos is rather ugly.
I already suggested using an ordinary endian key in attributes in another issue, but this did not meet with much enthusiasm. Size could be handled in a similar way.
type: timestamp
format: unix # dos/windows/iso/whatever
is complete shit. It requires context-dependent syntax extension. A bit too heavy for such a type.
It requires context-dependent syntax extension.
Is it bad? What is the difference from
type: str
encoding: utf-8
or
type: whatever
process: zip
?
Would it make sense to have a new feature similar to process, except that instead of doing bytes <=> bytes processing, it would convert already parsed data to/from another high-level type? I think something like this has been suggested before (as post-process): https://github.com/kaitai-io/kaitai_struct/issues/668#issuecomment-573398987
Then you could parse timestamps normally as integers of the appropriate size/endianness and afterwards let the parsed integer be converted to a timestamp:
- id: mtime
  type: u8
  post-process: unix_timestamp
If this mechanism is made extensible like process, we need to worry less about supporting all imaginable timestamp formats, because then you can write a custom post-process function when the timestamp format you need isn't natively supported. (Or alternatively put the timestamp post-process implementations in a separate repo right away to decouple them from the main KS release cycle.)
This would also be compatible with serialization, because post-process would (similar to process) support conversions in both directions, so the timestamp post-process functions would support converting back and forth between integer timestamps and high-level datetime values.
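A rough sketch of how such a bidirectional, extensible mechanism might look on the runtime side follows. Everything here is an assumption for illustration: `PostProcess`, `REGISTRY`, and the `unix_timestamp_ms` entry are hypothetical names, and only `unix_timestamp` comes from the KSY snippet above.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, NamedTuple


class PostProcess(NamedTuple):
    """A pair of conversions: parsed integer -> datetime and back."""
    decode: Callable[[int], datetime]
    encode: Callable[[datetime], int]


EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical registry; custom formats would be added as further entries.
REGISTRY: Dict[str, PostProcess] = {
    "unix_timestamp": PostProcess(
        decode=lambda raw: EPOCH + timedelta(seconds=raw),
        encode=lambda value: (value - EPOCH) // timedelta(seconds=1),
    ),
    "unix_timestamp_ms": PostProcess(
        decode=lambda raw: EPOCH + timedelta(milliseconds=raw),
        encode=lambda value: (value - EPOCH) // timedelta(milliseconds=1),
    ),
}
```

Parsing would call `REGISTRY[name].decode` on the already-parsed integer, and serialization would call `encode` to get the integer back before writing it with the declared size and endianness.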
Is it bad?
It is.
What is the difference from str
str is blessed. It is blessed in the sense that it was introduced early, when no one had thought about templates, interfaces, and serialization. It should be redesigned as a template at some point, and we would need a mechanism for global parameters for templates.
What is the difference from process
process processes raw bytes into raw bytes. Always. By definition.
post-process
Don't we have instances?
post-process
Don't we have instances?
A dedicated mechanism for this would have a few advantages - see the discussion in the other issue: https://github.com/kaitai-io/kaitai_struct/issues/668#issuecomment-573364325
Relatively many formats store time in some way; it would be useful to have a built-in type for the most common representations. A simple grep through the formats repository turns up files that probably use a UNIX timestamp, serialized as u4:
Generators should use an appropriate target-language type for timestamps.