libyal / libevtx

Library and tools to access the Windows XML Event Log (EVTX) format
GNU Lesser General Public License v3.0

Please add data types to the information tables #28

Closed yhojann-cl closed 5 months ago

yhojann-cl commented 3 years ago

In the file documentation/Windows XML Event Log (EVTX).asciidoc, the data information tables need data types, for example whether a value is plain bytes, dword, int32, int64, etc. For example, is the record identifier an int64 or a dword?

joachimmetz commented 3 years ago

@WHK102 why do you need the data types?

yhojann-cl commented 3 years ago

Because when you are completely ignorant of the data structure and its types, it becomes much more complicated to work out what type of data each field is: whether I should treat a value as numeric, as text or as bytes. Over time each type becomes clear, but it would save a lot of time if the documentation indicated in advance what type each value is. For example, until a few days ago I did not know that a CRC32 checksum is a numeric value. I had never computed a CRC32 before and could not do the validation directly with bytes. I understand that this is my own ignorance for not having read more of the official Python documentation, but in other cases, like the identifier of a record, I don't know whether to treat it as bytes or as a numeric value, and it takes time to figure out. The same happens with numeric values of 16, 32 or 64 bits: some values occupy 2 or 4 bytes, and although the task is simple in Python, in other languages such as C++ you have to deduce the data types along the way.
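The CRC32 point above can be illustrated with a short Python sketch. It assumes a hypothetical layout in which the checksum is stored as a 4-byte little-endian unsigned integer; the function name and the sample buffers are made up for illustration and are not taken from the libevtx documentation. The stored bytes are converted to a number before comparing, because `zlib.crc32` returns an integer, not bytes.

```python
import struct
import zlib


def checksum_matches(payload: bytes, stored: bytes) -> bool:
    """Return True if the CRC32 of payload equals the stored checksum.

    Assumption (hypothetical layout): `stored` holds the checksum as a
    4-byte little-endian unsigned integer.
    """
    # "<I" = little-endian, unsigned 32-bit; unpack always returns a tuple.
    (stored_value,) = struct.unpack("<I", stored)
    # zlib.crc32 returns an int; the mask keeps it in the unsigned 32-bit range.
    computed_value = zlib.crc32(payload) & 0xFFFFFFFF
    return computed_value == stored_value


# Synthetic sample data, not a real EVTX structure.
payload = b"example payload"
stored = struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF)
corrupted = b"example payloae"  # last byte changed

print(checksum_matches(payload, stored))
print(checksum_matches(corrupted, stored))
```

Comparing numbers rather than raw byte strings avoids having to reason about byte order at the comparison step; the byte order is stated once, in the `struct` format string.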

If the data types were defined it would be much easier to do a development from scratch using this documentation.

I really appreciate the effort you have put into documenting everything, and it has helped me a lot, but I feel that each value needs a type, especially because everything gets complicated once you reach the structure of the binary XML.

I made a mistake during development: I thought each integer value needed 4 additional bytes, because I was unpacking values incorrectly on the assumption that a signed integer needs 4 extra bytes. Fortunately you helped me in other issues about this problem and pointed out that the values are unsigned, which makes the number of bytes used correct. If the data types were simply defined in the documentation, I would not have needed to open those issues and make all these inquiries.

joachimmetz commented 3 years ago

Ack, I'll give some thought to how best to put this in the documentation. For now, assume values are unsigned unless explicitly noted.
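As a rough illustration of what "unsigned unless explicitly noted" means when unpacking, here is a minimal Python sketch using the standard `struct` module. The sample bytes and helper names are hypothetical, and little-endian byte order is an assumption made for the example. Signed and unsigned readings of a field consume the same number of bytes; only the interpretation of the bits differs.

```python
import struct


def read_u32(data: bytes, offset: int = 0) -> int:
    """Read an unsigned 32-bit little-endian integer (hypothetical helper)."""
    return struct.unpack_from("<I", data, offset)[0]


def read_i32(data: bytes, offset: int = 0) -> int:
    """Read a signed 32-bit little-endian integer (hypothetical helper)."""
    return struct.unpack_from("<i", data, offset)[0]


def read_u64(data: bytes, offset: int = 0) -> int:
    """Read an unsigned 64-bit little-endian integer (hypothetical helper)."""
    return struct.unpack_from("<Q", data, offset)[0]


# Made-up sample bytes, not taken from a real event log file.
raw = bytes.fromhex("ffffffff" "0100000000000000")

print(read_u32(raw))      # same 4 bytes read as unsigned
print(read_i32(raw))      # same 4 bytes read as signed
print(read_u64(raw, 4))   # the following 8 bytes read as unsigned

# Signed and unsigned formats of the same width occupy the same size.
print(struct.calcsize("<I"), struct.calcsize("<i"), struct.calcsize("<Q"))
```

In a language such as C++ the same distinction would be the choice between `uint32_t` and `int32_t`, both of which are 4 bytes wide.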

Unfortunately it can take a lot of time to confirm or deny whether a 4-byte value in a proprietary format is signed or unsigned. Sometimes implementations also use this completely arbitrarily. The documentation is mainly intended for maintaining the project itself, less as an EVTX-from-scratch guide.

Building data format parsers / analyzers is a topic that requires a fair bit of domain knowledge.

joachimmetz commented 5 months ago

Closing as infeasible; the documentation is not intended as an introductory guide to the format.