Closed milahu closed 1 year ago
@milahu:
problem:
value: ser.as<serial>
does not produce a typecast
self._m_serial_type = self.ser
This is correct, it's working as expected. Python is dynamically typed, so if you believe that a variable holds an object of a particular type, you can immediately access the properties specific to that type. This is in contrast to statically typed languages like Java: if you know that a variable of the general KaitaiStruct type currently holds an object of a more specific type on which you would like to access a property, you must first do the type conversion (see https://docs.oracle.com/javase/specs/jls/se8/html/jls-5.html), otherwise the Java compiler will give you a compile error.
expected: something like
_pos = self.ser._io.pos()
ser_len = 1 # TODO dynamic
self.ser._io.seek(_pos - ser_len)
self._m_serial_type = Sqlite3.Serial(self.ser._io, self.ser, self.ser._root)
#self.ser._io.seek(_pos)
This is a misunderstanding of the type cast operation - a type cast should never do anything like this. You think that:
the typecast seems to be working in java:
this.serialType = ((Sqlite3.Serial) (ser()));
and in a way, yes, the generated code also looks as I'd expect (as in Python), but if you actually run the Java code, you'll get a ClassCastException for the same reason you got the AttributeError in Python (just the error is thrown on the type cast already, not on the attribute access) - the code expected that the real type of ser is Serial, but it is actually VlqBase128Be and thus the type conversion failed.
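To make the Python side of this concrete, here is a minimal sketch. The two classes are hypothetical stand-ins for the generated VlqBase128Be and Sqlite3.Serial types (they are not the real generated code), and typing.cast is used only to illustrate what a "cast" amounts to at runtime in Python:

```python
from typing import cast


class VlqBase128Be:
    """Hypothetical stand-in for the parsed VLQ integer type."""

    def __init__(self, value: int) -> None:
        self.value = value


class Serial:
    """Hypothetical stand-in for the `serial` user type, which wraps a VLQ in `code`."""

    def __init__(self, code: VlqBase128Be) -> None:
        self.code = code


ser = VlqBase128Be(23)

# A cast only tells a type checker what to assume. At runtime it returns the
# very same object, just like `self._m_serial_type = self.ser` does.
serial_type = cast(Serial, ser)
same_object = serial_type is ser

try:
    serial_type.code  # VlqBase128Be has no `code` attribute
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

print(same_object, error_name)
```

Nothing is converted or re-read here: the failure only surfaces once an attribute that exists on Serial but not on VlqBase128Be is accessed.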
The real issue is that sqlite3.ksy is wrong and needs to be fixed (thanks for discovering and reporting this).
For starters, this mistake is masked by the fact that the ser parameter is declared as type: struct, see sqlite3.ksy:207-210:
column_content:
  params:
    - id: ser
      type: struct
struct means any user-defined type. This is a problem, because the compiler (correctly) allows passing any user type in there - in this case the type will be vlq_base128_be (sqlite3.ksy:183-189):
    - id: column_serials
      size: len_header_and_len.value - 1
      type: serials
    - id: column_contents
      repeat: expr
      repeat-expr: column_serials.entries.size
      type: column_content(column_serials.entries[_index])
serials:
  seq:
    - id: entries
      type: vlq_base128_be
      repeat: eos
But although the actual type of ser is always vlq_base128_be, as we've just seen, the spec thinks it's serial, which it is not (sqlite3.ksy:234-236):
instances:
  serial_type:
    value: ser.as<serial>
So this is basically guaranteed to fail at runtime. But there's not much the Kaitai Struct compiler can do about this - strictly speaking, all operations here are valid and are correctly translated (it's just that the .ksy spec is badly written).
It's much better to require a specific user type when defining the parameter:
 column_content:
   params:
     - id: ser
-      type: struct
+      type: serial
Now, KS compiler will not allow passing the vlq_base128_be type to the ser parameter and will raise a compile error.
But unfortunately, if you declare vlq_base128_be as the parameter type, you'll be allowed to pass column_serials.entries[_index] there (as expected), but I don't think you'll get a compile error because of the ser.as<serial> operation. That cast can never succeed, and the compiler could detect this automatically and throw a compile error, but sadly that is not implemented:
 column_content:
   params:
     - id: ser
-      type: struct
+      type: vlq_base128_be
KS compiler is very dumb when it comes to type casting - AFAIK it allows absolutely any type cast you write, and doesn't check whether it makes any sense (this is tracked in https://github.com/kaitai-io/kaitai_struct/issues/696). So using type casting may be dangerous if you don't know what you're doing. I recommend using it sparingly and really thinking about whether it is valid (in many cases, people use it in a way they shouldn't and it causes problems).
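Since the compiler does not validate casts, one defensive option in hand-written Python code that consumes a generated parser is to check the type explicitly before relying on it. The sketch below is only an illustration: checked_cast and both classes are hypothetical names, not part of the Kaitai Struct runtime or of the generated sqlite3.py.

```python
from typing import Type, TypeVar

T = TypeVar("T")


class VlqBase128Be:
    """Hypothetical stand-in for the parsed VLQ integer type."""

    def __init__(self, value: int) -> None:
        self.value = value


class Serial:
    """Hypothetical stand-in for the `serial` user type."""

    def __init__(self, code: VlqBase128Be) -> None:
        self.code = code


def checked_cast(obj: object, expected: Type[T]) -> T:
    """Return obj unchanged if it is an instance of expected, else raise TypeError."""
    if not isinstance(obj, expected):
        raise TypeError(
            f"cannot cast {type(obj).__name__} to {expected.__name__}"
        )
    return obj


ser = VlqBase128Be(23)

try:
    checked_cast(ser, Serial)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

This fails at the point of the cast, similar to Java's ClassCastException, instead of later on an unrelated attribute access. It does not make an invalid cast valid; it only reports the mistake earlier.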
It's much better to require a specific user type when defining the parameter:
 column_content:
   params:
     - id: ser
-      type: struct
+      type: serial
Now, KS compiler will not allow passing the vlq_base128_be type to the ser parameter and will raise a compile error.
yes, i also had to patch serials to make this work
 serials:
   seq:
     - id: entries
-      type: vlq_base128_be
+      type: serial
The real issue is that sqlite3.ksy is wrong
this looks like a micro-optimization, trying to defer the evaluation of serial
... but i would need the original IO position of self.ser to read the original bytes
implemented in https://github.com/milahu/pysqlite3/tree/fix-typecast-with-io-init-pos
closing in favor of https://github.com/kaitai-io/kaitai_struct_formats/pull/640
@milahu:
... but i would need the original IO position of self.ser to read the original bytes
implemented in https://github.com/milahu/pysqlite3/tree/fix-typecast-with-io-init-pos
Again, this is not a type cast, this is reparsing the bytes that originally belonged to one structure as another structure (but it was never needed, the actual problem was how the sqlite3.ksy was written). I tried to explain it in my last comment (https://github.com/kaitai-io/kaitai_struct/issues/1017#issuecomment-1492988496), I recommend reading it, I wrote it for you.
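The difference between the two operations can be sketched in a few lines. The classes below are hypothetical stand-ins and, for brevity, assume the value fits in a single byte (the real format uses a variable-length integer):

```python
import io


class RawValue:
    """Hypothetical stand-in for the type that was actually parsed."""

    def __init__(self, stream: io.BytesIO) -> None:
        self.value = stream.read(1)[0]


class Interpreted:
    """Hypothetical stand-in for the type the caller wanted instead."""

    def __init__(self, stream: io.BytesIO) -> None:
        self.code = RawValue(stream)

    @property
    def is_string(self) -> bool:
        return self.code.value >= 13 and self.code.value % 2 == 1


raw_bytes = b"\x17"  # 23 as a single byte
parsed = RawValue(io.BytesIO(raw_bytes))

# A cast would hand back `parsed` itself. Reparsing instead builds a second,
# unrelated object by reading the same bytes again through another type.
reparsed = Interpreted(io.BytesIO(raw_bytes))

print(reparsed is parsed, reparsed.code.value, reparsed.is_string)
```

Reparsing needs access to the original bytes (or the stream position they came from), which is exactly the extra bookkeeping a plain type cast never involves.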
yes, i also had to patch serials to make this work
 serials:
   seq:
     - id: entries
-      type: vlq_base128_be
+      type: serial
This patch looks quite legit, so why don't you use kaitai-struct-compiler to regenerate sqlite3.py? Then there will be no type cast and no serial_type, so your patch in https://github.com/milahu/pysqlite3/commit/d3aa20c664cc6bd39eb60c92f744e3ca6d8369f9 will also be meaningless. I don't understand how the second half of your comment https://github.com/kaitai-io/kaitai_struct/issues/1017#issuecomment-1493000218 can follow after the first one.
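As a rough illustration of what parsing looks like once the entries are read as serial directly, here is a self-contained sketch. It is not the output of kaitai-struct-compiler: read_vlq_base128_be, read_serials and this simplified Serial (which stores code as a plain int rather than a nested object) are assumptions made for the example. The is_blob, is_string and len_content rules are the ones from the generated Serial class.

```python
import io
from typing import List, Optional


def read_vlq_base128_be(stream: io.BytesIO) -> int:
    """Decode one big-endian base-128 variable-length integer from the stream."""
    value = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a VLQ integer")
        byte = chunk[0]
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            return value


class Serial:
    """Simplified stand-in for the `serial` type: parsed once, no cast needed."""

    def __init__(self, stream: io.BytesIO) -> None:
        self.code = read_vlq_base128_be(stream)

    @property
    def is_blob(self) -> bool:
        return self.code >= 12 and self.code % 2 == 0

    @property
    def is_string(self) -> bool:
        return self.code >= 13 and self.code % 2 == 1

    @property
    def len_content(self) -> Optional[int]:
        return (self.code - 12) // 2 if self.code >= 12 else None


def read_serials(raw: bytes) -> List[Serial]:
    """Read Serial entries until the end of the buffer (like `repeat: eos`)."""
    stream = io.BytesIO(raw)
    entries = []
    while stream.tell() < len(raw):
        entries.append(Serial(stream))
    return entries


entries = read_serials(b"\x17\x01\x81\x00")
summary = [(s.code, s.is_blob, s.is_string, s.len_content) for s in entries]
print(summary)
```

Each entry already is the interpreted type, so the code reading the column content can use it as-is, with no cast and no second pass over the bytes.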
im trying to use sqlite3.ksy in python
this breaks when i access
problem:
value: ser.as<serial>
does not produce a typecast
expected: something like
... but i would need the original IO position of self.ser to read the original bytes
or i would need a to_bytes method:
Sqlite3.Serial.from_bytes(self.ser.to_bytes())
self.ser is the raw value, for example self.ser = 23
self.serial_type is the interpreted value, for example self.serial_type.is_blob = True
the typecast seems to be working in java:
this.serialType = ((Sqlite3.Serial) (ser()));
Sqlite3.java
im using kaitai-struct-compiler version 0.10 to generate sqlite3.py
sqlite3.py
```py
# This is a generated file! Please edit source .ksy file and use kaitai-struct-compiler to rebuild

import kaitaistruct
from kaitaistruct import KaitaiStruct, KaitaiStream, BytesIO
from enum import Enum


if getattr(kaitaistruct, "API_VERSION", (0, 9)) < (0, 9):
    raise Exception(
        "Incompatible Kaitai Struct Python API: 0.9 or later is required, but you have %s"
        % (kaitaistruct.__version__)
    )

from . import vlq_base128_be


class Sqlite3(KaitaiStruct):
    """SQLite3 is a popular serverless SQL engine, implemented as a library
    to be used within other applications. It keeps its databases as
    regular disk files.

    Every database file is segmented into pages. First page (starting at
    the very beginning) is special: it contains a file-global header
    which specifies some data relevant to proper parsing (i.e. format
    versions, size of page, etc). After the header, normal contents of
    the first page follow.

    Each page would be of some type, and generally, they would be
    reached via the links starting from the first page. First page type
    (`root_page`) is always "btree_page".

    .. seealso::
       Source - https://www.sqlite.org/fileformat.html
    """

    class Versions(Enum):
        legacy = 1
        wal = 2

    class Encodings(Enum):
        utf_8 = 1
        utf_16le = 2
        utf_16be = 3

    def __init__(self, _io, _parent=None, _root=None):
        self._io = _io
        self._parent = _parent
        self._root = _root if _root else self
        self._read()

    def _read(self):
        self.magic = self._io.read_bytes(16)
        if (
            not self.magic
            == b"\x53\x51\x4C\x69\x74\x65\x20\x66\x6F\x72\x6D\x61\x74\x20\x33\x00"
        ):
            raise kaitaistruct.ValidationNotEqualError(
                b"\x53\x51\x4C\x69\x74\x65\x20\x66\x6F\x72\x6D\x61\x74\x20\x33\x00",
                self.magic,
                self._io,
                "/seq/0",
            )
        self.len_page_mod = self._io.read_u2be()
        self.write_version = KaitaiStream.resolve_enum(
            Sqlite3.Versions, self._io.read_u1()
        )
        self.read_version = KaitaiStream.resolve_enum(
            Sqlite3.Versions, self._io.read_u1()
        )
        self.reserved_space = self._io.read_u1()
        self.max_payload_frac = self._io.read_u1()
        self.min_payload_frac = self._io.read_u1()
        self.leaf_payload_frac = self._io.read_u1()
        self.file_change_counter = self._io.read_u4be()
        self.num_pages = self._io.read_u4be()
        self.first_freelist_trunk_page = self._io.read_u4be()
        self.num_freelist_pages = self._io.read_u4be()
        self.schema_cookie = self._io.read_u4be()
        self.schema_format = self._io.read_u4be()
        self.def_page_cache_size = self._io.read_u4be()
        self.largest_root_page = self._io.read_u4be()
        self.text_encoding = KaitaiStream.resolve_enum(
            Sqlite3.Encodings, self._io.read_u4be()
        )
        self.user_version = self._io.read_u4be()
        self.is_incremental_vacuum = self._io.read_u4be()
        self.application_id = self._io.read_u4be()
        self.reserved = self._io.read_bytes(20)
        self.version_valid_for = self._io.read_u4be()
        self.sqlite_version_number = self._io.read_u4be()
        self.root_page = Sqlite3.BtreePage(self._io, self, self._root)

    class Serial(KaitaiStruct):
        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.code = vlq_base128_be.VlqBase128Be(self._io)

        @property
        def is_blob(self):
            if hasattr(self, "_m_is_blob"):
                return self._m_is_blob

            self._m_is_blob = (self.code.value >= 12) and ((self.code.value % 2) == 0)
            return getattr(self, "_m_is_blob", None)

        @property
        def is_string(self):
            if hasattr(self, "_m_is_string"):
                return self._m_is_string

            self._m_is_string = (self.code.value >= 13) and (
                (self.code.value % 2) == 1
            )
            return getattr(self, "_m_is_string", None)

        @property
        def len_content(self):
            if hasattr(self, "_m_len_content"):
                return self._m_len_content

            if self.code.value >= 12:
                self._m_len_content = (self.code.value - 12) // 2

            return getattr(self, "_m_len_content", None)

    class BtreePage(KaitaiStruct):
        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.page_type = self._io.read_u1()
            self.first_freeblock = self._io.read_u2be()
            self.num_cells = self._io.read_u2be()
            self.ofs_cells = self._io.read_u2be()
            self.num_frag_free_bytes = self._io.read_u1()
            if (self.page_type == 2) or (self.page_type == 5):
                self.right_ptr = self._io.read_u4be()

            self.cells = []
            for i in range(self.num_cells):
                self.cells.append(Sqlite3.RefCell(self._io, self, self._root))

    class CellIndexLeaf(KaitaiStruct):
        """
        .. seealso::
           Source - https://www.sqlite.org/fileformat.html#b_tree_pages
        """

        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.len_payload = vlq_base128_be.VlqBase128Be(self._io)
            self._raw_payload = self._io.read_bytes(self.len_payload.value)
            _io__raw_payload = KaitaiStream(BytesIO(self._raw_payload))
            self.payload = Sqlite3.CellPayload(_io__raw_payload, self, self._root)

    class Serials(KaitaiStruct):
        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.entries = []
            i = 0
            while not self._io.is_eof():
                self.entries.append(vlq_base128_be.VlqBase128Be(self._io))
                i += 1

    class CellTableLeaf(KaitaiStruct):
        """
        .. seealso::
           Source - https://www.sqlite.org/fileformat.html#b_tree_pages
        """

        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.len_payload = vlq_base128_be.VlqBase128Be(self._io)
            self.row_id = vlq_base128_be.VlqBase128Be(self._io)
            self._raw_payload = self._io.read_bytes(self.len_payload.value)
            _io__raw_payload = KaitaiStream(BytesIO(self._raw_payload))
            self.payload = Sqlite3.CellPayload(_io__raw_payload, self, self._root)

    class CellPayload(KaitaiStruct):
        """
        .. seealso::
           Source - https://sqlite.org/fileformat2.html#record_format
        """

        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.len_header_and_len = vlq_base128_be.VlqBase128Be(self._io)
            self._raw_column_serials = self._io.read_bytes(
                (self.len_header_and_len.value - 1)
            )
            _io__raw_column_serials = KaitaiStream(BytesIO(self._raw_column_serials))
            self.column_serials = Sqlite3.Serials(
                _io__raw_column_serials, self, self._root
            )
            self.column_contents = []
            for i in range(len(self.column_serials.entries)):
                self.column_contents.append(
                    Sqlite3.ColumnContent(
                        self.column_serials.entries[i], self._io, self, self._root
                    )
                )

    class CellTableInterior(KaitaiStruct):
        """
        .. seealso::
           Source - https://www.sqlite.org/fileformat.html#b_tree_pages
        """

        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.left_child_page = self._io.read_u4be()
            self.row_id = vlq_base128_be.VlqBase128Be(self._io)

    class CellIndexInterior(KaitaiStruct):
        """
        .. seealso::
           Source - https://www.sqlite.org/fileformat.html#b_tree_pages
        """

        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.left_child_page = self._io.read_u4be()
            self.len_payload = vlq_base128_be.VlqBase128Be(self._io)
            self._raw_payload = self._io.read_bytes(self.len_payload.value)
            _io__raw_payload = KaitaiStream(BytesIO(self._raw_payload))
            self.payload = Sqlite3.CellPayload(_io__raw_payload, self, self._root)

    class ColumnContent(KaitaiStruct):
        def __init__(self, ser, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self.ser = ser
            self._read()

        def _read(self):
            if (self.serial_type.code.value >= 1) and (
                self.serial_type.code.value <= 6
            ):
                _on = self.serial_type.code.value
                if _on == 4:
                    self.as_int = self._io.read_u4be()
                elif _on == 6:
                    self.as_int = self._io.read_u8be()
                elif _on == 1:
                    self.as_int = self._io.read_u1()
                elif _on == 3:
                    self.as_int = self._io.read_bits_int_be(24)
                elif _on == 5:
                    self.as_int = self._io.read_bits_int_be(48)
                elif _on == 2:
                    self.as_int = self._io.read_u2be()

            if self.serial_type.code.value == 7:
                self.as_float = self._io.read_f8be()

            if self.serial_type.is_blob:
                self.as_blob = self._io.read_bytes(self.serial_type.len_content)

            self.as_str = (self._io.read_bytes(self.serial_type.len_content)).decode(
                "UTF-8"
            )

        @property
        def serial_type(self):
            if hasattr(self, "_m_serial_type"):
                return self._m_serial_type

            self._m_serial_type = self.ser
            return getattr(self, "_m_serial_type", None)

    class RefCell(KaitaiStruct):
        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.ofs_body = self._io.read_u2be()

        @property
        def body(self):
            if hasattr(self, "_m_body"):
                return self._m_body

            _pos = self._io.pos()
            self._io.seek(self.ofs_body)
            _on = self._parent.page_type
            if _on == 13:
                self._m_body = Sqlite3.CellTableLeaf(self._io, self, self._root)
            elif _on == 5:
                self._m_body = Sqlite3.CellTableInterior(self._io, self, self._root)
            elif _on == 10:
                self._m_body = Sqlite3.CellIndexLeaf(self._io, self, self._root)
            elif _on == 2:
                self._m_body = Sqlite3.CellIndexInterior(self._io, self, self._root)
            self._io.seek(_pos)
            return getattr(self, "_m_body", None)

    @property
    def len_page(self):
        if hasattr(self, "_m_len_page"):
            return self._m_len_page

        self._m_len_page = 65536 if self.len_page_mod == 1 else self.len_page_mod
        return getattr(self, "_m_len_page", None)
```