Open Mingun opened 7 months ago
I think, the best solution is to move read just after creating object, but then this can create a memory leak in C++98 target if _read()
of an array item will throw. Currently such object will be deleted in _clean_up()
method:
// This is a generated file! Please edit source .ksy file and use kaitai-struct-compiler to rebuild
#include "debug_array_user.h"
debug_array_user_t::debug_array_user_t(kaitai::kstream* p__io, kaitai::kstruct* p__parent, debug_array_user_t* p__root) : kaitai::kstruct(p__io) {
m__parent = p__parent;
m__root = p__root ? p__root : this;
m_one_cat = 0;
m_array_of_cats = 0;
}
void debug_array_user_t::_read() {
m_one_cat = new cat_t(m__io, this, m__root);
m_one_cat->_read();
m_array_of_cats = new std::vector<cat_t*>();
const int l_array_of_cats = 3;
for (int i = 0; i < l_array_of_cats; i++) {
cat_t* _t_array_of_cats = new cat_t(m__io, this, m__root);
m_array_of_cats->push_back(_t_array_of_cats);
_t_array_of_cats->_read();
}
}
debug_array_user_t::~debug_array_user_t() {
_clean_up();
}
void debug_array_user_t::_clean_up() {
if (m_one_cat) {
delete m_one_cat; m_one_cat = 0;
}
if (m_array_of_cats) {
for (std::vector<cat_t*>::iterator it = m_array_of_cats->begin(); it != m_array_of_cats->end(); ++it) {
delete *it;
}
delete m_array_of_cats; m_array_of_cats = 0;
}
}
debug_array_user_t::cat_t::cat_t(kaitai::kstream* p__io, debug_array_user_t* p__parent, debug_array_user_t* p__root) : kaitai::kstruct(p__io) {
m__parent = p__parent;
m__root = p__root;
}
void debug_array_user_t::cat_t::_read() {
m_meow = m__io->read_u1();
}
debug_array_user_t::cat_t::~cat_t() {
_clean_up();
}
void debug_array_user_t::cat_t::_clean_up() {
}
Because in C++98 target we do not want to use any smart pointers because they can absence, we can, however, create a custom, very limited smart pointer type, which will only exist to delete memory allocated for object if its _read()
method throw. Then the debug_array_user_t::_read()
would be implemented as:
void debug_array_user_t::_read() {
m_one_cat = new cat_t(m__io, this, m__root);
m_one_cat->_read();
m_array_of_cats = new std::vector<cat_t*>();
const int l_array_of_cats = 3;
for (int i = 0; i < l_array_of_cats; i++) {
safe_read_helper<cat_t> _t_array_of_cats_helper = safe_read_helper<cat_t>(new cat_t(m__io, this, m__root));
cat_t* _t_array_of_cats = _t_array_of_cats_helper.read();
m_array_of_cats->push_back(_t_array_of_cats);
}
}
The helper is very simple and limited:
/// Will work only with concrete types
/// Live only on stack. Destroyed just after calling `read()`
template<class T>
class safe_read_helper {
T* m_ptr;
public:
safe_read_helper(T* ptr) : m_ptr(ptr) {}
~safe_read_helper() { delete this->m_ptr; }
T* read() {
T* ptr = this->m_ptr;
ptr->_read();
this->m_ptr = 0;
return ptr;
}
}
@Mingun:
This is the common problem: in debug mode reading is performed after placing new object into container, but in no debug mode it is readed in constructor, i. e. before placing into container. That is very unpleasant situation, because debug mode ideally should not change behavior.
The "change in behavior" is a valid point, though the difference is subtle and it most likely won't matter to the vast majority of .ksy specs: if some expression in the .ksy uses items.size
(or items.last
) before items
is fully parsed, it would include the new to-be-parsed element in --no-auto-read
mode, but not in the default autoRead
mode. If some spec relies on this, it should really use _index
, though.
Otherwise, if the .ksy spec doesn't use items.size
during parsing items
, I don't think there is any observable change. In the case of successful parsing (when no error is thrown), the object trees will be the same regardless of whether autoRead
is true
or false
. In case of an unsuccessful parsing, the default autoRead
mode won't give any object tree (it will deliver the exception instead) and --no-auto-read
will give some (partial) tree. This comes from the very nature of autoRead
, there's no "unpleasant" change in behavior.
I think, the best solution is to move read just after creating object, but then this can create a memory leak in C++98 target if
_read()
of an array item will throw.
It used to be this way in 0.10, but it was intentionally changed in https://github.com/kaitai-io/kaitai_struct_compiler/commit/755bcbe1ba958469d4c40d56838e98fb4c6c121b. It makes the new Web IDE feature introduced in https://github.com/kaitai-io/kaitai_struct_webide/pull/170 much more useful, because it allows displaying the last partially parsed object of arrays. Without this compiler change, chances are that the Web IDE with complex formats could only display objects that are parsed completely fine before reaching the object whose parsing failed (arbitrarily deep in this object), which will probably not help in diagnosing the problem - you want to get as close as possible to the error location.
Unfortunately, it indeed breaks the DebugArrayUser
test for C++11. At worst, we can limit the scope of the change in https://github.com/kaitai-io/kaitai_struct_compiler/commit/755bcbe1ba958469d4c40d56838e98fb4c6c121b to JavaScript only. But as you note, it creates a memory leak in C++98 if _read()
fails (which is actually the same situation where we're most interested in this partial object in JavaScript).
To me, the obvious fix for the invalid memory access problem in C++11 would be to accept that the object has been moved into the std::vector
and access it via the vector, like this (untested):
for (int i = 0; i < l_array_of_cats; i++) {
std::unique_ptr<cat_t> _t_array_of_cats = std::unique_ptr<cat_t>(new cat_t(m__io, this, m__root));
m_array_of_cats->push_back(std::move(_t_array_of_cats));
m_array_of_cats->back()->_read(); // instead of _t_array_of_cats->_read();
}
Or if we came to a conclusion that the "change in behavior" is a problem, we could do something along these lines (m_array_of_cats->push_back(...);
apparently must be duplicated, because it's not my fault that C++ doesn't support try..finally
):
for (int i = 0; i < l_array_of_cats; i++) {
std::unique_ptr<cat_t> _t_array_of_cats = std::unique_ptr<cat_t>(new cat_t(m__io, this, m__root));
try {
_t_array_of_cats->_read();
} catch(...) {
m_array_of_cats->push_back(std::move(_t_array_of_cats));
throw;
}
m_array_of_cats->push_back(std::move(_t_array_of_cats));
}
And I would be happy with this even in JavaScript, where I need to ensure that the partial objects are accessible because of the Web IDE. Perhaps this is indeed a better solution.
I totally agree that partial parsing is very useful feature and it shouldn't be removed in order to fix bug. Any solution that keeps it would be better. Another way is to use RTTI helper in C++11 code that will push_back
into array it its destructor.
Currently C++ generator when uses smart pointers generates incorrect code that uses
unique_ptr
after it was moved into the vector:That is the reason why
DebugArrayUser
test on https://ci.kaitai.io/ is segfaulted on C++11 with messagememory access violation at address: 0x8: no mapping at fault address
.This is the common problem: in debug mode reading is performed after placing new object into container, but in no debug mode it is readed in constructor, i. e. before placing into container. That is very unpleasant situation, because debug mode ideally should not change behavior.