kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io
4.02k stars 197 forks source link

C++11: segfault in debug mode for arrays #1105

Open Mingun opened 7 months ago

Mingun commented 7 months ago

Currently C++ generator when uses smart pointers generates incorrect code that uses unique_ptr after it was moved into the vector:

void debug_array_user_t::_read() {
    m_one_cat = std::unique_ptr<cat_t>(new cat_t(m__io, this, m__root));
    m_one_cat->_read();
    m_array_of_cats = std::unique_ptr<std::vector<std::unique_ptr<cat_t>>>(new std::vector<std::unique_ptr<cat_t>>());
    const int l_array_of_cats = 3;
    for (int i = 0; i < l_array_of_cats; i++) {
        std::unique_ptr<cat_t> _t_array_of_cats = std::unique_ptr<cat_t>(new cat_t(m__io, this, m__root));
        m_array_of_cats->push_back(std::move(_t_array_of_cats));
        _t_array_of_cats->_read(); // <<<<<<< BIG PROBLEM ^^^ object is moved out
    }
}

That is the reason why DebugArrayUser test on https://ci.kaitai.io/ is segfaulted on C++11 with message memory access violation at address: 0x8: no mapping at fault address.

This is the common problem: in debug mode reading is performed after placing new object into container, but in no debug mode it is readed in constructor, i. e. before placing into container. That is very unpleasant situation, because debug mode ideally should not change behavior.

Mingun commented 7 months ago

I think, the best solution is to move read just after creating object, but then this can create a memory leak in C++98 target if _read() of an array item will throw. Currently such object will be deleted in _clean_up() method:

// This is a generated file! Please edit source .ksy file and use kaitai-struct-compiler to rebuild

#include "debug_array_user.h"

debug_array_user_t::debug_array_user_t(kaitai::kstream* p__io, kaitai::kstruct* p__parent, debug_array_user_t* p__root) : kaitai::kstruct(p__io) {
    m__parent = p__parent;
    m__root = p__root ? p__root : this;
    m_one_cat = 0;
    m_array_of_cats = 0;
}

void debug_array_user_t::_read() {
    m_one_cat = new cat_t(m__io, this, m__root);
    m_one_cat->_read();
    m_array_of_cats = new std::vector<cat_t*>();
    const int l_array_of_cats = 3;
    for (int i = 0; i < l_array_of_cats; i++) {
        cat_t* _t_array_of_cats = new cat_t(m__io, this, m__root);
        m_array_of_cats->push_back(_t_array_of_cats);
        _t_array_of_cats->_read();
    }
}

debug_array_user_t::~debug_array_user_t() {
    _clean_up();
}

void debug_array_user_t::_clean_up() {
    if (m_one_cat) {
        delete m_one_cat; m_one_cat = 0;
    }
    if (m_array_of_cats) {
        for (std::vector<cat_t*>::iterator it = m_array_of_cats->begin(); it != m_array_of_cats->end(); ++it) {
            delete *it;
        }
        delete m_array_of_cats; m_array_of_cats = 0;
    }
}

debug_array_user_t::cat_t::cat_t(kaitai::kstream* p__io, debug_array_user_t* p__parent, debug_array_user_t* p__root) : kaitai::kstruct(p__io) {
    m__parent = p__parent;
    m__root = p__root;
}

void debug_array_user_t::cat_t::_read() {
    m_meow = m__io->read_u1();
}

debug_array_user_t::cat_t::~cat_t() {
    _clean_up();
}

void debug_array_user_t::cat_t::_clean_up() {
}

Because in C++98 target we do not want to use any smart pointers because they can absence, we can, however, create a custom, very limited smart pointer type, which will only exist to delete memory allocated for object if its _read() method throw. Then the debug_array_user_t::_read() would be implemented as:

void debug_array_user_t::_read() {
    m_one_cat = new cat_t(m__io, this, m__root);
    m_one_cat->_read();
    m_array_of_cats = new std::vector<cat_t*>();
    const int l_array_of_cats = 3;
    for (int i = 0; i < l_array_of_cats; i++) {
        safe_read_helper<cat_t> _t_array_of_cats_helper = safe_read_helper<cat_t>(new cat_t(m__io, this, m__root));
        cat_t* _t_array_of_cats = _t_array_of_cats_helper.read();
        m_array_of_cats->push_back(_t_array_of_cats);
    }
}

The helper is very simple and limited:

/// Will work only with concrete types
/// Live only on stack. Destroyed just after calling `read()`
template<class T>
class safe_read_helper {
  T* m_ptr;
public:
  safe_read_helper(T* ptr) : m_ptr(ptr) {}
  ~safe_read_helper() { delete this->m_ptr; }

  T* read() {
    T* ptr = this->m_ptr;
    ptr->_read();
    this->m_ptr = 0;
    return ptr;
  }
}
generalmimon commented 3 months ago

@Mingun:

This is the common problem: in debug mode reading is performed after placing new object into container, but in no debug mode it is readed in constructor, i. e. before placing into container. That is very unpleasant situation, because debug mode ideally should not change behavior.

The "change in behavior" is a valid point, though the difference is subtle and it most likely won't matter to the vast majority of .ksy specs: if some expression in the .ksy uses items.size (or items.last) before items is fully parsed, it would include the new to-be-parsed element in --no-auto-read mode, but not in the default autoRead mode. If some spec relies on this, it should really use _index, though.

Otherwise, if the .ksy spec doesn't use items.size during parsing items, I don't think there is any observable change. In the case of successful parsing (when no error is thrown), the object trees will be the same regardless of whether autoRead is true or false. In case of an unsuccessful parsing, the default autoRead mode won't give any object tree (it will deliver the exception instead) and --no-auto-read will give some (partial) tree. This comes from the very nature of autoRead, there's no "unpleasant" change in behavior.

I think, the best solution is to move read just after creating object, but then this can create a memory leak in C++98 target if _read() of an array item will throw.

It used to be this way in 0.10, but it was intentionally changed in https://github.com/kaitai-io/kaitai_struct_compiler/commit/755bcbe1ba958469d4c40d56838e98fb4c6c121b. It makes the new Web IDE feature introduced in https://github.com/kaitai-io/kaitai_struct_webide/pull/170 much more useful, because it allows displaying the last partially parsed object of arrays. Without this compiler change, chances are that the Web IDE with complex formats could only display objects that are parsed completely fine before reaching the object whose parsing failed (arbitrarily deep in this object), which will probably not help in diagnosing the problem - you want to get as close as possible to the error location.

Unfortunately, it indeed breaks the DebugArrayUser test for C++11. At worst, we can limit the scope of the change in https://github.com/kaitai-io/kaitai_struct_compiler/commit/755bcbe1ba958469d4c40d56838e98fb4c6c121b to JavaScript only. But as you note, it creates a memory leak in C++98 if _read() fails (which is actually the same situation where we're most interested in this partial object in JavaScript).

To me, the obvious fix for the invalid memory access problem in C++11 would be to accept that the object has been moved into the std::vector and access it via the vector, like this (untested):

    for (int i = 0; i < l_array_of_cats; i++) {
        std::unique_ptr<cat_t> _t_array_of_cats = std::unique_ptr<cat_t>(new cat_t(m__io, this, m__root));
        m_array_of_cats->push_back(std::move(_t_array_of_cats));
        m_array_of_cats->back()->_read(); // instead of _t_array_of_cats->_read();
    }

Or if we came to a conclusion that the "change in behavior" is a problem, we could do something along these lines (m_array_of_cats->push_back(...); apparently must be duplicated, because it's not my fault that C++ doesn't support try..finally):

    for (int i = 0; i < l_array_of_cats; i++) {
        std::unique_ptr<cat_t> _t_array_of_cats = std::unique_ptr<cat_t>(new cat_t(m__io, this, m__root));
        try {
            _t_array_of_cats->_read();
        } catch(...) {
            m_array_of_cats->push_back(std::move(_t_array_of_cats));
            throw;
        }
        m_array_of_cats->push_back(std::move(_t_array_of_cats));
    }

And I would be happy with this even in JavaScript, where I need to ensure that the partial objects are accessible because of the Web IDE. Perhaps this is indeed a better solution.

Mingun commented 3 months ago

I totally agree that partial parsing is very useful feature and it shouldn't be removed in order to fix bug. Any solution that keeps it would be better. Another way is to use RTTI helper in C++11 code that will push_back into array it its destructor.