haxscramper / hcparse

High-level nim bindings for parsing C/C++ code
https://haxscramper.github.io/hcparse-doc/src/hcparse/libclang.html
Apache License 2.0

Inner typedef handling #3

Open haxscramper opened 3 years ago

haxscramper commented 3 years ago

C++ allows nested structs, classes, unions and typedef declarations, which are often used in template metaprogramming. For example, consider the definition of std::basic_string (std::string is a typedef for an instantiation of this class template). It has several inner declarations for various types, such as ::const_pointer, ::const_iterator and more. I will not go over all of them and will instead focus on one that recently triggered another edge case in the implementation.

// typedef for const string pointer
typedef typename _Alloc_traits::const_pointer   const_pointer;
// typedef for string iterator
typedef __gnu_cxx::__normal_iterator<const_pointer, basic_string> const_iterator;
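For the default allocator, at least the `const_pointer` half of this chain bottoms out in something simple. The compile-time check below is a sketch that relies only on the standard-mandated definition (`basic_string::const_pointer` is `std::allocator_traits<Allocator>::const_pointer`), not on the libstdc++ internals quoted above. The aliases `Str` and `CharAllocTraits` are names introduced for illustration:

```cpp
#include <memory>
#include <string>
#include <type_traits>

using Str = std::basic_string<char>;
using CharAllocTraits = std::allocator_traits<std::allocator<char>>;

// basic_string::const_pointer is defined through allocator_traits of its allocator...
static_assert(std::is_same<Str::const_pointer, CharAllocTraits::const_pointer>::value,
              "const_pointer comes from allocator_traits");

// ...which, for the default std::allocator<char>, is just a raw pointer.
static_assert(std::is_same<Str::const_pointer, const char*>::value,
              "with std::allocator<char>, const_pointer is const char*");
```

`const_iterator`, on the other hand, is implementation-defined, which is why the libstdc++ `__normal_iterator` wrapper shows up in the first place.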

An object of type const_iterator is returned by the cbegin() method, among others. The basic_string implementation does not rely much on this iterator; it is mainly interesting because of how it is implemented. If we go to bits/stl_iterator.h and find the __normal_iterator implementation, we see this (unimportant parts are omitted; the full implementation can be found at the libstdc++ mirror):

  // This iterator adapter is @a normal in the sense that it does not
  // change the semantics of any of the operators of its iterator
  // parameter.  Its primary purpose is to convert an iterator that is
  // not a class, e.g. a pointer, into an iterator that is a class.
  // The _Container parameter exists solely so that different containers
  // using this template can instantiate different types, even if the
  // _Iterator parameter is the same.
  template<typename _Iterator, typename _Container>
    class __normal_iterator
    {
    protected:
      _Iterator _M_current;

      typedef std::iterator_traits<_Iterator>       __traits_type;

    public:
      typedef typename __traits_type::reference     reference;

      // Forward iterator requirements
      _GLIBCXX20_CONSTEXPR
      reference
      operator*() const _GLIBCXX_NOEXCEPT
      { return *_M_current; }

Question: what is the type of *std::basic_string<char>.cbegin()? We get a __normal_iterator from cbegin() and then dereference it via operator*, whose return type is reference. That type has nothing to do with the template type parameters of any of the involved types, at least not directly: the return type of operator* is defined in terms of an inner typedef of a traits class instantiated on a template parameter. Specifically, ::reference is __traits_type::reference, where __traits_type = std::iterator_traits<_Iterator>, and _Iterator is a template type parameter.
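The same indirection can be reproduced outside the standard library. The sketch below uses a hypothetical `NormalIteratorSketch` (a stripped-down stand-in, not the libstdc++ class) whose `operator*` return type is reachable only through `std::iterator_traits<It>`. The last assertion checks that the real `std::string::const_iterator` arrives at the same answer, `const char&`:

```cpp
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-in for __normal_iterator. The result of
// operator* is never spelled in terms of It or Container directly; it is
// looked up through std::iterator_traits<It>. Container is unused and only
// makes instantiations for different containers distinct.
template <typename It, typename Container>
class NormalIteratorSketch {
    It current_;
    using traits_type = std::iterator_traits<It>;

public:
    using reference = typename traits_type::reference;

    explicit NormalIteratorSketch(It it) : current_(it) {}
    reference operator*() const { return *current_; }
};

using SketchIter = NormalIteratorSketch<const char*, std::string>;

// iterator_traits<const char*>::reference is const char&.
static_assert(std::is_same<SketchIter::reference, const char&>::value, "");
static_assert(std::is_same<decltype(*std::declval<SketchIter>()), const char&>::value, "");

// The real std::string::const_iterator resolves to the same type.
static_assert(std::is_same<decltype(*std::declval<std::string>().cbegin()),
                           const char&>::value, "");
```

A wrapper generator sees only `reference` on the class. Recovering `const char&` requires evaluating `std::iterator_traits` for the concrete `It`, which is exactly the information a Nim declaration does not carry.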

I've come up with a couple of ways to solve this problem, none of which are satisfactory.

  1. Manually override the return type of cbegin() with CxxIterator[char], which has a [] implementation. This could work, but it does not scale well, since I would have to manually (or semi-automatically) correct a lot of code.
  2. Wrap inner typedefs as well (right now they are ignored) and implement a [] overload. This is problematic because, as we have already seen, an inner typedef does not provide enough information to reconstruct the same behavior in Nim code.
    1. I cannot put a typedef inside a Nim type declaration, which means it must be a separate object that maps to std::basic_string<'0, '1, '2>::const_pointer. (There is no way in hell I will be able to decipher the whole typedef chain: (1) const_pointer is _Alloc_traits::const_pointer, where (2) _Alloc_traits is __gnu_cxx::__alloc_traits<_Char_alloc_type>, where (3) _Char_alloc_type is __gnu_cxx::__alloc_traits<_Alloc>::template rebind<_CharT>::other, where (4) _CharT and _Alloc are the actual template type parameters. Wrapping this would mean diving into internal implementation details that are clearly not needed.)
      • A new object might work, but it would require a very convoluted setup that I could not manage to get working:
type
  CxxBase {.inheritable, pure.} = object
  CxxNormalIteratorBase[It, Container] = object of CxxBase
  CxxPointerBase[P] = object of CxxBase

  StdBasicStringConstPointer[CharT, Traits, Alloc] {.
    importcpp: "std::basic_string<'0, '1, '2>::const_pointer",
    header: "<string>",
    byref
  .} = object of CxxPointerBase[CharT]

  StdBasicStringConstIterator[CharT, Traits, Alloc] {.
    importcpp: "std::basic_string<'0, '1, '2>::const_iterator",
    header: "<string>",
    byref
  .} = object of
    CxxNormalIteratorBase[
      StdBasicStringConstPointer[CharT, Traits, Alloc],
      StdBasicString[CharT, Traits, Alloc]
    ]

byref, CxxBase, etc. are needed to reuse the base iterator methods. Otherwise, I would have to copy the object methods for each instantiation, which is even worse than option 1: there, the amount of work was capped at the number of derived classes, while here it would be repeated for each inner typedef.

For reference, basic_string has 14 of them: traits_type, value_type, allocator_type, size_type, difference_type, reference, const_reference, pointer, const_pointer, iterator, basic_string, const_reverse_iterator, reverse_iterator and const_iterator.
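For `std::string`, most of these typedefs resolve to ordinary types through the standard definitions. `iterator` and `const_iterator` are implementation-defined class types (libstdc++ uses `__normal_iterator`, as quoted above), and the reverse iterators are specified in terms of them. A compile-time sketch covering several of the 14, assuming only standard-guaranteed definitions:

```cpp
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <type_traits>

using Str = std::string;

// Member typedefs that resolve to "plain" types for the default allocator.
static_assert(std::is_same<Str::traits_type, std::char_traits<char>>::value, "");
static_assert(std::is_same<Str::value_type, char>::value, "");
static_assert(std::is_same<Str::allocator_type, std::allocator<char>>::value, "");
static_assert(std::is_same<Str::size_type, std::size_t>::value, "");
static_assert(std::is_same<Str::reference, char&>::value, "");
static_assert(std::is_same<Str::const_reference, const char&>::value, "");
static_assert(std::is_same<Str::pointer, char*>::value, "");

// The reverse iterators are defined in terms of the (implementation-defined)
// forward iterators, so they inherit the same wrapping problem.
static_assert(std::is_same<Str::const_reverse_iterator,
                           std::reverse_iterator<Str::const_iterator>>::value, "");
```

So a wrapper that maps each inner typedef to its own Nim object mostly duplicates types that already have a direct Nim equivalent. The difficult cases are the iterator family.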

Then I need to implement an absolute abomination of a procedure that unpacks the template parameters twice in order to get to the char one:

proc `[]`[P](it: CxxPointerBase[P]): P {.importcpp: "*#".}
proc `[]`[It, Container](it: CxxNormalIteratorBase[It, Container]): auto =
  if false:
    # Would fail for non-default-initializable objects
    var tmp: It
    # Because of the `if`, `result = *tmp; goto BeforeRet_;` is generated, which means
    # this does not work with non-copyable objects. I can `.emit.` `#if false` around
    # this section, but this is an absolutely gross hack.
    return tmp[]

  else:
    {.emit: "return *`it`;".}
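On the C++ side, "return whatever `*it` yields" needs no unpacking at all: `decltype(auto)` (C++14 or later) preserves the exact type, reference included. The sketch below shows the behavior that the emitted `return *it;` is trying to reproduce. `deref` is a hypothetical helper name, not part of hcparse:

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: forwards the result of operator* unchanged.
// decltype(auto) deduces decltype(*it), so a `const char&` stays a reference
// and nothing is copied or default-constructed along the way.
template <typename It>
decltype(auto) deref(const It& it) {
    return *it;
}

using ConstIter = std::string::const_iterator;

static_assert(std::is_same<decltype(deref(std::declval<const ConstIter&>())),
                           const char&>::value, "");

// Same result for a raw pointer, where no class wrapper is involved.
static_assert(std::is_same<decltype(deref(std::declval<const char* const&>())),
                           const char&>::value, "");
```

Neither the default-constructed `tmp` nor the `#if false` trick has an equivalent here, because the type is deduced from the expression itself rather than from the template parameters.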

Maybe there are other solutions that I'm not aware of, but this is a showstopper for wrapping things like the C++ standard library.

```
import std/typetraits

type
  StdCharTraits[CharT] {.importcpp: "std::char_traits".} = object
  StdAllocator[Alloc] {.importcpp: "std::allocator", header: "".} = object
  StdBasicString[CharT, Traits, Alloc] {.
    importcpp: "std::basic_string", header: "<string>".} = object

  CxxBase {.inheritable, pure.} = object
  CxxNormalIteratorBase[It, Container] = object of CxxBase
  CxxPointerBase[P] = object of CxxBase

  StdBasicStringConstPointer[CharT, Traits, Alloc] {.
    importcpp: "std::basic_string<'0, '1, '2>::const_pointer",
    header: "<string>",
    byref
  .} = object of CxxPointerBase[CharT]

  StdBasicStringConstIterator[CharT, Traits, Alloc] {.
    importcpp: "std::basic_string<'0, '1, '2>::const_iterator",
    header: "<string>",
    byref
  .} = object of
    CxxNormalIteratorBase[
      StdBasicStringConstPointer[CharT, Traits, Alloc],
      StdBasicString[CharT, Traits, Alloc]
    ]

  StdString = StdBasicString[char, StdCharTraits[char], StdAllocator[char]]

proc cbegin[C, T, A](str: StdBasicString[C, T, A]): StdBasicStringConstIterator[C, T, A] {.importcpp: "#.cbegin()".}
proc `+=`(s: var StdString, other: cstring) {.importcpp: "#.operator+=(@)".}

proc `[]`[P](it: CxxPointerBase[P]): P {.importcpp: "*#".}
proc `[]`[It, Container](it: CxxNormalIteratorBase[It, Container]): auto =
  if false:
    {.emit: "\n#if false".}
    var tmp: It
    static:
      echo typeof tmp
      echo typeof tmp[]
    return tmp[]
    {.emit: "\n#endif".}
  else:
    {.emit: "return *`it`;".}

proc iostdAux() {.header: "<iostream>", importcpp: "//".}

proc test[C, T, A](it: StdBasicString[C, T, A]): StdBasicStringConstIterator[C, T, A] =
  it.cbegin()

proc main() =
  var str: StdString
  str += "01234".cstring
  var iter = str.test()
  {.emit: """
std::cout << *`iter` << "\n";
std::cout << *`iter` << "\n";
std::cout << *`str`.cbegin() << "\n";
""".}

  let ch: char = iter[]
  echo "[[", ch, "]]"

ioStdAux()
main()
```

fails with

```
@mwrap_inner_typedef.nim.cpp: In function ‘NIM_CHAR X5BX5D___OkoHvlMD9coCKZ51lOl2IrQ(tyObject_CxxNormalIteratorBase__7illd9ankfyVtnNdKKmSilQ*)’:
@mwrap_inner_typedef.nim.cpp:260:24: error: cannot convert ‘tyObject_CxxNormalIteratorBase__7illd9ankfyVtnNdKKmSilQ’ to ‘NIM_CHAR’ {aka ‘char’} in return
  260 | return *it;
      |        ^~~
      |        |
      |        tyObject_CxxNormalIteratorBase__7illd9ankfyVtnNdKKmSilQ
```
timotheecour commented 3 years ago

Can you simplify this as much as possible and show a reduced case with no reference to the C++ stdlib?

This will make it easier to analyze and find a good solution.

haxscramper commented 3 years ago
const h = "wrap_inner.hpp"

type
  P[T] {.importcpp: "P<'0>", header: h.} = object
    f: T

  S[T] {.importcpp: "S<'0>", header: h.} = object

  # Have to declare an additional object, since the procedures/fields of
  # `P` cannot be reused. When compiled to C++, this type /is/ actually a `P[T]`.
  S_value_type[T] {.importcpp: "S<'0>::value_type", header: h.} = object
    f: T # In order for `echo` to work correctly I need to port all fields.

proc get[T](s: S[T], arg: T): S_value_type[T] {.importcpp: "#.get(@)", header: h.}

proc initS[T](): S[T] {.importcpp: "S<'*0>()", header: h.}

proc main() =
  let s = initS[cint]().get(1)
  echo s

main()
wrap_inner.hpp:

template <typename T>
struct P {
    T f;
};

template <typename T>
struct S {
    typedef P<T> value_type;

    value_type get(T arg) {
        return P<T>{arg};
    }
};
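For comparison, on the C++ side the inner typedef is simply an alias: `S<T>::value_type` *is* `P<T>`, and this is exactly what the separate `S_value_type[T]` object has to mirror in Nim. The compile-time check below repeats the header contents so that the snippet builds on its own:

```cpp
#include <type_traits>
#include <utility>

// Same declarations as wrap_inner.hpp above.
template <typename T>
struct P {
    T f;
};

template <typename T>
struct S {
    typedef P<T> value_type;

    value_type get(T arg) {
        return P<T>{arg};
    }
};

// The inner typedef introduces no new type: it names P<T> itself.
static_assert(std::is_same<S<int>::value_type, P<int>>::value, "");

// So the result of get() exposes P's field directly, with no extra wrapper.
static_assert(std::is_same<decltype(std::declval<S<int>&>().get(1).f), int>::value, "");
```

In C++, `P`'s fields and functions apply to `S<T>::value_type` for free. The Nim side, in contrast, has to redeclare `f` on `S_value_type[T]` for `echo` to work.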