rpav / c2ffi

Clang-based FFI wrapper generator
GNU Lesser General Public License v2.1

Add support for wide string literals in macros #80

Closed skissane closed 3 years ago

skissane commented 3 years ago

The macro handling code doesn't recognise wide string literals.

As a result, it guesses their type as const char* instead of the correct const wchar_t*.

As an example, the Windows SDK <security.h> header file contains:

 #define MICROSOFT_KERBEROS_NAME_W   L"Kerberos"

The L"..." prefix is the syntax for a wide character string literal, whose type is const wchar_t* (strictly, an array of wchar_t that decays to a pointer).
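To make the distinction concrete, here is a small sketch (not code from this PR) that uses C11 `_Generic` to let the compiler report whether a macro expands to a wide or a narrow string literal. `NARROW_NAME` and `IS_WIDE_STRING` are hypothetical names made up for the illustration, and the sketch assumes `wchar_t` is not itself `char`.

```c
#include <stddef.h>  /* defines wchar_t in C */

#define MICROSOFT_KERBEROS_NAME_W L"Kerberos"
#define NARROW_NAME "Kerberos"  /* hypothetical narrow counterpart */

/* Evaluates to 1 for a wide string literal and 0 for a narrow one.
   The literal decays to a pointer before _Generic selects a branch. */
#define IS_WIDE_STRING(x) _Generic((x), \
    wchar_t *: 1, const wchar_t *: 1,   \
    char *: 0, const char *: 0)
```

Here `IS_WIDE_STRING(MICROSOFT_KERBEROS_NAME_W)` selects a `wchar_t` branch, while `IS_WIDE_STRING(NARROW_NAME)` selects a `char` branch.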

This syntax is very common in Windows header files. On other platforms such as macOS and Linux, it is still valid C but rarely used in practice. Those platforms prefer UTF-8 for Unicode string storage, so the traditional narrow-string char* suffices.

Note that this PR doesn't actually use wchar_t; instead it uses __WCHAR_TYPE__. That's a predefined macro which Clang (and GCC) define as the actual integer type being used as the wide character type (short or int or long). I did this because, in C, wchar_t is a typedef which is only defined if you include a header such as <wchar.h> or <stddef.h>, which (in Clang and GCC) defines it as typedef __WCHAR_TYPE__ wchar_t;. By using __WCHAR_TYPE__ instead of wchar_t, the generated type works even if no such header was included.
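The following sketch shows the idea in isolation, assuming a Clang or GCC compiler. It deliberately includes no headers, yet a wide string literal can still be typed and walked through `__WCHAR_TYPE__`. The names `wide_char_t`, `kerberos_name` and `wide_length` are hypothetical and exist only for this example.

```c
/* No #include here: __WCHAR_TYPE__ is predefined by Clang and GCC. */
typedef __WCHAR_TYPE__ wide_char_t;  /* hypothetical name for illustration */

/* A wide string literal is usable without <wchar.h> or <stddef.h>. */
static const wide_char_t *const kerberos_name = L"Kerberos";

/* Count the wide characters before the terminating zero. */
static unsigned long wide_length(const wide_char_t *s) {
    unsigned long n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```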

This PR fixes c2ffi so it guesses the type of the literal correctly in this case.

It also adds the necessary code to include the value of the wide string literal in the JSON or s-expression output. Clang stores wide strings in memory as either UTF-16 or UTF-32 (depending on the size of wchar_t), so we need to convert the wide string value to UTF-8 for output.
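For readers unfamiliar with the conversion, here is a minimal standalone sketch of turning UTF-16 or UTF-32 code units into UTF-8. It is an illustration only, not the code in this PR (which may well rely on existing LLVM helpers), and all function names are hypothetical. It does not validate its input: unpaired surrogates and out-of-range values are encoded as-is.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8; returns the number of bytes written (1-4). */
static size_t utf8_encode(uint32_t cp, char *out) {
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert n UTF-16 code units to NUL-terminated UTF-8.
   out must have room for at least 3 * n + 1 bytes. */
static size_t utf16_to_utf8(const uint16_t *in, size_t n, char *out) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        /* Combine a high surrogate with the low surrogate that follows it. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        }
        len += utf8_encode(cp, out + len);
    }
    out[len] = '\0';
    return len;
}

/* Convert n UTF-32 code units to NUL-terminated UTF-8.
   Each unit is already a code point. out needs at least 4 * n + 1 bytes. */
static size_t utf32_to_utf8(const uint32_t *in, size_t n, char *out) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        len += utf8_encode(in[i], out + len);
    }
    out[len] = '\0';
    return len;
}
```

Which of the two converters applies depends on the wide character size: 2-byte wchar_t (as on Windows) implies UTF-16, 4-byte wchar_t (as on Linux and macOS) implies UTF-32.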

This PR also adds a new option, --wchar-size=N, which controls the size of wchar_t; supported values are 1, 2, or 4 bytes. This is equivalent to passing the -fwchar-type option to Clang with an argument of char, short, or int (respectively).
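The size-to-type correspondence described above can be sketched as a small lookup. The helper name `wchar_type_for_size` is hypothetical and this is not necessarily how the PR implements the option; it only restates the mapping from the description.

```c
#include <stddef.h>

/* Hypothetical helper: map a --wchar-size value (in bytes) to the
   corresponding -fwchar-type argument, or NULL if unsupported. */
static const char *wchar_type_for_size(int size) {
    switch (size) {
    case 1:
        return "char";
    case 2:
        return "short";
    case 4:
        return "int";
    default:
        return NULL;
    }
}
```

Any other value, such as 3 or 8, yields NULL here, which a caller would presumably report as an invalid option value.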