rnpgp / rnp

RNP: high performance C++ OpenPGP library used by Mozilla Thunderbird
https://www.rnpgp.org
Other
194 stars 55 forks source link

C++ symbol leakage in shared object (librnp-0.so.0) #1414

Closed dkg closed 2 years ago

dkg commented 3 years ago

1135 pruned a bunch of symbol visibility from the shared object, which is great. But as I'm packaging for debian, I've generated a list of exposed symbols, which include several C++-mangled symbols I'm pretty sure you do not want to expose:

_Z14rnp_log_switchv
_ZNSt10unique_ptrIN5Botan12HashFunctionESt14default_deleteIS1_EED1Ev
_ZNSt10unique_ptrIN5Botan12HashFunctionESt14default_deleteIS1_EED2Ev
_ZNSt12_Destroy_auxILb0EE9__destroyIP25pgp_transferable_subkey_tEEvT_S4_
_ZNSt12_Destroy_auxILb0EE9__destroyIP25pgp_transferable_userid_tEEvT_S4_
_ZNSt20__uninitialized_copyILb0EE13__uninit_copyIPK25pgp_transferable_userid_tPS2_EET0_T_S7_S6_
_ZNSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE17_M_realloc_insertIJS5_EEEvN9__gnu_cxx17__normal_iteratorIPS5_S7_EEDpOT_
_ZNSt6vectorISt5arrayIhLm20EESaIS1_EE17_M_realloc_insertIJRKS1_EEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_
_ZNSt6vectorIhSaIhEE17_M_default_appendEm
_ZNSt6vectorIhSaIhEE17_M_realloc_insertIJRKhEEEvN9__gnu_cxx17__normal_iteratorIPhS1_EEDpOT_
_ZNSt8_Rb_treeI14pgp_pkt_type_tS0_St9_IdentityIS0_ESt4lessIS0_ESaIS0_EE24_M_get_insert_unique_posERKS0_
_ZNSt8_Rb_treeI14pgp_pkt_type_tS0_St9_IdentityIS0_ESt4lessIS0_ESaIS0_EE29_M_get_insert_hint_unique_posESt23_Rb_tree_const_iteratorIS0_ERKS0_
_ZZNSt8__detail18__to_chars_10_implImEEvPcjT_E8__digits

rnp_log_switch looks like it's just missing some extern "C" { declarations (and it's supposed to be part of the declared RNP_API even though it isn't listed in the published headers). But the others demangle into things that I don't recognize or understand.

It'd be great to avoid exporting these symbols because they can actually differ across architectures (depending on intrinsic variable size and definition), and because you probably don't want people to accidentally rely on them.

Using c++filt to de-mangle the extra symbols, they are:

std::unique_ptr<Botan::HashFunction, std::default_delete<Botan::HashFunction> >::~unique_ptr()
std::unique_ptr<Botan::HashFunction, std::default_delete<Botan::HashFunction> >::~unique_ptr()
void std::_Destroy_aux<false>::__destroy<pgp_transferable_subkey_t*>(pgp_transferable_subkey_t*, pgp_transferable_subkey_t*)
void std::_Destroy_aux<false>::__destroy<pgp_transferable_userid_t*>(pgp_transferable_userid_t*, pgp_transferable_userid_t*)
pgp_transferable_userid_t* std::__uninitialized_copy<false>::__uninit_copy<pgp_transferable_userid_t const*, pgp_transferable_userid_t*>(pgp_transferable_userid_t const*, pgp_transferable_userid_t const*, pgp_transferable_userid_t*)
void std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_realloc_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&)
void std::vector<std::array<unsigned char, 20ul>, std::allocator<std::array<unsigned char, 20ul> > >::_M_realloc_insert<std::array<unsigned char, 20ul> const&>(__gnu_cxx::__normal_iterator<std::array<unsigned char, 20ul>*, std::vector<std::array<unsigned char, 20ul>, std::allocator<std::array<unsigned char, 20ul> > > >, std::array<unsigned char, 20ul> const&)
std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append(unsigned long)
void std::vector<unsigned char, std::allocator<unsigned char> >::_M_realloc_insert<unsigned char const&>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char const&)
std::_Rb_tree<pgp_pkt_type_t, pgp_pkt_type_t, std::_Identity<pgp_pkt_type_t>, std::less<pgp_pkt_type_t>, std::allocator<pgp_pkt_type_t> >::_M_get_insert_unique_pos(pgp_pkt_type_t const&)
std::_Rb_tree<pgp_pkt_type_t, pgp_pkt_type_t, std::_Identity<pgp_pkt_type_t>, std::less<pgp_pkt_type_t>, std::allocator<pgp_pkt_type_t> >::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_iterator<pgp_pkt_type_t>, pgp_pkt_type_t const&)
std::__detail::__to_chars_10_impl<unsigned long>(char*, unsigned int, unsigned long)::__digits
dkg commented 3 years ago

1415 and #1416 are two different ways to address the part of this issue that covers rnp_log_switch -- but that's a very small part of the issue here. I'm not sure how to get rid of the remaining symbols.

dkg commented 3 years ago

in 0.15.0, _Z14rnp_log_switchv has gone away (as has rnp_log_switch itself) but two additional symbols are leaked:

_ZNSt6vectorISt5arrayIhLm20EESaIS1_EE7reserveEm@Base
_ZNSt6vectorISt5arrayIhLm20EESaIS1_EE8_M_eraseEN9__gnu_cxx17__normal_iteratorIPS1_S3_EE@Base
dkg commented 3 years ago

Can anyone upstream confirm (or reject) this leakage, either on debian or on other platforms? I'm sort of talking to myself here, and i'd like to try to figure out what the issue is.

ni4 commented 3 years ago

@dkg Oops, sorry for not following up. I see the same issue on macOS, but absolutely unsure where it comes from. Didn't dig deep into since is focused on other tasks. Maybe @dewyatt have some idea?

dkg commented 3 years ago

@ni4 thanks for checking in! it looks to me like these leakages are happening because some part of the library interface implicitly refers to C++ somehow. for example, from 0.14.0 to 0.15.0, we're now exporting std::vector<std::array<unsigned char, 20ul>, std::allocator<std::array<unsigned char, 20ul> > >::reserve(unsigned long)@Base -- and at the same time, src/lib/pgp-key.h added this line:

size_t              del_sigs(const std::vector<pgp_sig_id_t> &sigs);
ni4 commented 3 years ago

I have a feeling that symbols from librnp-obj are exported somehow (but still I may be completely wrong). At least this post has something about it: https://anadoxin.org/blog/control-over-symbol-exports-in-gcc.html/

ni4 commented 3 years ago

Update: within current codebase (98b9be2d58f8ea8634c8429311bfd89c32e63205), only the following additional symbols are shown via nm -gDC --defined-only librnp.so on centos8:

000000000004bc40 W std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > __gnu_cxx::__to_xstring<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char>(int (*)(char*, unsigned long, char const*, __va_list_tag*), unsigned long, char const*, ...)
0000000000027790 W void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*, unsigned char*, std::forward_iterator_tag)
0000000000027990 W std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append(unsigned long)
0000000000027780 W std::vector<unsigned char, std::allocator<unsigned char> >::~vector()
000000000002ee20 W void std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_realloc_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&)
000000000005e870 W std::vector<std::array<unsigned char, 20ul>, std::allocator<std::array<unsigned char, 20ul> > >::operator=(std::vector<std::array<unsigned char, 20ul>, std::allocator<std::array<unsigned char, 20ul> > > const&)
00000000002b1528 V typeinfo for Botan::Cipher_Mode
00000000002b1518 V typeinfo for Botan::SymmetricAlgorithm
0000000000092565 V typeinfo name for Botan::Cipher_Mode
000000000009257b V typeinfo name for Botan::SymmetricAlgorithm

Switching backend to OpenSSL botan-related ones go away.

dkg commented 2 years ago

This is causing build failures on debian (click on the "Maybe-failed" links to see one of the build logs). This happens because debian tries to track the symbols exported by any package that ships a shared library.

On different architectures with different data types, a mangled C++ function entry point will map to different symbol name (because the mangling takes into account the types of the arguments).

So for example, on arm64 we see:

dpkg-gensymbols: error: some symbols or patterns disappeared in the symbols file: see diff output below
dpkg-gensymbols: warning: debian/librnp0/DEBIAN/symbols doesn't match completely debian/librnp0.symbols
--- debian/librnp0.symbols (librnp0_0.15.2-2_arm64)
+++ dpkg-gensymbolsUS5XKW   2021-10-29 01:54:38.409908459 +0000
@@ -1,7 +1,7 @@
 librnp.so.0 librnp0 #MINVER#
 * Build-Depends-Package: librnp-dev
- _ZNSt10unique_ptrIN5Botan12HashFunctionESt14default_deleteIS1_EED1Ev@Base 0.14.0
- _ZNSt10unique_ptrIN5Botan12HashFunctionESt14default_deleteIS1_EED2Ev@Base 0.14.0
+#MISSING: 0.15.2-2# _ZNSt10unique_ptrIN5Botan12HashFunctionESt14default_deleteIS1_EED1Ev@Base 0.14.0
+#MISSING: 0.15.2-2# _ZNSt10unique_ptrIN5Botan12HashFunctionESt14default_deleteIS1_EED2Ev@Base 0.14.0
  _ZNSt12_Destroy_auxILb0EE9__destroyIP25pgp_transferable_subkey_tEEvT_S4_@Base 0.14.0
  _ZNSt12_Destroy_auxILb0EE9__destroyIP25pgp_transferable_userid_tEEvT_S4_@Base 0.14.0
  _ZNSt20__uninitialized_copyILb0EE13__uninit_copyIPK25pgp_transferable_userid_tPS2_EET0_T_S7_S6_@Base 0.14.0
dh_makeshlibs: error: failing due to earlier errors

I'd really like to get rnp into debian; but i'm struggling with this, and i'm not sure i understand why these symbols are still being exported, or how to avoid their being exported.

dkg commented 2 years ago

On further inspection, it looks like some of these symbols might depend on the version of botan that is being bound to as well, which is itself potentially problematic.

Is it fair to say that rnp has no intention of exporting any C++ symbols? Is there any reason that a user of rnp should need to use C++ when interacting with librnp?

ni4 commented 2 years ago

Thanks for following up with this.

Is it fair to say that rnp has no intention of exporting any C++ symbols? Is there any reason that a user of rnp should need to use C++ when interacting with librnp?

Absolutely, we do not need to export those in any way, only C rnp_* ones. However I wasn't able to clean things up, have no idea how those comes up visible and exported. I will continue to investigate it, but right now is out of any ideas.

@ronaldtse @dewyatt Maybe you have any idea or hints in this?

ni4 commented 2 years ago

Not sure whether it is the only possible solution, but managed things to work on centos8 container via --version-script:

CODEABI_0.15.2 {
    global: *rnp_*;
    local: *;
};

Now output is as following:

[root@ee62582d45c9 /]# nm -DC --defined-only /usr/local/lib64/librnp.so
0000000000000000 A CODEABI_0.15.2
0000000000089f00 T rnp_backend_string
0000000000089f10 T rnp_backend_version
000000000007ac00 T rnp_buffer_clear
000000000007abf0 T rnp_buffer_destroy
0000000000064cd0 T rnp_calculate_iterations
0000000000089970 T rnp_dearmor
0000000000070ab0 T rnp_decrypt
0000000000064840 T rnp_detect_homedir_info
0000000000064c40 T rnp_detect_key_format
0000000000064620 T rnp_disable_debug
00000000000885c0 T rnp_dump_packets_to_json
0000000000088790 T rnp_dump_packets_to_output
0000000000064610 T rnp_enable_debug
0000000000089690 T rnp_enarmor
...

@dkg Is this output is good enough? (still, needs to find out how to make this work on macOS, and check CODEABI things).

dkg commented 2 years ago

Not sure whether it is the only possible solution, but managed things to work on centos8 container via --version-script:

CODEABI_0.15.2 {
  global: *rnp_*;
  local: *;
};

This is an interesting proposal. maybe the global line should be rnp_* - do you need the leading * ? I don't think you want to export symbols that just happen to have rnp_ anywhere in the name.

I'm also not confident in the choice of version tag: CODEABI_0.15.2 seems not right, but i dont really understand the version tags well. I think section 3 of Drepper's "How To Write Shared Libraries" is a useful reference, but i haven't wrapped my head around all of it. First, you should use RNP_ or LIBRNP_ instead of CODEABI_.

But I think if you introduce a version tag like this, its introduction might actually break ABI compatibility (because for example the symbol name goes from rnp_dearmor to rnp_dearmor@@RNP_0.15.2, iiuc)

Maybe the version-script could start with no version tag at all, and then as you add symbols to the library you can introduce the new symbols with a version tag. Then, at the next planned ABI breakage (aligned with an SONAME bump) you could consolidate all the new symbols into a new version tag.

The simplest version script should be one that doesn't include a version tag at all (leaving the version tag as the "Base" value):

{
 global rnp_*;
 local *;
}

I've added this to the debian build and it does seem to filter the exposed symbols correctly, without changing the version tag.

If y'all do something comparable upstream, i'd be happy to drop the debian-specific version script.

dkg commented 2 years ago

ah, looking at #1667 it looks like you've arrived at the same conclusion for the version-script, that's great! (i can't vouch for the macOS variant though)

ni4 commented 2 years ago

@dkg didn't manage to read the whole Drepper's article yet, but even first pages gives a lot of new information. Let's drop version tag for now as FFI API is not planned to be changed any time soon (except new functions/other functions deprecation).

If y'all do something comparable upstream, i'd be happy to drop the debian-specific version script.

Most likely we'll merge and include changes to the v0.16.0, which still require some other issues to be fixed before releasing (see https://github.com/rnpgp/rnp/milestone/7 )