Open Chris-166 opened 1 year ago
@Chris-166 Could you provide a reproducible colcon project that triggers this problem? Setting up an Android system is hard for us, so that would be really appreciated.
@fujitatomoya We are currently not reproducing this issue on the colcon project.
We want to change the dynamic_cast in the following code to static_cast, because by adding logs we found that in both the normal and the abnormal cases the "desc" object has the type "N8eprosima7fastdds3dds5TopicE".
topic_holder->topic = dynamic_cast<eprosima::fastdds::dds::Topic *>(desc); // **dynamic_cast fail**
after modification:
topic_holder->topic = static_cast<eprosima::fastdds::dds::Topic *>(desc);
Are there any risks with this modification? Please help to evaluate. Thanks.
@fujitatomoya For the "dynamic_cast fail", do you have any debugging directions? Could it be caused by differences in the C++ standard library used by the NDK?
The assert is only active in debug builds. I guess the rmw_fastrtps library that you used (/system/lib64/librmw_fastrtps_cpp.so) was built in release mode.
If the dynamic_cast returns nullptr in release mode, the assert is ignored, so cast_or_create_topic returns true with topic_holder->topic set to nullptr. Execution then continues into Publisher::create_datawriter, which leads to the null pointer dereference.
I think there are two ways to fix it.
Update cast_or_create_topic in rmw_fastrtps to use an if condition instead of assert.
NOTE: Without enough information, it's hard to know why the dynamic_cast failed.
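The "if condition instead of assert" idea could look roughly like the sketch below. The classes are simplified stand-ins rather than the real Fast-DDS types, and cast_topic_checked is a hypothetical helper name; the real cast_or_create_topic takes more parameters and does more work.

```cpp
#include <cassert>

// Simplified stand-ins for the Fast-DDS classes (assumptions for illustration only).
struct TopicDescription { virtual ~TopicDescription() = default; };
struct Topic : TopicDescription {};
struct ContentFilteredTopic : TopicDescription {};

struct TopicHolder { Topic * topic = nullptr; };

// Hypothetical helper: report a failed cast to the caller instead of relying
// on assert(), which is compiled out when NDEBUG is defined (release builds).
bool cast_topic_checked(TopicDescription * desc, TopicHolder * topic_holder)
{
  topic_holder->topic = dynamic_cast<Topic *>(desc);
  if (nullptr == topic_holder->topic) {
    return false;  // the caller must not go on to create a data writer
  }
  return true;
}

void example_usage()
{
  Topic topic;
  ContentFilteredTopic filtered;
  TopicHolder holder;
  assert(cast_topic_checked(&topic, &holder));      // real Topic: cast succeeds
  assert(!cast_topic_checked(&filtered, &holder));  // other subclass: reported as failure
}
```

This only turns the crash into an error return; it does not explain why the cast fails on Android.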
@iuhilnehc-ynos Null pointer protection can only prevent the program from crashing; the program's functionality will still be affected. I want to change "dynamic_cast" to "static_cast". The reason is as follows:
"NOTE: Without enough information, it's hard to know why the dynamic_cast failed." -> Yes, I think so. And what information can I provide if further analysis is required? Thanks.
I want to change "dynamic_cast" to "static_cast"
Currently, it's OK, but it seems dangerous, because a static_cast from TopicDescription* to Topic* can't guarantee that the pointer really refers to a Topic.
For example, a method overridden from DomainEntity, or a new method belonging to Topic, might be called inside PublisherImpl::create_datawriter in the future. If the type of desc is then not Topic but a new class derived from TopicDescription, it could cause a crash again.
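To illustrate the difference between the two casts, here is a minimal sketch with stand-in classes. OtherDescription is a hypothetical future subclass, and checked_downcast / unchecked_downcast are names made up for this example; none of this is taken from Fast-DDS.

```cpp
#include <cassert>

// Stand-in classes (assumptions for illustration only).
struct TopicDescription { virtual ~TopicDescription() = default; };
struct Topic : TopicDescription { int writers = 0; };
struct OtherDescription : TopicDescription {};  // hypothetical future subclass

// dynamic_cast checks the runtime type and yields nullptr on a mismatch.
Topic * checked_downcast(TopicDescription * desc)
{
  return dynamic_cast<Topic *>(desc);
}

// static_cast performs no runtime check. It is only valid when the caller
// already knows that desc points at a Topic.
Topic * unchecked_downcast(TopicDescription * desc)
{
  return static_cast<Topic *>(desc);
}

void example_usage()
{
  Topic topic;
  OtherDescription other;
  assert(checked_downcast(&topic) == &topic);
  assert(unchecked_downcast(&topic) == &topic);
  assert(checked_downcast(&other) == nullptr);
  // unchecked_downcast(&other) would also compile, but it has undefined
  // behavior, so it is deliberately not executed here.
}
```

So static_cast is safe only as long as every TopicDescription passed in really is a Topic.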
we found by adding logs: in normal and abnormal cases, the "desc" object is ""N8eprosima7fastdds3dds5TopicE"" type;
It seems they're the correct type.
what information can I provide if further analysis is required
I am not sure whether you built some libraries with -fno-rtti.
we found by adding logs: in normal and abnormal cases, the "desc" object is ""N8eprosima7fastdds3dds5TopicE"" type;
Oh, I see you used typeid to print the type, which means RTTI is not disabled.
Sorry, I don't know why the dynamic_cast failed in such a way.
topic_holder->topic = dynamic_cast<eprosima::fastdds::dds::Topic *>(desc);
This should be no problem; it can downcast to eprosima::fastdds::dds::Topic.
we found by adding logs: in normal and abnormal cases, the "desc" object is ""N8eprosima7fastdds3dds5TopicE"" type;
Doesn't this only mean that desc is still accessible at this moment? Could the desc object become null afterwards?
Exception Cases: (Add Logs)
LOGE("utils.cpp#cast_or_create_topic, ready to dynamic_cast , desc type is %s", typeid(*desc).name());
topic_holder->topic = dynamic_cast<eprosima::fastdds::dds::Topic *>(desc);
LOGE("utils.cpp#cast_or_create_topic, after dynamic_cast , desc type is %s", typeid(*desc).name());
assert(nullptr != topic_holder->topic);
2023-06-01 19:01:15.222 7052-7052 E/rmw_fastrtps_shared_cpp: utils.cpp#cast_or_create_topic, desc name = rt/chatter, type = std_msgs::msg::dds_::String_
2023-06-01 19:01:15.222 7052-7052 E/rmw_fastrtps_shared_cpp: utils.cpp#cast_or_create_topic, ready to dynamic_cast , desc type is N8eprosima7fastdds3dds5TopicE
2023-06-01 19:01:15.222 7052-7052 E/rmw_fastrtps_shared_cpp: utils.cpp#cast_or_create_topic, after dynamic_cast , desc type is N8eprosima7fastdds3dds5TopicE
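Since the printed name matches while the cast still fails, a further diagnostic could compare the type_info objects themselves and not only their names. One possible explanation, not verified for this build, is that type_info equality is decided by object identity rather than by name, so two libraries carrying separate copies of the RTTI for the same class would disagree. The sketch below uses stand-in classes; TypeInfoReport and compare_with_topic are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Stand-in classes (assumptions for illustration only).
struct TopicDescription { virtual ~TopicDescription() = default; };
struct Topic : TopicDescription {};

struct TypeInfoReport
{
  bool same_name;    // mangled names are identical strings
  bool same_object;  // both refer to the very same type_info object
  bool equal;        // result of std::type_info::operator==
};

// Compare the dynamic type of desc with Topic as seen from the calling library.
TypeInfoReport compare_with_topic(const TopicDescription & desc)
{
  const std::type_info & dynamic_type = typeid(desc);
  const std::type_info & expected = typeid(Topic);
  return TypeInfoReport{
    std::string(dynamic_type.name()) == expected.name(),
    &dynamic_type == &expected,
    dynamic_type == expected};
}

void example_usage()
{
  Topic topic;
  const TypeInfoReport report = compare_with_topic(topic);
  assert(report.same_name);
  assert(report.equal);
}
```

Logging all three fields right before the dynamic_cast in the failing build would show whether the names match while `equal` is false.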
I'd like to share two links with you.
https://developer.android.com/ndk/guides/common-problems#rttiexceptions_not_working_across_library_boundaries
https://github.com/android/ndk/issues/533#issuecomment-335977747
Maybe you need to define TopicDescription::~TopicDescription in a new file, src/cpp/fastdds/topic/TopicDescription.cpp, and update src/cpp/CMakeLists.txt accordingly.
I am not sure, I didn't test it.
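The NDK page linked above recommends giving such a class a "key function": a non-pure, non-inline virtual function that is defined out of line, so that the vtable and RTTI are emitted in one library only. A rough single-file sketch follows. The class bodies are stand-ins, kind() is a hypothetical member, and the real Fast-DDS header layout is an assumption here.

```cpp
#include <cassert>
#include <string>

// --- this part would live in the header ---
class TopicDescription
{
public:
  virtual ~TopicDescription();            // declared only, no inline body
  virtual const char * kind() const = 0;  // hypothetical member
};

class Topic : public TopicDescription
{
public:
  const char * kind() const override;     // defined out of line as well
};

// --- this part would live in exactly one .cpp file of the shared library ---
TopicDescription::~TopicDescription() = default;  // out-of-line definition (key function)
const char * Topic::kind() const { return "Topic"; }

void example_usage()
{
  Topic topic;
  TopicDescription * desc = &topic;
  assert(dynamic_cast<Topic *>(desc) == &topic);
  assert(std::string(desc->kind()) == "Topic");
}
```

Within a single binary the cast works either way; the out-of-line definitions only matter once the classes are used across shared library boundaries.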
Thank you for your reply!
But I did not understand the description of this exception. Could you please provide me with the patch first to directly verify your doubts?
Note: The complete compilation script is as follows:
export PYTHON3_EXEC="$( which python3 )"
export PYTHON3_LIBRARY="$( ${PYTHON3_EXEC} -c 'import os.path; from distutils import sysconfig; print(os.path.realpath(os.path.join(sysconfig.get_config_var("LIBPL"), sysconfig.get_config_var("LDLIBRARY"))))' )"
export PYTHON3_INCLUDE_DIR="$( ${PYTHON3_EXEC} -c 'from distutils import sysconfig; print(sysconfig.get_config_var("INCLUDEPY"))' )"
export ANDROID_ABI=arm64-v8a
export ANDROID_TARGET=29
export ANDROID_NATIVE_API_LEVEL=android-29
export ANDROID_TOOLCHAIN_NAME=aarch64-linux-android-clang
colcon build \
--packages-ignore cyclonedds rcl_logging_log4cxx rcl_logging_spdlog rosidl_generator_py rclandroid ros2_talker_android ros2_listener_android \
--cmake-args \
-DENABLE_LTTNG=OFF \
-DTRACETOOLS_DISABLED=ON \
-DCMAKE_VERBOSE_MAKEFILE=ON \
-DPYTHON_EXECUTABLE=${PYTHON3_EXEC} \
-DPYTHON_LIBRARY=${PYTHON3_LIBRARY} \
-DPYTHON_INCLUDE_DIR=${PYTHON3_INCLUDE_DIR} \
-DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
-DANDROID=ON \
-DANDROID_FUNCTION_LEVEL_LINKING=OFF \
-DANDROID_NATIVE_API_LEVEL=${ANDROID_TARGET} \
-DANDROID_TOOLCHAIN_NAME=${ANDROID_TOOLCHAIN_NAME} \
-DANDROID_STL=c++_shared \
-DANDROID_ABI=${ANDROID_ABI} \
-DANDROID_NDK=${ANDROID_NDK} \
-DTHIRDPARTY=ON \
-DCOMPILE_EXAMPLES=OFF \
-DCMAKE_FIND_ROOT_PATH="${PWD}/install" \
-DBUILD_TESTING=OFF \
-DRCL_LOGGING_IMPLEMENTATION=rcl_logging_noop \
-DTHIRDPARTY_android-ifaddrs=FORCE
But I did not understand the description of this exception. Could you please provide me with the patch first to directly verify your doubts?
I guess you used the 2.8.x branch of Fast-DDS, which is mentioned in https://github.com/ros2/rmw_fastrtps/issues/696#issue-1754774213 (https://github.com/eProsima/Fast-DDS/blob/2.8.x/src/cpp/fastdds/publisher/PublisherImpl.cpp). The patch is based on the latest commit of branch 2.8.x. Please help to check whether it fixes the "dynamic is failing" issue or not.
Note:The complete compilation script is as follows:
Thank you. It's out of my scope, so I am not going to build it on my local machine.
Using this patch, the "dynamic is failing" issue can still be reproduced.
@Chris-166 We do not use Android on our platforms, and it is not officially supported by ROS 2 (https://docs.ros.org/en/rolling/Releases/Release-Rolling-Ridley.html).
You can keep this issue open I guess, but we are not going to be able to help you out on this soon.
@Chris-166 Friendly ping. Otherwise I would like to close this issue, since Android is not a supported platform.
Bug report
11149 11149 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x70 in tid 11149
06-01 19:19:03.665 11194 11194 F DEBUG :
06-01 19:19:03.665 11194 11194 F DEBUG : Revision: '0'
06-01 19:19:03.665 11194 11194 F DEBUG : ABI: 'arm64'
06-01 19:19:03.665 11194 11194 F DEBUG : Timestamp: 2023-06-01 19:19:03+0800
06-01 19:19:03.665 11194 11194 F DEBUG : uid: 10148
06-01 19:19:03.665 11194 11194 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x70
06-01 19:19:03.665 11194 11194 F DEBUG : Cause: null pointer dereference
06-01 19:19:03.665 11194 11194 F DEBUG : x0 000000705a8a42e8 x1 0000000000000068 x2 0000007fd0efc0c0 x3 00000070fa842f48
06-01 19:19:03.665 11194 11194 F DEBUG : x4 0000007fd0efc0a0 x5 0000007fd0efc0a9 x6 726574746168632f x7 726574746168632f
06-01 19:19:03.665 11194 11194 F DEBUG : x8 0000000000000000 x9 0000000000000000 x10 0000000000000001 x11 0000000000000000
06-01 19:19:03.665 11194 11194 F DEBUG : x12 7fffffffffffffff x13 fffffffffc000000 x14 0000000000000060 x15 0000000000400000
06-01 19:19:03.665 11194 11194 F DEBUG : x16 00000070666a6c08 x17 000000706638afc4 x18 00000070fb1fc000 x19 000000705a8a4300
06-01 19:19:03.665 11194 11194 F DEBUG : x20 0000007fd0efbe58 x21 000000705a8a3c00 x22 0000000000000068 x23 00000070fa842f48
06-01 19:19:03.665 11194 11194 F DEBUG : x24 0000007fd0efc0c0 x25 00000070faa9b020 x26 0000000000000001 x27 000000706af5f2c8
06-01 19:19:03.665 11194 11194 F DEBUG : x28 00000070fa780fe0 x29 0000007fd0efbdb0
---> [initial analysis]
https://github.com/eProsima/Fast-DDS/blob/2.8.x/src/cpp/fastdds/publisher/PublisherImpl.cpp
DataWriter* PublisherImpl::create_datawriter(
        Topic* topic,
        const DataWriterQos& qos,
        DataWriterListener* listener,
        const StatusMask& mask)
{
    logInfo(PUBLISHER, "CREATING WRITER IN TOPIC: " << topic->get_name()); // topic is null
    //Look for the correct type registration
    TypeSupport type_support = participant_->find_type(topic->get_type_name());
    ...
}
bool
cast_or_create_topic(
  eprosima::fastdds::dds::DomainParticipant * participant,
  eprosima::fastdds::dds::TopicDescription * desc,
  const std::string & topic_name,
  const std::string & type_name,
  const eprosima::fastdds::dds::TopicQos & topic_qos,
  bool is_writer_topic,
  TopicHolder * topic_holder)
{
  ...
  ...
}