microsoft / azure_spatial_anchors_ros

ROS wrapper for the Azure Spatial Anchors Linux SDK, allowing robots (and other devices with vision-based sensors) to co-localize with other robots, AR-enabled phones, and Hololens devices.
https://azure.microsoft.com/en-us/services/spatial-anchors/
MIT License
86 stars 20 forks

Crash of "find_anchors" service when querying anchor IDs a few days old #26

Closed PDN-AUTDE closed 2 years ago

PDN-AUTDE commented 2 years ago

I encountered an issue yesterday when using the find_anchors service to query anchors that were placed a few days earlier. Using the HoloLens it worked fine; the problem occurred only when using the ROS SDK.

Using the ROS SDK, I queried the anchor IDs from the cloud using the TableService from the azure.cosmosdb.table package, which works fine. I then created a comma-separated string of all the IDs and sent a ROS service request for find_anchors to the running instance of asa_ros_node, which then crashed, leaving the following log:

[INFO] [1645543136.273818]: Searching for ID 03c156dd-89ad-45d7-85d6-65ad556fd2e2
I0222 10:18:56.506862 11279 asa_interface.cpp:404] Starting to look for anchor ID: 03c156dd-89ad-45d7-85d6-65ad556fd2e2
*** Aborted at 1645543136 (unix time) try "date -d @1645543136" if you are using GNU date ***
PC: @ 0x7f331db3ac38 Microsoft::Azure::SpatialAnchors::CloudSpatialAnchor::LocalAnchor()
*** SIGSEGV (@0x10) received by PID 11279 (TID 0x7f32ca7f4700) from PID 16; stack trace: ***
@ 0x7f331dd78980 (unknown)
@ 0x7f331db3ac38 Microsoft::Azure::SpatialAnchors::CloudSpatialAnchor::LocalAnchor()
@ 0x7f331db0864f _ZZN7asa_ros28AzureSpatialAnchorsInterface24queryAnchorsWithCallbackERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EERKSt8functionIFvRKS7_RKN5Eigen9TransformIdLi3ELi2ELi0EEEEEENKUlPvRKSt10shared_ptrIN9Microsoft5Azure14SpatialAnchors22AnchorLocatedEventArgsEEE_clESO_SW_.isra.1173
@ 0x7f331db40e01 _ZZN9Microsoft5Azure14Spatia

[front_asa_ros-1] process has died [pid 11279, exit code -11, cmd /home/administrator/STIHL_ASA_SPOT_AUTDE/50_Software-EdgeDomain/00_Spot_Core/01_catkin_ws/devel/lib/asa_ros/asa_ros_node image:=/camera/fisheye1_rect/image camera_info:=/camera/fisheye1_rect/camera_info __name:=front_asa_ros __log:=/home/administrator/.ros/log/a3916d12-93ec-11ec-9cec-a8a1595b2d28/front_asa_ros-1.log]. log file: /home/administrator/.ros/log/a3916d12-93ec-11ec-9cec-a8a1595b2d28/front_asa_ros-1*.log
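
The ID-collection step described above (gather anchor IDs, join them into one comma-separated string) can be sketched in Python as follows. This is only a minimal illustration: `build_anchor_id_query` is a hypothetical helper, not part of asa_ros or the Azure SDK, and it assumes anchor IDs are UUID-shaped like the one in the log. It does not address the crash itself; it merely helps rule out malformed or duplicated IDs in the request string. Sending the string to find_anchors is left out, since the service's message type is not shown in this thread.

```python
import uuid


def build_anchor_id_query(anchor_ids):
    """Join anchor IDs into a single comma-separated string.

    Hypothetical helper: strips whitespace, skips empty entries,
    drops duplicates (keeping first-seen order), and raises
    ValueError if an entry cannot be parsed as a UUID.
    """
    cleaned = []
    for raw in anchor_ids:
        candidate = raw.strip()
        if not candidate:
            continue
        # Assumption: anchor IDs are UUIDs. uuid.UUID raises ValueError
        # when the string cannot be parsed as one.
        uuid.UUID(candidate)
        if candidate not in cleaned:
            cleaned.append(candidate)
    return ",".join(cleaned)


if __name__ == "__main__":
    # The ID below is the one that appears in the crash log.
    ids = ["03c156dd-89ad-45d7-85d6-65ad556fd2e2", ""]
    print(build_anchor_id_query(ids))
```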

The nodes are all running on a SPOT Core, and the crashes occurred with and without any other nodes running (e.g. Spot Driver, Realsense Driver). The crash happened immediately, every time the find_anchors service was called. However, those errors did not occur when querying an ID of a spatial anchor which was recently (same day

jeffdelmerico commented 2 years ago

A few follow-up questions:

PDN-AUTDE commented 2 years ago
jeffdelmerico commented 2 years ago

Thanks for the information. I don't think I've seen this before.

Can you provide more detailed logs than the one that you sent over email? It might help for us to see how you are launching the node, and what output preceded the crash.

You state that this occurs when calling the find_anchors service... does it also happen if you launch the node while providing the anchor ID on startup (by setting the anchor_id parameter in the launch file)?

PDN-AUTDE commented 2 years ago

Thanks a lot for your quick responses! Unfortunately, we don't have any more logs, as we deleted the troublesome anchors and replaced them with new ones. However, I will try to run some more tests during the next couple of days and provide you with more info once the error occurs again.

We did not try to provide the anchor_id in the launch file. I will test that as well though, once the error occurs again.

jeffdelmerico commented 2 years ago

OK sounds good. Hopefully this was an isolated incident, but if it happens again, please share whatever information you can gather and we'll be happy to try to debug it. I will leave this issue open in the meantime.

PDN-AUTDE commented 2 years ago

Will do, thanks for your help so far.

RobertBlakeAnderson commented 2 years ago

Our understanding is that the Azure cloud will (under conditions that I don't think are visible to us end users) consider older anchors to be expired and thus delete them. Perhaps this is the culprit?

EricVoll commented 2 years ago

To our knowledge, ASA does not delete anchors. Many use cases rely on anchors for much longer durations, so we'd be surprised if that happens. I personally used the same anchor for multiple weeks in a row, also using this ROS wrapper. I also couldn't find any hint of such deletion behavior in the documentation. Did you read that somewhere on an official Microsoft page? If so, I'd be super thankful if you could point me to it so that I can dig deeper.

PDN-AUTDE commented 2 years ago

The deletion did not cause the issues, as we could still find the anchors in the database. Apart from that we could also locate them using the Hololens.

roalchaq commented 2 years ago

Maybe you could check your CreateAzureAnchor method (or equivalent). You might find a line such as this one:

localCloudAnchor.Expiration = DateTimeOffset.Now.AddDays(7);

This line sets an anchor to expire automatically after the indicated time.
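
To illustrate how a fixed lifetime like the one above could produce the pattern reported in this thread (same-day anchors found, older ones failing), here is a minimal Python sketch of the arithmetic. The helper names `expiration_time` and `is_expired` are hypothetical and not part of any ASA API, and the anchor creation times are made up for illustration; only the crash timestamp is taken from the log.

```python
from datetime import datetime, timedelta, timezone


def expiration_time(created_at, lifetime_days=7):
    """Return the moment an anchor would expire, mirroring AddDays(7)."""
    return created_at + timedelta(days=lifetime_days)


def is_expired(created_at, now, lifetime_days=7):
    """True if `now` is at or past the anchor's expiration time."""
    return now >= expiration_time(created_at, lifetime_days)


if __name__ == "__main__":
    # Unix time of the crash, taken from the log in this issue.
    query_time = datetime.fromtimestamp(1645543136, tz=timezone.utc)

    # Hypothetical creation times, for illustration only.
    same_day_anchor = query_time - timedelta(hours=3)
    older_anchor = query_time - timedelta(days=8)

    print("same-day anchor expired:", is_expired(same_day_anchor, query_time))
    print("8-day-old anchor expired:", is_expired(older_anchor, query_time))
```

Comparing an anchor's creation time against the query time in this way is a quick check of whether expiration is even a plausible explanation before digging further.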

PDN-AUTDE commented 2 years ago

That is correct, I found that line. Thanks for the hint!

jeffdelmerico commented 2 years ago

@roalchaq Thanks for catching this.

Setting the expiration for CloudAnchors is available in the API, but is not exposed in the ROS wrapper because as far as I can tell, the default is for the anchors to never expire. Indeed, this is the first instance that we're aware of where any anchors were "deleted", although the fact that they are still in the database and locatable from the HoloLens seems to indicate some other, as yet unexplained behavior. Despite my skepticism, please let us know if this resolves your problem.

Also, if anchor expiration is a desired feature, please feel free to submit a PR.