ros-acceleration / adaptive_component

A composable container for Adaptive ROS 2 Node computations. Select between FPGA, CPU or GPU at run-time.
Apache License 2.0
11 stars 5 forks source link

Question: Sharing of Parameters #2

Closed methylDragon closed 2 years ago

methylDragon commented 2 years ago

If I understood correctly, this adaptive component is a way to expose an interface to swap between node components dynamically.

What is the intended interaction with the parameter system for the implemented nodes for the different architectures? Are parameters meant to be bound to the adaptive component (essentially the node manager), or is the intention to have the parameters bound to the architecture specific nodes instead?

vmayoral commented 2 years ago

If I understood correctly, this adaptive component is a way to expose an interface to swap between node components dynamically.

Correct.

What is the intended interaction with the parameter system for the implemented nodes for the different architectures? Are parameters meant to be bound to the adaptive component (essentially the node manager), or is the intention to have the parameters bound to the architecture specific nodes instead?

The former would make it Component-agnostic, which is what's seeked in here originally. Otherwise, every single new Component would need to implement the parameter logic, which IMHO defeats the purpose of this abstraction. But I'm open for discussion. Would the latter bring more value in your view?

adaptive_component provides by default a composable stateless[^1] container for adaptive ROS 2 Node computations. Essentially, it allows to hot-swap between Components targeting different compute architectures (e.g. FPGA, CPU or GPU) at run-time and without having to stop Nodes. In more elaborated use cases, adaptive_component should allow you to build stateful adaptive ROS 2 Node computations[^2].

[^1]: Example of stateless adaptive ROS 2 Node https://github.com/ros-acceleration/acceleration_examples/blob/main/nodes/doublevadd_publisher/src/doublevadd_component_adaptive.cpp [^2]: Example of stateful adaptive ROS 2 Node https://github.com/ros-acceleration/acceleration_examples/blob/main/nodes/doublevadd_publisher/src/doublevadd_component_adaptive_stateful.cpp

methylDragon commented 2 years ago

I think the former makes sense also, but there will need to be an implementation to allow for the sharing of parameters.

In ROS 2, parameters are associated with each node, which means that some extra code interfaces must be implemented in the adaptive_component and/or child components to allow child nodes to access the parameters in the adaptive component.

One way to do this is via a pattern like the parameter_blackboard (src), or by having the child nodes call the services on the adaptive component at init or param change, or potentially using one of the parameter synchronization clients (here)

No matter how you do it, I think this functionality is quite core to the usage of this package.

vmayoral commented 2 years ago

I think the former makes sense also, but there will need to be an implementation to allow for the sharing of parameters.

In ROS 2, parameters are associated with each node, which means that some extra code interfaces must be implemented in the adaptive_component and/or child components to allow child nodes to access the parameters in the adaptive component.

One way to do this is via a pattern like the parameter_blackboard (src), or by having the child nodes call the services on the adaptive component at init or param change.

No matter how you do it, I think this functionality is quite core to the usage of this package.

Not really (or I just don't follow you). On what basis do you believe that further parameter exchange needs to happen between the Components within an adaptive_component container? Do you have any toy example that demonstrates this limitation? Have you tested things against the toy example that I provided in here? If you have difficulties testing the FPGA piece, simply replace that Component with another CPU one (that slightly changes the behavior for visual inspection) that allows you to debug this out in your workstation.

The current approach worked just fine in my testing. Note that as highlighted above, adaptive_component implements a composable stateless container for adaptive ROS computations. It does so by leveraging a single threaded executor (yes, within the Component), which is then used to hold one of the sub-components at a time. Accordingly, the only parameters you're modifying are the ones from the adaptive_component Component itself, which is the one that takes care of abstracting away the complexity from all sub-components.

Note that the adaptive_component can still be composed with other Components into a Node, as usual, with any of the executors available in ROS 2.

methylDragon commented 2 years ago

Issue Description

Suppose we made an adaptive_component with the following two NodeCPU nodes:

// doublevadd_publisher_adaptive.cpp
...
  auto adaptive_node = std::make_shared<composition::AdaptiveComponent>(
        "doublevadd_publisher_adaptive",        // name of the adaptive Node
        options,                                // Node options
                                                // CPU
        std::make_shared<NodeCPU>("_doublevadd_publisher_adaptive_cpu1", options),
                                                // FPGA
        std::make_shared<NodeCPU>("_doublevadd_publisher_adaptive_cpu2", options),
                                                // GPU
        nullptr);
...

And on init of each component we initialized a random ID and stored it in a parameter

void
DoubleVaddComponent::initialize()
{
  // Create a publisher of "std_mgs/String" messages on the "chatter" topic.
  pub_ = create_publisher<std_msgs::msg::String>("vector", 10);

  this->declare_parameter("example_param");
  this->set_parameter(rclcpp::Parameter("example_param", rand()));

  RCLCPP_INFO(this->get_logger(), "NODE COMPONENT INIT: ID: %ld", this->get_parameter("example_param").as_int());

  // define differential time value (in ms) for the loop
  dt_ = 100ms;  // 10 Hz

  // Use a timer to schedule periodic message publishing.
  timer_ = create_wall_timer(dt_, std::bind(&DoubleVaddComponent::on_timer, this));
}

Then, on the component's timer, if we print the random ID... (output below)

    RCLCPP_INFO(this->get_logger(), "Publishing: '%s' | Param: %d", message.data.c_str(), this->get_parameter("example_param").as_int());

The Problem and Example Output

You'll notice that any set parameters are local to the components. In the output below, I'm swapping between the adaptive nodes.

Reminder: I am only setting the parameter on node component init! They're not set again anywhere else, and the setting always occurs with the printout. Notice how, when we swap components, the init printout doesn't print again, which means that no parameters were set, yet the param value still changes!

(Pay attention to the Param value (which prints the random ID) at the end of each print line.)

```shell [INFO] [1649791963.178611693] [_doublevadd_publisher_adaptive_cpu2]: NODE COMPONENT INIT: ID: 1804289383 [INFO] [1649791963.180852465] [_doublevadd_publisher_adaptive_cpu1]: NODE COMPONENT INIT: ID: 846930886 [INFO] [1649791963.335921033] [_doublevadd_publisher_adaptive_cpu1]: Publishing: 'vadd finished, iteration: 0' | Param: 846930886 [INFO] [1649791963.435811399] [_doublevadd_publisher_adaptive_cpu1]: Publishing: 'vadd finished, iteration: 1' | Param: 846930886 ... ... [INFO] [1649791966.135860027] [_doublevadd_publisher_adaptive_cpu1]: Publishing: 'vadd finished, iteration: 28' | Param: 846930886 [INFO] [1649791966.234400990] [_doublevadd_publisher_adaptive_cpu1]: Publishing: 'vadd finished, iteration: 29' | Param: 846930886 [INFO] [1649791966.281801264] [_doublevadd_publisher_adaptive_cpu2]: Publishing: 'vadd finished, iteration: 0' | Param: 1804289383 [INFO] [1649791966.329138172] [_doublevadd_publisher_adaptive_cpu2]: Publishing: 'vadd finished, iteration: 1' | Param: 1804289383 ... ... [INFO] [1649791969.126573667] [_doublevadd_publisher_adaptive_cpu2]: Publishing: 'vadd finished, iteration: 29' | Param: 1804289383 [INFO] [1649791969.233777152] [_doublevadd_publisher_adaptive_cpu2]: Publishing: 'vadd finished, iteration: 30' | Param: 1804289383 [INFO] [1649791969.282246829] [_doublevadd_publisher_adaptive_cpu1]: Publishing: 'vadd finished, iteration: 30' | Param: 846930886 [INFO] [1649791969.330831294] [_doublevadd_publisher_adaptive_cpu1]: Publishing: 'vadd finished, iteration: 31' | Param: 846930886 ... ... [INFO] [1649791970.136571962] [_doublevadd_publisher_adaptive_cpu1]: Publishing: 'vadd finished, iteration: 39' | Param: 846930886 [INFO] [1649791970.236887458] [_doublevadd_publisher_adaptive_cpu1]: Publishing: 'vadd finished, iteration: 40' | Param: 846930886 ```

Furthermore, those internal components' parameters are not visible in the CLI Edit: This is just because the names are prepended with _, which I think was intentional and not an issue, mb

$ ros2 param list
/doublevadd_publisher_adaptive:
  adaptive
  qos_overrides./parameter_events.publisher.depth
  qos_overrides./parameter_events.publisher.durability
  qos_overrides./parameter_events.publisher.history
  qos_overrides./parameter_events.publisher.reliability
  use_sim_time

Why This Happens

I think the main thing to be clear on is that the AdaptiveComponent and its child components are each nodes in their own right with their own parameter namespaces. While it's true that the child nodes are "subsumed" under the AdaptiveComponent, they do not share their parameter namespaces.

All that AdaptiveComponent is doing is just storing the node pointers and changing which one is used with the executor, but there's no synchronization or mechanism like that that is working with parameters.

-- Aside: Now, what I am not sure about is why the child node components don't show up in ros2 node list or ros2 param list. They by right should be appearing if they're on the executor. And I've confirmed that they can subscribe...

This is a separate issue that might need to be tracked down. If you happen to have insight on this let me know!

Edit: It's just because the node names are prepended with _, so they're hidden. (No issues here, I just didn't notice, oops.)

Why is this potentially an issue?

Now, we have two problems:

  1. Local changes don't propagate up to their adaptive_component container
  2. Even if we could propagate the changes up, there's no implementation to propagate it down again to the component's siblings

Consider the following use case, company X is integrating their navigation stack, and wants to hotswap between FPGA and CPU computation. They explicitly want their navigation parameters to be updatable, and expect the parameters they set to "stick" regardless of what underlying hardware they're running ROS on.

Under the current implementation:

  1. User starts adaptive_component with components A_CPU, B_FPGA with parameters Z with param value z_init (same for A_FPGA, B_CPU on init)
  2. User starts adaptive_component with adaptive 0, running A_CPU
  3. User sets a new parameter value for Z->z_a
  4. User swaps to adaptive: 1, running B_FPGA, expecting the changes made when they were running A_CPU to also reflect in B_FPGA
  5. **!!! ISSUE !!!***: B_FPGA's Z parameter does NOT store z_a!! It is still stuck on z_init!

Proposed Solutions

There are some approaches to take but all require that some synchronization implementation be folded into the use scenario of this package.

  1. Defer it to the user, meaning that all of the node components that the user implements (which should rightfully be treated as nodes that happen to be able to do intraprocess communication) must implement their OWN synchronization mechanism
    • If you want this to be the case, I think it would be a strong recommendation (almost bordering on requirement) to document this appropriately
  2. Implement the synchronization on the AdaptiveComponent
    • One way to do this would be to overload the this->declare_parameter and associated methods to intercept the call and defer it to the overarching AdaptiveComponent. (This would probably require one more class interface to do the interception.)
      • This would mean that the AdaptiveComponent would truly be the one handling all the parameters across all child components. And it should also allow for the parameters to be exposed on the CLI.
      • When I say an extra class interface, what I mean by that is, instead of subclassing Node, users subclass a custom child-class of Node that you will provision that already implements the parameter method interceptions
    • Another way is to use a synchronization client
      • You could store the synchronization client on the AdaptiveComponent, which would achieve the same sort of effect as the first method
    • Or another way is to explicitly pass in the overarching AdaptiveComponent and instruct users to use that in any downstream node components to get and set parameters instead of using this->.
    • Or you can explicitly do it like you've done in the doublevadd_component_adaptive_stateful.cpp example, except as part of this package, since I think it's a core enough functionality...

I'm still undecided as to which is the best approach though..

methylDragon commented 2 years ago

This is also made a little bit more complicated in that inactive nodes that are not running on the executor won't be able to subscribe/listen for service callbacks (by design.)

The difficulty in synced siblings like that make me want to just use something in the parent node to synchronize.

vmayoral commented 2 years ago

Ok, thanks for the rationale above, that helped align visions.

Shortly, I think what you're describing above is a subset of the stateful possibilities (in your example, the state is a ROS parameter you want to pass from Component in CPU1, to Component in CPU2), but adaptive_component is purposely stateless by default[^1] and can be made stateful, if desired (as exemplified in here, refer to the README for more), but we believe that implementing the state should be up to the user (if needed).

Also, kindly note my previous comment in here and the rationale at [^1]:

Note that as highlighted above, adaptive_component implements a composable stateless container for adaptive ROS computations.

Regarding your comment about:

All that AdaptiveComponent is doing is just storing the node pointers and changing which one is used with the executor, but there's no synchronization or mechanism like that that is working with parameters.

Correct, but that's all we want at this point, a container that facilitates this feature and simplifies the process of building adaptive ROS computations. I agree that stateful adaptive Components are needed, and that's why we exemplified it, but out of all our internal discussions we concluded that that state is actually application-specific (or Component-specific, if you're picky). So it's to the user to specify which mechanisms shall be used and what information should be conveyed across Components in an adaptive_component container.

Part of this rationale was discussed at https://discourse.ros.org/t/adaptive-ros-2-node-computations-and-hardware-acceleration-contributions/23197. Let me know if this addresses your concerns above, and also, please send any further feedback that comes to mind.

[^1]: Because simply you can't contemplate all possibilities of what a state will represent in a ROS Component. In some cases, as in our example, it'll be just a protected variable in a subclass of AdaptiveComponent. In others, for example in yours, the state is a ROS parameter which you initialize in the source code of one of the Components.

methylDragon commented 2 years ago

I'm good as long as long as it's properly documented, but this isn't a blocker for release, so all good.

I would recommend exploring or noting the possible approaches I mentioned, since it might be difficult for downstream users to see how to implement stateful parameter syncs.

vmayoral commented 2 years ago

All right, since you've got no more comments and examples of stateful adaptive Components are in the README, I'm closing this up.

I think there's definitely room for improving documentation and examples but I'm hoping that'll come along the road as this gets used.