Greek64 closed this issue 2 years ago
@eboasson Friendly ping here.
Hi @Greek64 what DDS implementation are you using for the service? It advertises itself as vendor 0.0, which is undefined/unknown (not illegal, as far as I know, but this is definitely not an unpatched Cyclone DDS). It matters, because the DDSI specification is missing a proper handshake between the reader and the writer that one needs for correct behaviour for volatile data with respect to what data is guaranteed to arrive and under what circumstances a sample-lost event may need to be raised.
Cyclone implements that handshake using standard messages, but for that the reader needs to know a bit about the behaviour of the writer. So it avoids the handshake if the other side is a non-Cyclone DDS implementation, and that can result in exactly this data loss.
> Hi @Greek64 what DDS implementation are you using for the service?
It is my own RTPS/DDS implementation, which I have not released publicly yet. It is basically an RTPS/DDS implementation written in VHDL for FPGA boards. That is also why you may see some unconventional stuff like always nacking the next 32 SNs, regardless of what the HEARTBEAT says (this should theoretically not be illegal, as I read the specs). My end goal is to make some kind of ROS2 VHDL API, and for that I'm testing the interoperability of my implementation with Cyclone DDS, since it is the current default RMW implementation.
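For readers unfamiliar with what "nacking the next 32 SNs" looks like, here is a minimal sketch of the reader state carried in an ACKNACK. It assumes the SequenceNumberSet layout as I read it in the RTPS spec (a base, a bit count of at most 256, and 32-bit words filled most-significant bit first). The function names are hypothetical and are not taken from any implementation in this thread.

```python
from typing import Iterable, List, Tuple


def sequence_number_set(base: int, missing: Iterable[int],
                        num_bits: int = 32) -> Tuple[int, int, List[int]]:
    """Return (bitmapBase, numBits, bitmap words) for an ACKNACK.

    Assumption: bit `delta` (sn - base) lives in word delta // 32 at
    position 31 - delta % 32, i.e. most-significant bit first.
    """
    if not 0 <= num_bits <= 256:
        raise ValueError("a SequenceNumberSet covers at most 256 sequence numbers")
    words = [0] * ((num_bits + 31) // 32)
    for sn in missing:
        delta = sn - base
        if 0 <= delta < num_bits:  # numbers outside the window are ignored
            words[delta // 32] |= 1 << (31 - delta % 32)
    return base, num_bits, words


def nack_next_32(next_sn: int) -> Tuple[int, int, List[int]]:
    """Request next_sn .. next_sn + 31 unconditionally (hypothetical helper)."""
    return sequence_number_set(next_sn, range(next_sn, next_sn + 32), 32)
```

Calling `nack_next_32(5)` yields a base of 5, 32 bits, and a single all-ones word, whatever the last HEARTBEAT announced.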
> It matters, because the DDSI specification is missing a proper handshake between the reader and the writer that one needs for correct behaviour for volatile data with respect to what data is guaranteed to arrive and under what circumstances a sample-lost event may need to be raised.

Ah, yes. For how detailed the DDSI-RTPS specification is, it is missing a lot. (They haven't even defined what a serialized key payload is, and every vendor somehow winged it.)
My implementation handles volatile as follows: the next_sn pointer starts out as SEQUENCENUMBER_UNKNOWN. If a HEARTBEAT message arrives while the next_sn pointer is SEQUENCENUMBER_UNKNOWN, set next_sn to last_sn of the HEARTBEAT message (i.e. ignore historical data). If, on the other hand, a DATA message arrives while the next_sn pointer is SEQUENCENUMBER_UNKNOWN, accept the DATA message and set next_sn to data_sn+1.

> Cyclone implements that handshake using standard messages, but for the reader needs to know a bit about the behaviour of the writer.
So, I guess the question comes down to: what is the behaviour that Cyclone DDS expects, so that I can mimic it?
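The volatile-reader rule described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this comment, not the VHDL implementation. The class and method names are hypothetical, and the branch for an already-known next_sn is an assumption added to make the sketch complete.

```python
from typing import List, Optional

# Stand-in for the RTPS SEQUENCENUMBER_UNKNOWN constant.
SEQUENCENUMBER_UNKNOWN = None


class VolatileReaderState:
    """Per-writer state kept by a volatile reader (hypothetical sketch)."""

    def __init__(self) -> None:
        self.next_sn: Optional[int] = SEQUENCENUMBER_UNKNOWN
        self.delivered: List[int] = []

    def on_heartbeat(self, last_sn: int) -> None:
        # First contact via HEARTBEAT: skip historical data by setting
        # next_sn to last_sn, exactly as stated in the comment.
        if self.next_sn is SEQUENCENUMBER_UNKNOWN:
            self.next_sn = last_sn

    def on_data(self, data_sn: int) -> bool:
        """Return True if the sample is handed to the application."""
        if self.next_sn is SEQUENCENUMBER_UNKNOWN:
            # First contact via DATA: accept it and continue from there.
            self.next_sn = data_sn + 1
            self.delivered.append(data_sn)
            return True
        # Assumption (not from the thread): once next_sn is known, only
        # the in-order sample is accepted; everything else is dropped here.
        if data_sn == self.next_sn:
            self.next_sn += 1
            self.delivered.append(data_sn)
            return True
        return False
```

With this rule a DATA that arrives before the first HEARTBEAT is accepted, which is the case where the two implementations in this thread disagree.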
> It is my own RTPS/DDS implementation, which I have not released publicly yet. It is basically an RTPS/DDS implementation written in VHDL for FPGA boards.
Nice! We discussed one some time ago, and indeed some friends and I thought of doing something of the sort around 2005, so pre-DDSI, almost pre-DDS. But nothing ever came of that. This is really cool.
> every vendor somehow winged it
Not sure about that ... try disposing an instance with RTI. All you get is a DDSI keyhash despite what it says in the spec — that they wrote! — and if the maximum size of the serialized key doesn't fit in 16 bytes, they expect you to make do with an MD5 hash. For an implementation like Cyclone DDS that doesn't organise things by MD5 hash ...
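To make the 16-byte rule concrete, here is a small sketch of the key hash computation as I understand it from the RTPS spec: the big-endian serialized key is zero-padded to 16 bytes when the key type can never exceed 16 bytes, and replaced by its MD5 digest otherwise. The function name and parameters are hypothetical, and how each vendor actually serializes the key is outside this sketch.

```python
import hashlib


def key_hash(serialized_key_be: bytes, max_serialized_key_size: int) -> bytes:
    """Compute a 16-byte key hash from a big-endian serialized key.

    Assumption: the choice between padding and MD5 depends on the
    maximum serialized size of the key type, not on this key's length.
    """
    if len(serialized_key_be) > max_serialized_key_size:
        raise ValueError("serialized key exceeds the declared maximum size")
    if max_serialized_key_size <= 16:
        # The key always fits: use it directly, zero-padded on the right.
        return serialized_key_be.ljust(16, b"\x00")
    # The key may not fit: only an MD5 digest goes on the wire.
    return hashlib.md5(serialized_key_be).digest()
```

In the MD5 case the receiver cannot recover the key fields from the hash, which is why a keyhash-only dispose is awkward for an implementation that indexes instances by key value.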
> So, i guess the question comes down to, what is the behaviour that Cyclone DDS expects, so that I can mimic it?
Not retransmit any historical data to a volatile reader and get whitelisted (by vendor id, so that's going to be a bit tricky if you stick with 0.0 [1]) by Cyclone so that the reader will send a retransmit request for everything. The whitelisting is because the spec doesn't say what will happen and Cyclone therefore depends on implementation-specific behaviour.
Usually, a volatile writer won't have much historical data around (but it may), and all that happens if you do send historical data is that it shows up in the reader. So you might want to try it out [2].
Also, I would suggest opening any new issues that pop up at https://github.com/eclipse-cyclonedds/cyclonedds as long as they don't have anything to do with ROS 2 or the RMW layer. That makes the most sense to me and would be (I think) the most polite towards those who monitor ROS 2 related things but not Cyclone DDS related things.
[1] Getting a vendor id is easy, by the way (see https://www.dds-foundation.org/dds-rtps-vendor-and-product-ids/); or you could just pick one and change to an "official" one later.
[2] If you want to try it out without worrying about vendor ids see (I think those are the only places where it matters):
> Also I would suggest to open any new issues that pop up at https://github.com/eclipse-cyclonedds/cyclonedds as long as it doesn't have anything to do with ROS 2 or the RMW layer, that makes the most sense to me and would be (I think) the most polite towards those who monitor ROS 2 related things but not Cyclone DDS related things.
Yeah, I was unsure if the "issue" was in the RMW or underlying DDS implementation, so I thought of opening the issue here (since I effectively test ROS service interoperability).
> Not sure about that ... try disposing an instance with RTI. All you get is a DDSI keyhash despite what it says in the spec

Yeah, isn't it also fast-rtps that assumes a PID_KEY_HASH on every instance state (aka PID_STATUS_INFO)?
> Getting a vendor id is easy, by the way (see https://www.dds-foundation.org/dds-rtps-vendor-and-product-ids/);
That was my plan later when I release the implementation in a public github repo.
> or you could just pick one and change to an "official" one later.
I was unsure about the "legality" of spoofing other vendor IDs, but I guess for testing purposes I could do that.
> Not retransmit any historical data to a volatile reader

Well, from the perspective of my implementation it is not sending historical data. It just so happens that the data becomes available before the HEARTBEAT timeout (HEARTBEATs are triggered by a separate fixed timer), and so it is sent before the first HEARTBEAT is sent. So I guess the correct thing to do is to trigger a HEARTBEAT immediately on remote reader match. That should inform the reader of what is historical data.
And since the DATA is not actually "lost", but regarded as historical data by design, this issue has solved itself.
Thanks for the info and help!
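The fix described in this comment (send a HEARTBEAT as soon as a remote reader is matched, before any DATA is pushed to it) could look roughly like the sketch below. It is a toy model with hypothetical names that records outgoing messages in a list instead of sending them, and it leaves out the periodic HEARTBEAT timer entirely.

```python
from typing import List, Set, Tuple


class PushModeWriter:
    """Toy writer that announces its state on reader match (sketch only)."""

    def __init__(self) -> None:
        self.last_sn = 0                  # highest sequence number written
        self.readers: Set[str] = set()
        # Outgoing messages as (reader, kind, sequence number).
        self.sent: List[Tuple[str, str, int]] = []

    def on_reader_matched(self, reader: str) -> None:
        self.readers.add(reader)
        # Do not wait for the periodic timer: tell the new reader right
        # away which sequence numbers already exist, so that anything up
        # to last_sn counts as historical for it.
        self.sent.append((reader, "HEARTBEAT", self.last_sn))

    def write(self) -> int:
        """Write one sample and push it to all matched readers."""
        self.last_sn += 1
        for reader in sorted(self.readers):
            self.sent.append((reader, "DATA", self.last_sn))
        return self.last_sn
```

With this ordering every matched reader sees a HEARTBEAT before its first DATA, so the DATA-before-HEARTBEAT case from the original report no longer occurs.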
@Greek64, I was kindly referred by @eboasson to this thread.
I believe we share common interests and are looking at similar things. I work at Acceleration Robotics and an FPGA-native RTPS implementation is something that we've been considering. Do you think we could chat and exchange some thoughts?
@vmayoral Sure thing.
> Sure thing.

Fantastic! Do you think you could drop me an e-mail to victor at accelerationrobotics.com? We'll take it from there.
I'm trying to communicate with a Cyclone DDS service client, and I seem to have stumbled into a weird behavior, where if a DATA message arrives before the very first HEARTBEAT of the RTPS writer, the DATA message arrival is acknowledged, but the Data itself is not forwarded to the actual application.
I'm using the service tutorial (AddTwoInts.srv) as base.
Looking through the cyclonedds.log at "finest" we can see how the Cyclone Request Writer and Response Reader are created
Through SPDP the Request Reader and Response Writer are discovered and "matched" to the local endpoints
Afterwards the local Request Writer sends the cached service request over the wire
And gets a service response
After it gets the first HEARTBEAT from the Response Writer, it acknowledges having received the sample, but the user application never gets the actual data (the client hangs waiting)
If, on the other hand, the Response Writer only sends the DATA message in response to ACKNACKs (aka RTPS PUSH_MODE=FALSE), the data is received and returned to the user application as expected
Additional Information
Version Information:
cap2.log contains the log for the first case (DATA before HEARTBEAT), cap3.log contains the log for the latter case (DATA after HEARTBEAT/ACKNACK), and PCAP.zip contains network captures for both cases.