omg-dds / dds-rtps

Validation of interoperability of products compliant with OMG DDS-RTPS standard.
https://www.omg.org/spec/DDSI-RTPS/

FastDDS -> OpenDDS ownership failures #35

Open jrw972 opened 1 month ago

jrw972 commented 1 month ago

Executable's name

Reproducing the problem

What is the problem?

Suggestions about why this problem exists

The subscriber's output consisted of 87 BLUE samples followed by 55 RED samples and a truncated RED sample. Because the test driver only looks at max_samples_received = 125 samples, it processes just 38 RED samples (125 - 87). Since there is no interleaving, the test fails the subscriber with RECEIVING_FROM_ONE.
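The arithmetic above can be checked with a small sketch. This is not the test driver's code: the `classify` function, the `Verdict` type, the `RECEIVING_FROM_BOTH` value, and the rule "interleaving means at least two color changes" are assumptions made for illustration. Only `RECEIVING_FROM_ONE`, the limit of 125, and the 87/55 counts come from the report.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result codes; only RECEIVING_FROM_ONE is named in the report.
enum class Verdict { RECEIVING_FROM_ONE, RECEIVING_FROM_BOTH };

// Look at no more than max_samples colors and count how often the color
// changes between consecutive samples. One change (all BLUE, then all RED)
// is a hand-over rather than interleaving, so this sketch requires two.
Verdict classify(const std::vector<std::string>& colors, std::size_t max_samples)
{
    const std::size_t n = std::min(colors.size(), max_samples);
    std::size_t changes = 0;
    for (std::size_t i = 1; i < n; ++i)
    {
        if (colors[i] != colors[i - 1])
        {
            ++changes;
        }
    }
    return changes >= 2 ? Verdict::RECEIVING_FROM_BOTH
                        : Verdict::RECEIVING_FROM_ONE;
}

// Number of samples of one color among the first max_samples entries.
std::size_t count_color(const std::vector<std::string>& colors,
                        std::size_t max_samples, const std::string& color)
{
    const std::size_t n = std::min(colors.size(), max_samples);
    return static_cast<std::size_t>(
        std::count(colors.begin(), colors.begin() + n, color));
}

// The sequence described in the report: 87 BLUE followed by 55 RED.
std::vector<std::string> reported_output()
{
    std::vector<std::string> colors(std::size_t{87}, std::string("BLUE"));
    colors.insert(colors.end(), std::size_t{55}, std::string("RED"));
    return colors;
}
```

Under these assumptions, `classify(reported_output(), 125)` sees 87 BLUE and 38 RED samples with a single color change, which matches the failure described.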

Packet capture revealed the following about the conversation between the RED publisher and the subscriber. The notation [X] indicates packet X in the attached capture (capture.pcap.gz).

  1. OpenDDS sends subscription for Square reader [82].
  2. FastDDS sends directed heartbeat. Last is 15, first is 16 [84].
  3. FastDDS sends samples 16 through 30.
  4. FastDDS sends publication for Square writer [132].
  5. OpenDDS sends preassociation acknack [133].
  6. FastDDS sends samples 31 through 43.
  7. OpenDDS sends preassociation acknack [160].
  8. FastDDS sends samples 44 through 71.
  9. OpenDDS sends preassociation acknack [221].
  10. FastDDS sends samples 72 through 98.
  11. FastDDS sends directed heartbeat. Last is 98, first is 16 [286].
  12. FastDDS sends sample 99.
  13. OpenDDS sends acknack requesting samples 16-30 [289].
  14. FastDDS sends samples 16-30 [291].
  15. FastDDS sends samples 100 through 116.

Logically, the heartbeat in [84] should be deferred until at least the publication announcement in [132]. Moreover, FastDDS appears to ignore the non-final preassociation acknacks [133, 160, 221]. Essentially, the delay in discovery combined with the "late" heartbeat causes the samples to be queued, so that samples 16 through 99 are probably delivered at once. This explains the output of the subscriber program.
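The queuing effect can be illustrated with a toy model of a reliable, in-order reader. This is a sketch, not OpenDDS code: `InOrderReader`, `ReplayResult`, and `replay_capture` are hypothetical names, and the model assumes the reader holds samples 31 through 99 until the missing samples 16 through 30 are resent, which the report only describes as probable.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Toy reliable reader: samples are released strictly in sequence order.
class InOrderReader
{
public:
    explicit InOrderReader(std::uint64_t first_expected)
        : next_(first_expected) {}

    // Accept one sample and return how many samples became deliverable.
    std::size_t receive(std::uint64_t seq)
    {
        if (seq < next_)
        {
            return 0; // already delivered, treat as a duplicate
        }
        pending_.insert(seq);
        std::size_t delivered = 0;
        // Drain the queue while the lowest pending sample is the next one due.
        while (!pending_.empty() && *pending_.begin() == next_)
        {
            pending_.erase(pending_.begin());
            ++next_;
            ++delivered;
        }
        return delivered;
    }

    std::size_t queued() const { return pending_.size(); }

private:
    std::uint64_t next_;
    std::set<std::uint64_t> pending_;
};

struct ReplayResult
{
    std::size_t queued_before_resend;
    std::size_t delivered_during_resend;
};

// Replay the arrival order seen in the capture: 31..99 first, then the
// resent 16..30 from packet [291].
ReplayResult replay_capture()
{
    InOrderReader reader(16); // the heartbeat in [84] announced first = 16
    for (std::uint64_t seq = 31; seq <= 99; ++seq)
    {
        reader.receive(seq); // nothing is deliverable while 16..30 are missing
    }
    ReplayResult result{reader.queued(), 0};
    for (std::uint64_t seq = 16; seq <= 30; ++seq)
    {
        result.delivered_during_resend += reader.receive(seq);
    }
    return result;
}
```

In this model 69 samples sit in the queue until the resend arrives, and processing the resend releases all 84 samples (16 through 99) in one go, which is consistent with the long run of RED samples in the subscriber output.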

The delay in discovery seems to be a similar problem. OpenDDS sends a preassociation acknack for the publication writer in [64], [96], [108], and [121]. It receives a heartbeat in [122] and [124]. It then requests the publication in [128] which is sent in [132]. The publication is acknowledged in [192] after a heartbeat in [189] and [190].

Other comments

MiguelCompany commented 1 month ago

@jrw972 Thank you for opening the issue and for the deep analysis.

I've checked the code, and it seems the issue is that Fast DDS is ignoring acknacks with count == 0.

We'll change it to something like the following:

    bool check_and_set_acknack_count(
            uint32_t acknack_count)
    {
        if (acknack_count >= next_expected_acknack_count_)
        {
            next_expected_acknack_count_ = acknack_count;
            ++next_expected_acknack_count_;
            return true;
        }

        return false;
    }
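To see why the comparison matters, here is a minimal sketch that wraps the proposed function in a hypothetical `AcknackCounter` struct and contrasts it with a hypothetical `StrictCounter`. Neither struct is taken from Fast DDS: the initial value of 0 and the strict "greater than the last count" rule are assumptions, shown only as one way a first acknack with count 0 could end up being dropped.

```cpp
#include <cstdint>

// Hypothetical holder for the proposed check; the initial value 0 is an
// assumption, not something stated in the thread.
struct AcknackCounter
{
    std::uint32_t next_expected_acknack_count_ = 0;

    bool check_and_set_acknack_count(std::uint32_t acknack_count)
    {
        // Accept any count at or beyond the next expected one.
        if (acknack_count >= next_expected_acknack_count_)
        {
            next_expected_acknack_count_ = acknack_count;
            ++next_expected_acknack_count_;
            return true;
        }

        return false;
    }
};

// Hypothetical stricter rule, for contrast only: a count is accepted only
// if it is greater than the last one seen, so a first count of 0 is lost.
struct StrictCounter
{
    std::uint32_t last_acknack_count_ = 0;

    bool check_and_set_acknack_count(std::uint32_t acknack_count)
    {
        if (acknack_count > last_acknack_count_)
        {
            last_acknack_count_ = acknack_count;
            return true;
        }

        return false;
    }
};
```

With these assumptions, `AcknackCounter` accepts a first acknack with count 0 and rejects a repeat of it, while `StrictCounter` rejects count 0 outright.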

In the meantime, is there some setting you can configure in OpenDDS to make the count start at 1?

MiguelCompany commented 1 month ago

@jrw972 I opened https://github.com/eProsima/Fast-DDS/pull/4639, which should fix this. I will upload a new binary after merging it.

jrw972 commented 1 month ago

> In the meantime, is there some setting you can configure in OpenDDS to make the count start at 1?

Unfortunately, that would require a code change.