Reduce execution time for PTF tests

stratum / fabric-tna

The SD-Fabric data plane

https://docs.sd-fabric.org/


Reduce execution time for PTF tests #238

Open ccascone opened 3 years ago

ccascone commented 3 years ago

In #209 we started hitting the 45-minute timeout for the Jenkins jobs configured in ci-management: https://gerrit.onosproject.org/plugins/gitiles/ci-management/+/refs/heads/master/jjb/templates/fabric-tna-jobs.yaml#46

We could increase the timeout, but that would raise the AWS bill for the Jenkins executors. Instead, we should devise strategies to reduce the execution time of PTF tests, not only to lower the AWS bill but also to make the tests quicker to run on a laptop.

We know that the following operations are time-consuming on Tofino Model:

Table entry insert/delete
Packet processing
Counter reads
Timeout for verify_no_packet (applies to all targets)

We could instrument the code with a profiler to figure out exactly where we spend most of the time.
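
A minimal sketch of that approach with Python's standard `cProfile` and `pstats` modules. The functions `insert_entry`, `send_and_verify`, and `run_test` are hypothetical placeholders for a test body; in practice one would pass the real test method (or a small driver around it) to `profile`.

```python
import cProfile
import io
import pstats
import time


def insert_entry():
    # Placeholder for a slow table write on Tofino Model (hypothetical).
    time.sleep(0.01)


def send_and_verify():
    # Placeholder for a packet send/verify round trip (hypothetical).
    time.sleep(0.002)


def run_test():
    # Placeholder test body mixing the two operations.
    for _ in range(5):
        insert_entry()
        send_and_verify()


def profile(func, limit=10):
    """Run func under cProfile and return the top `limit` rows by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
    stats.print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile(run_test))
```

Sorting by cumulative time surfaces the helpers whose call trees dominate (here `insert_entry`). For a whole run, `python -m cProfile -o ptf.prof <script> ...` writes a stats file that `pstats.Stats("ptf.prof")` can load; how that maps onto the way PTF is launched in this repository is an assumption to verify.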

Let's use this issue to collect ideas and agree on a refactoring strategy.

ccascone commented 3 years ago

We could avoid inserting and deleting the same entries over and over within the same test class.

Today, most test classes use the following pattern, in which @autocleanup removes the inserted entries after each invocation of doRunTest:

    @autocleanup  # deletes the entries inserted by this call when it returns
    def doRunTest(self, tagged1, pkt_type, some_other_test_param):
        # Insert entries, then send/verify packets
        ...

    def runTest(self):
        for tagged1 in [True, False]:
            for pkt_type in PKT_TYPES_UNDER_TEST:
                for some_other_test_param in test_params:
                    self.doRunTest(tagged1, pkt_type, some_other_test_param)
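
To make the cost of this pattern concrete, here is a minimal sketch with a hypothetical stand-in for `@autocleanup` (not the fabric-tna implementation) that counts insert and delete operations instead of issuing P4Runtime writes. The parameter lists and the entry names `entry_a`/`entry_b` are placeholders.

```python
import functools
import itertools

# Hypothetical stand-ins for the real test parameters.
PKT_TYPES_UNDER_TEST = ["tcp", "udp", "icmp"]
test_params = [1, 2]


def autocleanup(f):
    """Hypothetical stand-in: delete every entry written during the call."""

    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        try:
            return f(self, *args, **kwargs)
        finally:
            # Undo in reverse order of insertion.
            for entry in reversed(self.written):
                self.delete(entry)
            self.written.clear()

    return wrapper


class PerCallCleanupTest:
    def __init__(self):
        self.written = []  # entries inserted since the last cleanup
        self.rpcs = 0  # insert + delete operations issued

    def insert(self, entry):
        self.rpcs += 1
        self.written.append(entry)

    def delete(self, entry):
        self.rpcs += 1

    @autocleanup
    def doRunTest(self, tagged1, pkt_type, some_other_test_param):
        # Hypothetical: two entries that do not depend on the parameters.
        self.insert("entry_a")
        self.insert("entry_b")

    def runTest(self):
        for tagged1, pkt_type, param in itertools.product(
            [True, False], PKT_TYPES_UNDER_TEST, test_params
        ):
            self.doRunTest(tagged1, pkt_type, param)


demo = PerCallCleanupTest()
demo.runTest()
print(demo.rpcs)  # 12 invocations x (2 inserts + 2 deletes) = 48
```

With per-call cleanup, the number of writes grows with the product of all parameter lists, even when the entries are identical across iterations.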

Instead, we could insert each entry only once and clean up after the whole test (i.e., after runTest completes), using something like:

    def doRunTest(self, tagged1, pkt_type, some_other_test_param):
        # Insert only new entries (e.g., internally dropping writes for existing entries),
        # then send/verify packets
        ...

    @autocleanup  # deletes all entries only once runTest returns
    def runTest(self):
        for tagged1 in [True, False]:
            for pkt_type in PKT_TYPES_UNDER_TEST:
                for some_other_test_param in test_params:
                    self.doRunTest(tagged1, pkt_type, some_other_test_param)
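
One way to "internally drop writes for existing entries" is a small write cache keyed by the entry. The sketch below is hypothetical: `DedupWriter`, `run`, and `entries_for` are illustrative names, entries are plain hashable tuples, and a counter replaces the real P4Runtime insert/delete calls.

```python
import itertools


class DedupWriter:
    """Hypothetical write cache: skips inserts of entries already installed."""

    def __init__(self):
        self.installed = []  # insertion order, used for reverse-order cleanup
        self._seen = set()
        self.rpcs = 0  # insert + delete operations actually issued

    def insert(self, entry):
        if entry in self._seen:
            return False  # already installed: drop the write
        self._seen.add(entry)
        self.installed.append(entry)
        self.rpcs += 1  # placeholder for the real insert RPC
        return True

    def cleanup(self):
        """Delete everything once, in reverse insertion order."""
        for entry in reversed(self.installed):
            self.rpcs += 1  # placeholder for the real delete RPC
        self.installed.clear()
        self._seen.clear()


def run(cases, entries_for):
    writer = DedupWriter()
    try:
        for case in cases:
            for entry in entries_for(case):
                writer.insert(entry)
            # ... send/verify packets for this case ...
    finally:
        writer.cleanup()  # a single cleanup after the whole runTest
    return writer.rpcs


# Hypothetical: each case needs a shared route plus a VLAN entry that depends on `tagged`.
def entries_for(case):
    tagged, pkt_type = case
    return [("routing", "10.0.0.0/24"), ("vlan", tagged)]


cases = list(itertools.product([True, False], ["tcp", "udp", "icmp"]))
print(run(cases, entries_for))  # 3 inserts + 3 deletes = 6 (per-call cleanup: 6 x 2 x 2 = 24)
```

This only works if the cache key covers everything that matters (table, match, action, parameters): if two iterations need the same match key with different actions, a MODIFY or a per-iteration cleanup is still required. Entries left over from earlier iterations can also change how later packets are forwarded, so each test needs checking for that before switching patterns.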

Yi-Tseng commented 3 years ago

As discussed in #232, we can reduce the number of iterations in FabricIPv4UnicastGroupTestAllPort*.

This is possible because the loop only records the last TCP source port observed on each of the two output ports and then checks those. We could remove the loop altogether unless there are other concerns.

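        # Last TCP source port observed on each output port: [port2, port3]
        tcpsport_toport = [None, None]
        # Vary the TCP source port so that hash-based group selection uses both ports
        for i in range(50):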
            test_tcp_sport = 1230 + i
            pkt_from1 = testutils.simple_tcp_packet(
                eth_src=HOST1_MAC,
                eth_dst=SWITCH_MAC,
                ip_src=HOST1_IPV4,
                ip_dst=HOST2_IPV4,
                ip_ttl=64,
                tcp_sport=test_tcp_sport,
            )
            exp_pkt_to2 = testutils.simple_tcp_packet(
                eth_src=SWITCH_MAC,
                eth_dst=HOST2_MAC,
                ip_src=HOST1_IPV4,
                ip_dst=HOST2_IPV4,
                ip_ttl=63,
                tcp_sport=test_tcp_sport,
            )
            exp_pkt_to3 = testutils.simple_tcp_packet(
                eth_src=SWITCH_MAC,
                eth_dst=HOST3_MAC,
                ip_src=HOST1_IPV4,
                ip_dst=HOST2_IPV4,
                ip_ttl=63,
                tcp_sport=test_tcp_sport,
            )
            self.send_packet(self.port1, pkt_from1)
            out_port_indx = self.verify_any_packet_any_port(
                [exp_pkt_to2, exp_pkt_to3], [self.port2, self.port3]
            )
            tcpsport_toport[out_port_indx] = test_tcp_sport
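
A possible middle ground between the 50-iteration loop and removing it entirely is to stop as soon as every output port has received at least one packet. The sketch below assumes the rest of the test only needs one source port per output port; `pick_port` is a hypothetical toy hash (SHA-256) standing in for the switch, and in the real test that step would be the send/`verify_any_packet_any_port` pair above.

```python
import hashlib


def pick_port(tcp_sport, n_ports=2):
    # Toy stand-in for the switch's hash-based group member selection.
    return hashlib.sha256(tcp_sport.to_bytes(2, "big")).digest()[0] % n_ports


def find_sport_per_port(n_ports=2, base_sport=1230, max_tries=50):
    """Return one TCP source port per output port, stopping once all are found."""
    tcpsport_toport = [None] * n_ports
    tries = 0
    for i in range(max_tries):
        tries += 1
        sport = base_sport + i
        out_port_indx = pick_port(sport, n_ports)  # send + verify in the real test
        tcpsport_toport[out_port_indx] = sport
        if None not in tcpsport_toport:
            break  # every output port has been hit at least once
    return tcpsport_toport, tries


ports, tries = find_sport_per_port()
print(ports, tries)
```

Keeping `max_tries` as an upper bound preserves a failure mode: if some slot is still `None` after the loop, the test should fail rather than loop forever.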