p4lang / p4c-bm

Generates the JSON configuration for the behavioral-model (bmv2), as well as the C/C++ PD code
Apache License 2.0

Get the multi_queue length problem in P4 #101

Closed Wannabeperfect closed 6 years ago

Wannabeperfect commented 6 years ago

I recently used P4 for a multipath load-balancing experiment. Given four equivalent paths, when a packet arrives I want to read the queue length of each path, find the path with the shortest queue, and send the packet along that path. (The topology is shown below: find the shortest egress queue among s1->s2, s1->s3, s1->s4, and s1->s5, and send the packet on that link.)

h1------------- --------s2-------- ----- h7
h2------------- --------s3-------- ---- -h8
--------s1--- ------- s6-----
h3------------- --------s4-------- ----- h9
h4------------- --------s5-------- ------- h10

My idea is to define four registers, get the length of each queue and store them in registers, and then get the minimum value by comparing them one by one.

I found that p4c provides metadata about the queue:

header queueing_metadata_t {
    bit<48> enq_timestamp;
    bit<24> enq_qdepth;
    bit<32> deq_timedelta;
    bit<24> deq_qdepth;
}

register_write(qlength_reg,queueing_metadata.deq_qdepth)

I want to store the queue length in a register like this, but there are four links, and I don't know how to get the queue length of each one.

So I have two main questions:

  1. Does bmv2 use one queue for all ports, or one queue per port?
  2. How can I get the queue length of each link?

Thanks! :)

Wannabeperfect commented 6 years ago

[Image: topology diagram]

The topology is like this, the above format is not correct.

jafingerhut commented 6 years ago

Others can comment with more details and knowledge, but I just wanted to comment so that you are aware that accessing not merely 1 queue length in a P4 program, but multiple queue lengths, might be a very demanding thing that will work for some P4 targets, but not others. It is certainly possible to implement access to such things in a software switch like bmv2 (whether or not it is currently implemented, I do not know, but probably not). It becomes very challenging to implement such access in targets that have much lower $ cost per terabit/second.

In a high performance target, perhaps a realistic implementation would maintain a P4 register whose contents were a few bits per port on the device, e.g. 3 bits per port, where the 3 bits encoded some coarse notion of how congested that port's queue is. For a device with 32 ports, for example, that requires reading 3*32=96 bits per packet being processed, then updating say one of those 3-bit values, and writing the 96 bits back to the register.
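The read-modify-write described above can be sketched in plain C++. This is only an illustration of the bit packing, not code for any real target: the names `PackedLevels`, `get_level`, `set_level`, `quantize`, and `update_port`, as well as the linear mapping from queue depth to a 3-bit level, are assumptions made for this example.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Assumed layout: 32 ports x 3 bits per port = 96 bits in one register entry.
constexpr std::size_t kNumPorts = 32;
constexpr std::size_t kBitsPerPort = 3;
using PackedLevels = std::bitset<kNumPorts * kBitsPerPort>;

// Read the 3-bit congestion level stored for `port`.
std::uint8_t get_level(const PackedLevels& reg, std::size_t port) {
    std::uint8_t level = 0;
    for (std::size_t b = 0; b < kBitsPerPort; ++b) {
        if (reg[port * kBitsPerPort + b]) {
            level |= static_cast<std::uint8_t>(1u << b);
        }
    }
    return level;
}

// Overwrite the 3-bit level for `port`; values above 7 are clamped to 7.
void set_level(PackedLevels& reg, std::size_t port, std::uint8_t level) {
    if (level > 7) level = 7;
    for (std::size_t b = 0; b < kBitsPerPort; ++b) {
        reg[port * kBitsPerPort + b] = ((level >> b) & 1u) != 0;
    }
}

// Hypothetical quantizer: map a queue depth linearly onto levels 0..7.
std::uint8_t quantize(std::uint32_t qdepth, std::uint32_t max_depth) {
    if (max_depth == 0 || qdepth >= max_depth) return 7;
    return static_cast<std::uint8_t>(
        (static_cast<std::uint64_t>(qdepth) * 8) / max_depth);
}

// Per-packet step: take the 96 bits that were read, update the one
// 3-bit value for `port`, and return the 96 bits to write back.
PackedLevels update_port(PackedLevels reg, std::size_t port,
                         std::uint32_t qdepth, std::uint32_t max_depth) {
    set_level(reg, port, quantize(qdepth, max_depth));
    return reg;
}
```

With this layout, the whole per-port congestion state fits in a single 96-bit value, so one read and one write per packet are enough to keep it current.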

Wannabeperfect commented 6 years ago

@jafingerhut Thanks for your reply. Is there a way to implement this method in bmv2? How can I obtain the queue length of each link in the bmv2 environment? Thanks :)

antoninbas commented 6 years ago

It is a little strange that you would post your question here, yet provide a code snippet written in P4_16 :). p4c-bm (this repo) is the legacy P4_14 compiler for bmv2 and does not support P4_16.

Regarding your questions:

  1. bmv2 uses a different queue for each egress port by default (if you enable priority queueing, you can have multiple priority queues for each port)
  2. If you want to modify bmv2 simple_switch to write the different queue occupancies to different metadata fields (or even to a register directly) so that egress packets can have access to all of them, you should feel free to do so. It should be a simple modification of the egress_thread method (https://github.com/p4lang/behavioral-model/blob/master/targets/simple_switch/simple_switch.cpp#L462). I imagine it would look something like this:
    for (size_t i = 0; i < max_port; i++) {
        auto qdepth = egress_buffers.size(i);
        // you will need to define fields queueing_metadata.enq_qdepth_{0, 1, 2, ..., max_port - 1} in your program
        phv->get_field("queueing_metadata.enq_qdepth_" + std::to_string(i)).set(qdepth);
    }

    However, as @jafingerhut pointed out, this is not representative of what a high-performance target would let you do, so we would not accept such a patch to simple_switch. Obviously you can do whatever you want with your own copy of the code.
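To make the idea concrete outside of bmv2, here is a standalone C++ mock of the same flow: copy every port's queue depth into its own named field, then compare the candidate ports one by one to find the shortest queue. It does not use any bmv2 classes. `FieldMap`, `publish_qdepths`, and `shortest_queue_port` are hypothetical names for this sketch, and the field names simply follow the `queueing_metadata.enq_qdepth_<i>` pattern suggested above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for a packet's metadata fields; this is NOT the bmv2 PHV class.
using FieldMap = std::map<std::string, std::uint32_t>;

// Build the per-port field name, e.g. "queueing_metadata.enq_qdepth_2".
std::string qdepth_field(std::size_t port) {
    return "queueing_metadata.enq_qdepth_" + std::to_string(port);
}

// Copy each port's queue depth into its own metadata field.
void publish_qdepths(const std::vector<std::uint32_t>& qdepths,
                     FieldMap& fields) {
    for (std::size_t i = 0; i < qdepths.size(); ++i) {
        fields[qdepth_field(i)] = qdepths[i];
    }
}

// Compare the candidate ports one by one and return the port whose
// queue is shortest. Ties go to the earliest candidate.
// Precondition: `candidates` is non-empty and every candidate has a field.
std::size_t shortest_queue_port(const FieldMap& fields,
                                const std::vector<std::size_t>& candidates) {
    std::size_t best = candidates.front();
    std::uint32_t best_depth = fields.at(qdepth_field(best));
    for (std::size_t port : candidates) {
        const std::uint32_t depth = fields.at(qdepth_field(port));
        if (depth < best_depth) {
            best = port;
            best_depth = depth;
        }
    }
    return best;
}
```

For the four-path case in the question, the candidates would be the four egress ports of s1 that lead to s2 through s5; the actual port numbers depend on how the topology is wired.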

Wannabeperfect commented 6 years ago

@antoninbas Thank you so much! I will give it a try :)