Closed jafingerhut closed 7 years ago
Also, I am guessing that it is not practical to try to implement a checksum other than one like the IPv4 header checksum using the Checksum extern? That one is commutative and associative in how it is calculated, over 16-bit "words". Things like CRCs are not commutative, so implementing update() of say 20 bytes of data, then trying to remove() on some 4-byte piece in the middle, then update() to add back in a different 4-byte piece, would not be possible unless you knew exactly where that 4-byte piece occurred in the middle of the 20-byte original. Even if you augmented the remove() and update() methods to indicate such a position in the data, I'm not sure it is practical to implement such changes to a CRC with such an API.
I agree with @jafingerhut, having explicit incremental checksum updates is a really nice idea but they are only applicable to checksum algorithms. The penalty for recomputing UDP/TCP checksums is so high, that having a specific incremental checksum extern is worth while.
In P4-14 days it was a lot easier to detect when an incremental checksum update could be done and it was feasible to have the backend handle the updates. I don't think this that easy in P4-16.
I expect this will be a difficult design decision. Some architectures may support incremental checksum computation, while other may not. Perhaps the right choice is to offer both and to require that at least one of them is supported.
#if INCREMENTAL
extern IncrementalUnit { ... }
#else
extern ChecksumUnit { .. }
#endif
BTW: there is nothing in P4-16 about checksum units; you can design a checksum unit which looks very much like the P4-14 version; the actual API is up to the architecture.
This amounts to forking the architecture into PSA-incremental and PSA-nonincremental. If we're serious about issues like portability, compliance, etc. In my opinion, we should strive to do this only if we are really unable to resolve the issue.
On Wed, Aug 16, 2017 at 11:58 AM, Mihai Budiu notifications@github.com wrote:
I expect this will be a difficult design decision. Some architectures may support incremental checksum computation, while other may not. Perhaps the right choice is to offer both and to require that at least one of them is supported.
if INCREMENTAL
extern IncrementalUnit { ... }
else
extern ChecksumUnit { .. }
endif
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/p4lang/p4-spec/issues/385#issuecomment-322818076, or mute the thread https://github.com/notifications/unsubscribe-auth/ABwi0rAEnY39itcO8Yba5pUjiK4LX1fQks5sYxGIgaJpZM4O44G_ .
If your programmable platform has only support for the "wrong" kind of checksum unit, then it won't ever be able to implement PSA.
I understand.
Something to keep in mind about whatever checksum extern we end up with in PSA:
It would be nice if the checksum extern API made it reasoanble to implement 'fixing up' a TCP header checksum for packets experiencing changes to their header fields due to NAT. A reasonable common way to do this, I believe, is to assume that the TCP header checksum is correct, and 'subtract out' the effects of the old TCP pseudo-header (containing IP source and destination addresses), and then 'add in' the effects of the new TCP pseudo-header. The remove() extern method in the latest PSA draft was created with this use case in mind, I would bet.
It seems like there are three main issues related to checksums.
The current PSA source in master includes the following declarations:
control ComputeChecksum<H, M>(inout H hdr, inout M user_meta);
package PSA_Switch<IH, IM, EH, EM>(IngressParser<IH, IM> ip,
Ingress<IH, IM> ig,
ComputeChecksum<IH, IM> ic,
Deparser<IH> id,
EgressParser<EH, EM> ep,
Egress<EH, EM> eg,
ComputeChecksum<EH, EM> ec,
Deparser<EH> ed);
Note that in issue #360, we already decided to eliminate the first ComputeChecksum
block and perform all checksum verification in the parse. As part of that chance, we will also extend the ParserError_t
type to include data for indicating that checksum verification failed.
However, it's natural to wonder whether the ComputeChecksum
controls should be retained or if that functionality should be folded into the Ingress
and Egress
controls. An argument for eliminating it is that the programmer can just use the checksum extern to compute the updated checksum in the Egress
block. An argument against eliminating it is that some targets may only provide the checksum extern at the location in the pipeline just before the deparser.
The V1Model architecture recently proposed to adopt a new extern function (here specialized to 16-bits):
/**
Verifies the 16-bit checksum of the supplied data.
T must be a list expression where all the fields are bit-fields or varbits
and where the total dynamic length of the fields is a multiple of 16 bits.
If this method detects that a checksum of the data is not correct it
sets an internal error flag.
@param condition If 'false' the verification always succeeds.
@param data Data whose checksum is verified.
@param checksum Expected checksum of the data.
*/
extern void verify_checksum<T>(in bool condition, in T data, in bit<16> checksum);
/**
Computes the 16-bit checksum of the supplied data.
@param condition If 'false' the checksum is not changed
@param data Data whose checksum is computed.
@param checksum Checksum of the data.
T must be a list expression where all the fields are bit-fields or varbits
and where the total dynamic length of the fields is a multiple of 16 bits.
*/
extern void update_checksum<T>(in bool condition, in T data, inout bit<16> checksum);
There are a few issues with this definition we might consider in defining the PSA extern:
verify_checksum
function have an effect on global state (the "internal error flag" mentioned in the comment) or should it return a boolean value instead?if
-then
-else
, select
, and switch
), should we introduce yet another primitive that includes conditional functionality? Alternatively, if the main use of condition
is to check header validity, would it be simpler to define the behavior on invalid headers as a no-op?As a strawman, suppose that PSA provided the following function:
extern O checksum<O,T>(in T data, HashAlgorithm hash);
Rationale:
condition
argument is not so useful -- as far as I know the most common use of this is to check for header validity, but in the parser any headers extracted will already be guaranteed to be valid.verify
function provides a mechanism for error-handling in the parser. Can we use this?For example, I could imagine writing the following code in the parser:
state parse_ipv4 {
extract(hdrs.ipv4);
verify(hdr.ipv4.checksum == checksum({ hdr.ipv4.src, ... }, HashAlgorithm.crc16 ), error.ChecksumError);
transition ...
}
Note that if I didn't want to bail out of the parser, instead of using verify
I could have instead OR
'd the result of the boolean test with the appropriate bit in ParserError_t
metadata.
In the same vein, I could imagine writing the following code in the egress control (or update checksum control if we keep that):
if (hdr.ipv4.isValid())
hdr.ipv4.checksum = checksum({hdr.ipv4.src, ...}, HashAlgorithm.crc16);
I realize this proposal slightly complicates compilation, but I would argue not in a particularly harmful way. A compiler for the V1Model functions already has to keep track of boolean conditions and references to state in parsed headers and metadata. It should be straightforward to rewrite code that uses (only) verify
, conditionals, and the checksum
extern into the V1Model functions and vice versa.
Here's one proposal for a extern object:
extern checksum<O> {
checksum(HashAlgorithm hash);
void clear();
void add<T>(in T data);
void remove<T>(in T data);
O get();
bool verify(); // equivalent to get() == 0
}
Comments? @jafingerhut @cc10512 @mbudiu-vmw @hanw @antoninbas
In the latest PSA draft, there are two ComputeChecksum blocks not because one is after parser and other before deparser, but because one is after ingress, and before ingress deparser (currently a separate control block in the PSA draft), and the other is after egress, and before egress deparser. There is currently no 'verifyChecksum' control block in the latest PSA draft, in favor of performing checksum verification of received packets in the parser.
From some of the discussion on this p4lang/p4c issue https://github.com/p4lang/p4c/issues/775 I had the impression that verify_checksum() and update_checksum() were introduced primarily to ease automatic conversion from P4_14 to P4_16 programs.
If that is so, perhaps those extern functions could be preferred for use by that automatic conversion process, but the checksum extern object at the end of @jnfoster's comment above [1], with both add() and remove() methods, might be preferable for use when writing P4_16 PSA programs by hand?
[1] https://github.com/p4lang/p4-spec/issues/385#issuecomment-327684038
In the latest PSA draft, there are two ComputeChecksum blocks not because one is after parser and other before deparser, but because one is after ingress, and before ingress deparser (currently a separate control block in the PSA draft), and the other is after egress, and before egress deparser. There is currently no 'verifyChecksum' control block in the latest PSA draft, in favor of performing checksum verification of received packets in the parser.
That's my fault. I think I missed the meeting where that was discussed in detail and in my conversations this week, there was some confusion. I've updated my comment to reflect that.
From some of the discussion on this p4lang/p4c issue p4lang/p4c#775 I had the impression that verify_checksum() and update_checksum() were introduced primarily to ease automatic conversion from P4_14 to P4_16 programs.
That's right.
If that is so, perhaps those extern functions could be preferred for use by that automatic conversion process, but the checksum extern object at the end of @jnfoster's comment above [1], with both add() and remove() methods, might be preferable for use when writing P4_16 PSA programs by hand?
It's fine with me. However, I'd be curious to hear whether we want to require this functionality -- e.g., are there targets that might provide an extern function for checksumming but not an incremental unit that could implement the object?
@jafingerhut is right: the changes about *_checksum functions which were made recently were entirely confined to v1model and were made to more accurately reflect restrictions of the BMv2 simple_switch implementation of these functions. These restrictions are directly inherited from the P4-14 spec, so we cannot work around them.
If your target is very restricted in what it can do, it may be actually very difficult to write compiler transformations to massage all feasible programs into the restricted forms allowed by the target. I expect that the checksum unit will be a perfect example. You should consider what is needed to do this for an exotic target such as Tofino.
The discussion is whether PSA has to support incremental checksums. I believe it should. I don't know exactly what should happen when someone implements support for PSA on a target device which actually does not have incremental checksums.
As an addendum, without thinking too carefully about resource implications for any particular target, I think I would be in favor of a solution that eliminated the ComputeChecksum
block.
Suppose that we are on a target where all checksumming must happen after the ingress/egress pipelines, but the programmer has written a program that computes a checksum in the middle of one of the ingress/egress pipelines. It follows that the compiler will either have to implement the checksum computation using ALUs or other resources -- which might be expensive but should be feasible -- or it will have to relocate the checksum computation to the end of the block.
If the new checksum is assigned into some header or metadata, but is not actually read in the rest ingress/egress pipeline (this should be the common case), then I claim the compiler transformation to relocate the computation to the end of the pipeline is relatively simple. All the compiler has to do is store the values of the headers that were read to compute the checksum in some metadata. And as an optimization, any fields used in the checksum that are read but not modified later in the pipeline would not need to be stored -- in the limit, if none of the fields are modified, the entire checksum computation can be relocated without any extra state.
If the new checksum is read in the rest of the ingress/egress pipeline (which should be an uncommon case) then the compiler has to somehow implement the checksum using other hash externs or ALU operations. That might be expensive, and even prohibitive on certain targets, but it would be okay in my view for the compiler to reject programs that require too many resources.
This story is relatively simple for a batch-mode checksum function. It's more complicated for an object, which has state that might persist across calls. But, waving my hands slightly, it should be possible for the compiler to extract that state from the call site and reconstruct it later to emulate the semantics of the original program.
@mbudiu-vmw
@jafingerhut is right: the changes about *_checksum functions which were made recently were entirely confined to v1model and were made to more accurately reflect restrictions of the BMv2 simple_switch implementation of these functions. These restrictions are directly inherited from the P4-14 spec, so we cannot work around them.
Right. Note that the vagaries of the P4_14 conversion need to have any effect on PSA. We are starting from a clean slate and can design an elegant solution.
If your target is very restricted in what it can do, it may be actually very difficult to write compiler transformations to massage all feasible programs into the restricted forms allowed by the target. I expect that the checksum unit will be a perfect example. You should consider what is needed to do this for an exotic target such as Tofino.
I think that for the common case, the transformation will be not too difficult to implement and will lead to the same efficient implementations as one would write with a ComputeChecksum
block even on a target like Barefoot's Tofino.
I believe that the parts of this issue related to the Internet checksum for IPv4, TCP, and UDP headers have been fully addressed by the addition of the InternetChecksum extern to the PSA, which supports incremental checksums and 'calculated from scratch' checksums, at least for headers (by design, it does not support performing checkums over packet payloads).
Unless there is some other aspect of this issue that remains, I think it can be closed now.
Indeed, we have considered most aspects.
The current draft of the Checksum extern has clear(), update(), remove(), and get() methods proposed.
Some externs explicitly maintain state across processing of different packets. There doesn't appear to be any desirable reason why the Checksum extern should. Is it considered that every time a new packet begins processing, this extern has automatically had clear() called on it already? Or is it required that for each new packet you first call clear(), because if you do not, it might have internal checksum state left over from a previous packet?
If we restrict our attention to the 16-bit one's complement sum used for IPv4 header checksums for a moment, the primary, and maybe only (?), use case for this extern, a few more questions:
Is it considered an error to call update() or remove() with a collection of data that is anything other than a multiple of 16 bits?
Is it expected that if a list of fields is given, that their order is significant, because the extern effectively takes the list of fields, concatenates them, breaks it up into consecutive multiples of 16 bits long, and does the one's complement sum on these 16-bit pieces?
If all of that is true, then an example showing how to correctly use this to first calculate an IPv4 header checksum, then modify only 1 byte, e.g. the TTL, then use remove() followed by update() to recalculate the new checksum with minimal work, would be good, including pointing out that the remove() and update() must both contain the "2-byte aligned" group of 16 bits in the header that contains the TTL field.