IP Fragment Reassembly Algorithm (Based on RFC 791)

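Background: a computer network is organized in layers. From top to bottom these are the application layer (e.g. HTTP) -> transport layer (TCP/UDP) -> network layer (IP) -> link layer. The link layer imposes a maximum transmission unit (MTU). If an IP datagram is longer than the MTU of the next link, the router has to fragment it. Before fragmenting, the router checks the datagram's DF (Don't Fragment) flag: if DF is set, the datagram is discarded; otherwise it is fragmented. Routers only fragment and never reassemble, and a fragment may itself be fragmented again further along the path. Only the destination host reassembles the fragments.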

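An IP datagram consists of a header and a body (data), and every fragment carries its own copy of the header. Suppose the datagram is 3820 bytes long, made up of a 20-byte header and 3800 bytes of data, and the MTU is 1500 bytes. Each fragment can then carry at most 1500 - 20 = 1480 bytes of data (1480 is a multiple of 8, as the fragment offset requires). We therefore need ceil(3800 / 1480) = 3 fragments. Every fragment except the last is filled up to the MTU, so the three fragments are 1500 bytes, 1500 bytes, and 860 bytes long (1480 + 1480 + 840 = 3800 bytes of data).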
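The arithmetic above can be checked with a few lines of Python. This is only a sketch: `fragment_sizes` is a name made up for this post, and it assumes a fixed header length with no IP options.

```python
def fragment_sizes(data_len: int, header_len: int, mtu: int) -> list[int]:
    """Return the total length (header + data) of each fragment."""
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # so round the per-fragment payload down to an 8-byte boundary.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any data")
    sizes = []
    remaining = data_len
    while remaining > 0:
        payload = min(max_payload, remaining)
        sizes.append(header_len + payload)
        remaining -= payload
    return sizes


print(fragment_sizes(3800, 20, 1500))  # [1500, 1500, 860]
```

If the datagram already fits within the MTU, the function simply returns a single size, which corresponds to sending it unfragmented.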

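How does the destination host reassemble the fragments that belong to one datagram? It relies on three fields of the IP header: Identification, Flags, and Fragment Offset. The destination host reassembles fragments that share the same Identification, source IP, and destination IP (RFC 791 also includes the protocol field in this key). Each fragment's header carries a Fragment Offset field giving the position of its data within the original datagram, in units of 8 bytes, and an MF (More Fragments) flag indicating whether it is the last fragment.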
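To make these fields concrete, here is a minimal sketch that pulls them out of a raw IPv4 header with Python's `struct` module. `FragInfo` and `parse_fragment_info` are hypothetical names used only for illustration, and the sketch does no validation (version, checksum, truncated input).

```python
import ipaddress
import struct
from typing import NamedTuple


class FragInfo(NamedTuple):
    key: tuple      # (source, destination, protocol, identification)
    df: bool        # Don't Fragment flag
    mf: bool        # More Fragments flag
    offset: int     # offset of this fragment's data, in bytes
    data_len: int   # number of data bytes carried by this fragment


def parse_fragment_info(packet: bytes) -> FragInfo:
    # The fixed part of the IPv4 header is 20 bytes, in network byte order.
    (ver_ihl, _tos, total_len, ident, flags_fo,
     _ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4          # IHL is counted in 32-bit words
    key = (str(ipaddress.IPv4Address(src)),
           str(ipaddress.IPv4Address(dst)), proto, ident)
    return FragInfo(
        key=key,
        df=bool(flags_fo & 0x4000),            # bit 1 of the 3-bit Flags field
        mf=bool(flags_fo & 0x2000),            # bit 2 of the 3-bit Flags field
        offset=(flags_fo & 0x1FFF) * 8,        # 13-bit offset, in 8-byte units
        data_len=total_len - header_len,
    )


# A hand-built header: 1500-byte UDP fragment, ID 0x1234, MF=1, offset 185 blocks.
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 0x1234, 0x2000 | 185,
                  64, 17, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
info = parse_fragment_info(hdr)
print(info.mf, info.offset)  # True 1480
```

Fragments whose `key` values are equal belong to the same original datagram.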

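When a datagram is fragmented, MF is set to 1 on every fragment except the last; the last fragment inherits the MF value of the datagram being fragmented. The reason is that a datagram may be fragmented more than once on its way, so the last piece produced by the current fragmentation is not necessarily the last fragment of the datagram the sender originally sent. For the same reason, each new fragment's offset is computed by adding to the offset of the datagram being fragmented.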
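The MF and offset rules can be sketched as follows. The `Fragment` class and `fragment` function are illustrative names, not taken from any library; the sketch assumes a fixed 20-byte header and ignores the DF flag, IP options, and checksums.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    offset: int   # Fragment Offset field, in 8-byte units
    mf: bool      # More Fragments flag
    data: bytes


def fragment(pkt: Fragment, mtu: int, header_len: int = 20) -> list[Fragment]:
    """Split pkt so that every piece fits in mtu; pkt may itself be a fragment."""
    if header_len + len(pkt.data) <= mtu:
        return [pkt]                           # fits already, nothing to do
    max_blocks = (mtu - header_len) // 8       # NFB in the RFC 791 pseudocode
    if max_blocks <= 0:
        raise ValueError("MTU too small to carry any data")
    step = max_blocks * 8
    pieces = []
    for start in range(0, len(pkt.data), step):
        is_last = start + step >= len(pkt.data)
        pieces.append(Fragment(
            # New offsets are relative to the offset of the datagram being split.
            offset=pkt.offset + start // 8,
            # Only the last piece inherits the old MF; all others get MF=1.
            mf=pkt.mf if is_last else True,
            data=pkt.data[start:start + step],
        ))
    return pieces


original = Fragment(offset=0, mf=False, data=bytes(3800))
first_hop = fragment(original, mtu=1500)
print([(f.offset, f.mf) for f in first_hop])
# [(0, True), (185, True), (370, False)]

# The middle fragment crosses a link with MTU 576 and is split again.
print([(f.offset, f.mf) for f in fragment(first_hop[1], mtu=576)])
# [(185, True), (254, True), (323, True)]
```

Note that after the second fragmentation every piece of the middle fragment still has MF=1, because the fragment it came from was not the last one.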

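Fragments of one datagram may take different paths through the network, so they do not necessarily arrive at the destination host in their original order. The destination host maintains a buffer for the data. When a fragment arrives, the host reads its header, works out where the fragment belongs in the buffer, and copies its data there. When a fragment with MF=0 arrives, we know it is the last fragment, and the total data length of the original datagram follows from its offset plus its own data length. The destination host also keeps a bit table that records what has arrived: the bits covered by each received fragment are set to 1 (in RFC 791, one bit per 8-byte block). Once all the bits are set, the buffered data is handed to the upper layer. If some fragments are still missing when the timeout expires, the host discards all fragments of that datagram.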
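The buffer and bit table described above might look like the sketch below. It is an assumption-laden simplification rather than a faithful implementation: `Reassembler` is a made-up class, the bit table is a plain Python integer, the timeout is a fixed 15 seconds instead of RFC 791's `MAX(TIMER, TTL)` rule, and the header buffer is left out.

```python
import time


class Reassembler:
    def __init__(self, timeout: float = 15.0, now=time.monotonic):
        self.timeout = timeout
        self.now = now          # injectable clock, handy for testing
        self.slots = {}         # key -> per-datagram reassembly state

    def expire(self) -> None:
        """Drop every datagram whose deadline has passed."""
        t = self.now()
        for key in [k for k, s in self.slots.items() if s["deadline"] <= t]:
            del self.slots[key]

    def add(self, key, offset: int, mf: bool, data: bytes):
        """offset is in 8-byte units. Returns the full data once complete."""
        self.expire()
        slot = self.slots.setdefault(key, {
            "buf": bytearray(), "bits": 0, "total": None,
            "deadline": self.now() + self.timeout,
        })
        start, end = offset * 8, offset * 8 + len(data)
        if len(slot["buf"]) < end:                      # grow buffer on demand
            slot["buf"].extend(bytes(end - len(slot["buf"])))
        slot["buf"][start:end] = data                   # later copies overwrite
        nblocks = (len(data) + 7) // 8
        slot["bits"] |= ((1 << nblocks) - 1) << offset  # mark blocks received
        if not mf:
            slot["total"] = end                         # last fragment: length known
        if slot["total"] is not None:
            want = (1 << ((slot["total"] + 7) // 8)) - 1
            if slot["bits"] & want == want:             # every block has arrived
                del self.slots[key]
                return bytes(slot["buf"][:slot["total"]])
        return None


r = Reassembler()
key = ("10.0.0.1", "10.0.0.2", 17, 0x1234)
payload = bytes(i % 251 for i in range(3800))
# Fragments arrive out of order: last, first, middle.
r.add(key, 370, False, payload[2960:])
r.add(key, 0, True, payload[:1480])
result = r.add(key, 185, True, payload[1480:2960])
print(result == payload)  # True
```

Growing the buffer on demand avoids allocating 64 KB for every datagram, at the cost of a copy when a late fragment extends it.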

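Below is the original description from RFC 791 (Internet Protocol Specification, September 1981). It is worth reading closely, because it helps a great deal when writing the code.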

Fragmentation and Reassembly.

    The internet identification field (ID) is used together with the
    source and destination address, and the protocol fields, to identify
    datagram fragments for reassembly.

    The More Fragments flag bit (MF) is set if the datagram is not the
    last fragment.  The Fragment Offset field identifies the fragment
    location, relative to the beginning of the original unfragmented
    datagram.  Fragments are counted in units of 8 octets.  The
    fragmentation strategy is designed so than an unfragmented datagram
    has all zero fragmentation information (MF = 0, fragment offset =
    0).  If an internet datagram is fragmented, its data portion must be
    broken on 8 octet boundaries.

    This format allows 2**13 = 8192 fragments of 8 octets each for a
    total of 65,536 octets.  Note that this is consistent with the the
    datagram total length field (of course, the header is counted in the
    total length and not in the fragments).

    When fragmentation occurs, some options are copied, but others
    remain with the first fragment only.

    Every internet module must be able to forward a datagram of 68
    octets without further fragmentation.  This is because an internet
    header may be up to 60 octets, and the minimum fragment is 8 octets.

    Every internet destination must be able to receive a datagram of 576
    octets either in one piece or in fragments to be reassembled.

    The fields which may be affected by fragmentation include:

      (1) options field
      (2) more fragments flag
      (3) fragment offset
      (4) internet header length field
      (5) total length field
      (6) header checksum

    If the Don't Fragment flag (DF) bit is set, then internet
    fragmentation of this datagram is NOT permitted, although it may be
    discarded.  This can be used to prohibit fragmentation in cases
    where the receiving host does not have sufficient resources to
    reassemble internet fragments.

    One example of use of the Don't Fragment feature is to down line
    load a small host.  A small host could have a boot strap program
    that accepts a datagram stores it in memory and then executes it.

    The fragmentation and reassembly procedures are most easily
    described by examples.  The following procedures are example
    implementations.

    General notation in the following pseudo programs: "=<" means "less
    than or equal", "#" means "not equal", "=" means "equal", "<-" means
    "is set to".  Also, "x to y" includes x and excludes y; for example,
    "4 to 7" would include 4, 5, and 6 (but not 7).

    An Example Fragmentation Procedure

      The maximum sized datagram that can be transmitted through the
      next network is called the maximum transmission unit (MTU).

      If the total length is less than or equal the maximum transmission
      unit then submit this datagram to the next step in datagram
      processing; otherwise cut the datagram into two fragments, the
      first fragment being the maximum size, and the second fragment
      being the rest of the datagram.  The first fragment is submitted
      to the next step in datagram processing, while the second fragment
      is submitted to this procedure in case it is still too large.

      Notation:

        FO    -  Fragment Offset
        IHL   -  Internet Header Length
        DF    -  Don't Fragment flag
        MF    -  More Fragments flag
        TL    -  Total Length
        OFO   -  Old Fragment Offset
        OIHL  -  Old Internet Header Length
        OMF   -  Old More Fragments flag
        OTL   -  Old Total Length
        NFB   -  Number of Fragment Blocks
        MTU   -  Maximum Transmission Unit

      Procedure:

        IF TL =< MTU THEN Submit this datagram to the next step
             in datagram processing ELSE IF DF = 1 THEN discard the
        datagram ELSE
        To produce the first fragment:
        (1)  Copy the original internet header;
        (2)  OIHL <- IHL; OTL <- TL; OFO <- FO; OMF <- MF;
        (3)  NFB <- (MTU-IHL*4)/8;
        (4)  Attach the first NFB*8 data octets;
        (5)  Correct the header:
             MF <- 1;  TL <- (IHL*4)+(NFB*8);
             Recompute Checksum;
        (6)  Submit this fragment to the next step in
             datagram processing;
        To produce the second fragment:
        (7)  Selectively copy the internet header (some options
             are not copied, see option definitions);
        (8)  Append the remaining data;
        (9)  Correct the header:
             IHL <- (((OIHL*4)-(length of options not copied))+3)/4;
             TL <- OTL - NFB*8 - (OIHL-IHL)*4);
             FO <- OFO + NFB;  MF <- OMF;  Recompute Checksum;
        (10) Submit this fragment to the fragmentation test; DONE.

      In the above procedure each fragment (except the last) was made
      the maximum allowable size.  An alternative might produce less
      than the maximum size datagrams.  For example, one could implement
      a fragmentation procedure that repeatly divided large datagrams in
      half until the resulting fragments were less than the maximum
      transmission unit size.

    An Example Reassembly Procedure

      For each datagram the buffer identifier is computed as the
      concatenation of the source, destination, protocol, and
      identification fields.  If this is a whole datagram (that is both
      the fragment offset and the more fragments  fields are zero), then
      any reassembly resources associated with this buffer identifier
      are released and the datagram is forwarded to the next step in
      datagram processing.

      If no other fragment with this buffer identifier is on hand then
      reassembly resources are allocated.  The reassembly resources
      consist of a data buffer, a header buffer, a fragment block bit
      table, a total data length field, and a timer.  The data from the
      fragment is placed in the data buffer according to its fragment
      offset and length, and bits are set in the fragment block bit
      table corresponding to the fragment blocks received.

      If this is the first fragment (that is the fragment offset is
      zero)  this header is placed in the header buffer.  If this is the
      last fragment ( that is the more fragments field is zero) the
      total data length is computed.  If this fragment completes the
      datagram (tested by checking the bits set in the fragment block
      table), then the datagram is sent to the next step in datagram
      processing; otherwise the timer is set to the maximum of the
      current timer value and the value of the time to live field from
      this fragment; and the reassembly routine gives up control.

      If the timer runs out, the all reassembly resources for this
      buffer identifier are released.  The initial setting of the timer
      is a lower bound on the reassembly waiting time.  This is because
      the waiting time will be increased if the Time to Live in the
      arriving fragment is greater than the current timer value but will
      not be decreased if it is less.  The maximum this timer value
      could reach is the maximum time to live (approximately 4.25
      minutes).  The current recommendation for the initial timer
      setting is 15 seconds.  This may be changed as experience with
      this protocol accumulates.  Note that the choice of this parameter
      value is related to the buffer capacity available and the data
      rate of the transmission medium; that is, data rate times timer
      value equals buffer size (e.g., 10Kb/s X 15s = 150Kb).

      Notation:

        FO    -  Fragment Offset
        IHL   -  Internet Header Length
        MF    -  More Fragments flag
        TTL   -  Time To Live
        NFB   -  Number of Fragment Blocks
        TL    -  Total Length
        TDL   -  Total Data Length
        BUFID -  Buffer Identifier
        RCVBT -  Fragment Received Bit Table
        TLB   -  Timer Lower Bound

      Procedure:

        (1)  BUFID <- source|destination|protocol|identification;
        (2)  IF FO = 0 AND MF = 0
        (3)     THEN IF buffer with BUFID is allocated
        (4)             THEN flush all reassembly for this BUFID;
        (5)          Submit datagram to next step; DONE.
        (6)     ELSE IF no buffer with BUFID is allocated
        (7)             THEN allocate reassembly resources
                             with BUFID;
                             TIMER <- TLB; TDL <- 0;
        (8)          put data from fragment into data buffer with
                     BUFID from octet FO*8 to
                                         octet (TL-(IHL*4))+FO*8;
        (9)          set RCVBT bits from FO
                                        to FO+((TL-(IHL*4)+7)/8);
        (10)         IF MF = 0 THEN TDL <- TL-(IHL*4)+(FO*8)
        (11)         IF FO = 0 THEN put header in header buffer
        (12)         IF TDL # 0
        (13)          AND all RCVBT bits from 0
                                             to (TDL+7)/8 are set
        (14)            THEN TL <- TDL+(IHL*4)
        (15)                 Submit datagram to next step;
        (16)                 free all reassembly resources
                             for this BUFID; DONE.
        (17)         TIMER <- MAX(TIMER,TTL);
        (18)         give up until next fragment or timer expires;
        (19) timer expires: flush all reassembly with this BUFID; DONE.

      In the case that two or more fragments contain the same data
      either identically or through a partial overlap, this procedure
      will use the more recently arrived copy in the data buffer and
      datagram delivered.
