Fragmentation and Reassembly.
The internet identification field (ID) is used together with the
source and destination address, and the protocol fields, to identify
datagram fragments for reassembly.
The More Fragments flag bit (MF) is set if the datagram is not the
last fragment. The Fragment Offset field identifies the fragment
location, relative to the beginning of the original unfragmented
datagram. Fragments are counted in units of 8 octets. The
[Page 24]
September 1981
Internet Protocol
Specification
fragmentation strategy is designed so that an unfragmented datagram
has all zero fragmentation information (MF = 0, fragment offset =
0). If an internet datagram is fragmented, its data portion must be
broken on 8 octet boundaries.
This format allows 2**13 = 8192 fragments of 8 octets each for a
total of 65,536 octets. Note that this is consistent with the
datagram total length field (of course, the header is counted in the
total length and not in the fragments).
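The arithmetic above can be checked with a few lines of Python. This is only a sketch; the names FRAGMENT_UNIT, OFFSET_BITS, and offset_to_octets are illustrative and do not come from the specification.

```python
FRAGMENT_UNIT = 8    # octets per fragment block
OFFSET_BITS = 13     # width of the Fragment Offset field in bits


def offset_to_octets(fragment_offset: int) -> int:
    """Convert a Fragment Offset field value to an octet position in the data."""
    if not 0 <= fragment_offset < 2 ** OFFSET_BITS:
        raise ValueError("fragment offset must fit in 13 bits")
    return fragment_offset * FRAGMENT_UNIT


max_blocks = 2 ** OFFSET_BITS                      # number of distinct offsets
addressable_octets = max_blocks * FRAGMENT_UNIT    # data octets the field can address
```

For example, a fragment offset of 185 places the fragment's data at octet 1480 of the original datagram.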
When fragmentation occurs, some options are copied, but others
remain with the first fragment only.
Every internet module must be able to forward a datagram of 68
octets without further fragmentation. This is because an internet
header may be up to 60 octets, and the minimum fragment is 8 octets.
Every internet destination must be able to receive a datagram of 576
octets either in one piece or in fragments to be reassembled.
The fields which may be affected by fragmentation include:
(1) options field
(2) more fragments flag
(3) fragment offset
(4) internet header length field
(5) total length field
(6) header checksum
If the Don't Fragment flag (DF) bit is set, then internet
fragmentation of this datagram is NOT permitted, although the
datagram may be discarded. This can be used to prohibit
fragmentation in cases where the receiving host does not have
sufficient resources to reassemble internet fragments.
One example of use of the Don't Fragment feature is to down line
load a small host. A small host could have a boot strap program
that accepts a datagram, stores it in memory, and then executes it.
The fragmentation and reassembly procedures are most easily
described by examples. The following procedures are example
implementations.
General notation in the following pseudo programs: "=<" means "less
than or equal", "#" means "not equal", "=" means "equal", "<-" means
"is set to". Also, "x to y" includes x and excludes y; for example,
"4 to 7" would include 4, 5, and 6 (but not 7).
An Example Fragmentation Procedure
The maximum sized datagram that can be transmitted through the
next network is called the maximum transmission unit (MTU).
If the total length is less than or equal to the maximum transmission
unit then submit this datagram to the next step in datagram
processing; otherwise cut the datagram into two fragments, the
first fragment being the maximum size, and the second fragment
being the rest of the datagram. The first fragment is submitted
to the next step in datagram processing, while the second fragment
is submitted to this procedure in case it is still too large.
Notation:
FO - Fragment Offset
IHL - Internet Header Length
DF - Don't Fragment flag
MF - More Fragments flag
TL - Total Length
OFO - Old Fragment Offset
OIHL - Old Internet Header Length
OMF - Old More Fragments flag
OTL - Old Total Length
NFB - Number of Fragment Blocks
MTU - Maximum Transmission Unit
Procedure:
IF TL =< MTU
   THEN Submit this datagram to the next step
        in datagram processing
ELSE IF DF = 1
   THEN discard the datagram
ELSE
To produce the first fragment:
(1) Copy the original internet header;
(2) OIHL <- IHL; OTL <- TL; OFO <- FO; OMF <- MF;
(3) NFB <- (MTU-IHL*4)/8;
(4) Attach the first NFB*8 data octets;
(5) Correct the header:
MF <- 1; TL <- (IHL*4)+(NFB*8);
Recompute Checksum;
(6) Submit this fragment to the next step in
datagram processing;
To produce the second fragment:
(7) Selectively copy the internet header (some options
are not copied, see option definitions);
(8) Append the remaining data;
(9) Correct the header:
IHL <- (((OIHL*4)-(length of options not copied))+3)/4;
TL <- OTL - NFB*8 - (OIHL-IHL)*4;
FO <- OFO + NFB; MF <- OMF; Recompute Checksum;
(10) Submit this fragment to the fragmentation test; DONE.
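The procedure above can be sketched in Python as follows. This is a minimal model under stated assumptions, not a wire-format implementation: the Datagram class, its uncopied_option_octets field (standing in for "length of options not copied"), and the fragment function are hypothetical names, and the checksum step is left out because no header bytes are modeled.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Datagram:
    """Illustrative model of the fields the procedure touches."""
    ihl: int       # IHL: header length in 32-bit words
    fo: int        # FO: fragment offset in 8-octet blocks
    mf: int        # MF: More Fragments flag
    df: int        # DF: Don't Fragment flag
    data: bytes    # data portion
    uncopied_option_octets: int = 0  # assumption: option octets not copied to later fragments

    @property
    def tl(self) -> int:
        """TL: total length in octets (header plus data)."""
        return self.ihl * 4 + len(self.data)


def fragment(dgram: Datagram, mtu: int) -> List[Datagram]:
    """Return the fragments to submit, or [] if the datagram is discarded."""
    out: List[Datagram] = []
    while dgram.tl > mtu:
        if dgram.df:
            return []  # DF = 1 and too large: discard the datagram
        nfb = (mtu - dgram.ihl * 4) // 8  # step (3)
        if nfb < 1:
            raise ValueError("MTU too small to carry one fragment block")
        # Steps (1)-(6): the first fragment keeps the full header, MF <- 1.
        out.append(replace(dgram, mf=1, data=dgram.data[:nfb * 8]))
        # Steps (7)-(9): the remainder carries only the copied options,
        # keeps the old MF, and advances FO by NFB.
        new_ihl = ((dgram.ihl * 4 - dgram.uncopied_option_octets) + 3) // 4
        dgram = replace(dgram, ihl=new_ihl, fo=dgram.fo + nfb,
                        data=dgram.data[nfb * 8:], uncopied_option_octets=0)
        # Step (10): loop back to the size test with the remainder.
    out.append(dgram)
    return out
```

The loop plays the role of step (10): the remainder is fed back through the size test until it fits in the MTU.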
In the above procedure each fragment (except the last) was made
the maximum allowable size. An alternative might produce less
than the maximum size datagrams. For example, one could implement
a fragmentation procedure that repeatedly divided large datagrams in
half until the resulting fragments were less than the maximum
transmission unit size.
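One possible reading of the halving alternative is sketched below in Python. The function halve and its parameters are hypothetical; it works on the data portion only, cuts on 8-octet boundaries, and leaves out header handling and the MF flag.

```python
from typing import List, Tuple


def halve(fo: int, data: bytes, max_data: int) -> List[Tuple[int, bytes]]:
    """Split data in half repeatedly until every piece fits in max_data octets.

    Returns (fragment offset in 8-octet blocks, piece) pairs in order.
    """
    if max_data < 8:
        raise ValueError("a fragment must be able to carry at least 8 octets")
    if len(data) <= max_data:
        return [(fo, data)]
    # Cut near the middle, rounded down to an 8-octet boundary (at least 8).
    cut = max(8, len(data) // 2 // 8 * 8)
    return (halve(fo, data[:cut], max_data)
            + halve(fo + cut // 8, data[cut:], max_data))
```

With 3800 data octets and room for 1480 per fragment, this sketch yields four pieces of 944, 952, 952, and 952 octets, where the maximum-size strategy yields three.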
An Example Reassembly Procedure
For each datagram the buffer identifier is computed as the
concatenation of the source, destination, protocol, and
identification fields. If this is a whole datagram (that is both
the fragment offset and the more fragments fields are zero), then
any reassembly resources associated with this buffer identifier
are released and the datagram is forwarded to the next step in
datagram processing.
If no other fragment with this buffer identifier is on hand then
reassembly resources are allocated. The reassembly resources
consist of a data buffer, a header buffer, a fragment block bit
table, a total data length field, and a timer. The data from the
fragment is placed in the data buffer according to its fragment
offset and length, and bits are set in the fragment block bit
table corresponding to the fragment blocks received.
If this is the first fragment (that is, the fragment offset is
zero), this header is placed in the header buffer. If this is the
last fragment (that is, the more fragments field is zero), the
total data length is computed. If this fragment completes the
datagram (tested by checking the bits set in the fragment block
table), then the datagram is sent to the next step in datagram
processing; otherwise the timer is set to the maximum of the
current timer value and the value of the time to live field from
this fragment; and the reassembly routine gives up control.
If the timer runs out, then all reassembly resources for this
buffer identifier are released. The initial setting of the timer
is a lower bound on the reassembly waiting time. This is because
the waiting time will be increased if the Time to Live in the
arriving fragment is greater than the current timer value but will
not be decreased if it is less. The maximum this timer value
could reach is the maximum time to live (approximately 4.25
minutes). The current recommendation for the initial timer
setting is 15 seconds. This may be changed as experience with
this protocol accumulates. Note that the choice of this parameter
value is related to the buffer capacity available and the data
rate of the transmission medium; that is, data rate times timer
value equals buffer size (e.g., 10Kb/s X 15s = 150Kb).
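The timer rule and the buffer estimate can be written out as a short Python sketch. The function names are illustrative, and "Kb" is assumed here to mean 1000 bits.

```python
MAX_TTL_SECONDS = 255  # largest value of the 8-bit Time to Live field


def buffer_bits(data_rate_bps: int, timer_seconds: int) -> int:
    """Buffer needed to hold everything that can arrive while the timer runs."""
    return data_rate_bps * timer_seconds


def updated_timer(timer: int, ttl: int) -> int:
    """An arriving fragment can raise the timer but never lower it."""
    return max(timer, ttl)


recommended = buffer_bits(10_000, 15)    # the 10Kb/s x 15s example
max_wait_minutes = MAX_TTL_SECONDS / 60  # upper bound on the waiting time
```

Because the timer only ever moves upward, the initial setting acts as a lower bound and the maximum time to live as an upper bound on the waiting time.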
Notation:
FO - Fragment Offset
IHL - Internet Header Length
MF - More Fragments flag
TTL - Time To Live
NFB - Number of Fragment Blocks
TL - Total Length
TDL - Total Data Length
BUFID - Buffer Identifier
RCVBT - Fragment Received Bit Table
TLB - Timer Lower Bound
Procedure:
(1)  BUFID <- source|destination|protocol|identification;
(2)  IF FO = 0 AND MF = 0
(3)     THEN IF buffer with BUFID is allocated
(4)             THEN flush all reassembly for this BUFID;
(5)          Submit datagram to next step; DONE.
(6)     ELSE IF no buffer with BUFID is allocated
(7)             THEN allocate reassembly resources
                     with BUFID;
                     TIMER <- TLB; TDL <- 0;
(8)          put data from fragment into data buffer with
             BUFID from octet FO*8 to
                                 octet (TL-(IHL*4))+FO*8;
(9)          set RCVBT bits from FO
                                to FO+((TL-(IHL*4)+7)/8);
(10)         IF MF = 0 THEN TDL <- TL-(IHL*4)+(FO*8)
(11)         IF FO = 0 THEN put header in header buffer
(12)         IF TDL # 0
(13)          AND all RCVBT bits from 0
                                     to (TDL+7)/8 are set
(14)            THEN TL <- TDL+(IHL*4)
(15)                 Submit datagram to next step;
(16)                 free all reassembly resources
                     for this BUFID; DONE.
(17)         TIMER <- MAX(TIMER,TTL);
(18)         give up until next fragment or timer expires;
(19) timer expires: flush all reassembly with this BUFID; DONE.
In the case that two or more fragments contain the same data
either identically or through a partial overlap, this procedure
will use the more recently arrived copy in the data buffer and
datagram delivered.
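The reassembly procedure can be sketched in Python along the following lines. The Fragment, Resources, and Reassembler classes are hypothetical. Several simplifications are assumed: fragments are modeled as field values rather than parsed headers, a set of block numbers stands in for the RCVBT bit table, only the data portion is returned, and the timer is stored but not actually run (the caller is expected to invoke expire).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

BufId = Tuple[str, str, int, int]


@dataclass
class Fragment:
    src: str
    dst: str
    proto: int
    ident: int
    fo: int      # FO: fragment offset in 8-octet blocks
    mf: int      # MF: More Fragments flag
    ttl: int     # TTL: Time To Live
    data: bytes  # data portion (TL minus IHL*4 octets)


@dataclass
class Resources:
    data: bytearray = field(default_factory=bytearray)
    rcvbt: Set[int] = field(default_factory=set)  # block numbers received
    header: Optional[Fragment] = None
    tdl: int = 0
    timer: int = 15


class Reassembler:
    def __init__(self, tlb: int = 15) -> None:
        self.tlb = tlb  # TLB: timer lower bound in seconds
        self.buffers: Dict[BufId, Resources] = {}

    def receive(self, frag: Fragment) -> Optional[bytes]:
        """Return the complete data portion, or None while fragments are missing."""
        bufid = (frag.src, frag.dst, frag.proto, frag.ident)  # step (1)
        if frag.fo == 0 and frag.mf == 0:                     # step (2)
            self.buffers.pop(bufid, None)                     # steps (3)-(4)
            return frag.data                                  # step (5)
        res = self.buffers.get(bufid)
        if res is None:                                       # steps (6)-(7)
            res = self.buffers[bufid] = Resources(timer=self.tlb)
        start = frag.fo * 8
        end = start + len(frag.data)
        if len(res.data) < end:
            res.data.extend(bytes(end - len(res.data)))       # grow the buffer
        res.data[start:end] = frag.data                       # step (8): newest copy wins
        blocks = (len(frag.data) + 7) // 8
        res.rcvbt.update(range(frag.fo, frag.fo + blocks))    # step (9)
        if frag.mf == 0:
            res.tdl = end                                     # step (10)
        if frag.fo == 0:
            res.header = frag                                 # step (11)
        if res.tdl and res.rcvbt.issuperset(range((res.tdl + 7) // 8)):  # (12)-(13)
            del self.buffers[bufid]                           # step (16)
            return bytes(res.data[:res.tdl])                  # steps (14)-(15)
        res.timer = max(res.timer, frag.ttl)                  # step (17)
        return None                                           # step (18)

    def expire(self, bufid: BufId) -> None:
        """Step (19): the timer ran out, so release everything for this BUFID."""
        self.buffers.pop(bufid, None)
```

Because each arriving fragment is written over whatever is already in the buffer, overlapping or duplicated data ends up as the more recently arrived copy, which matches the behavior described above.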
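Background: a computer network is a layered structure. From top to bottom the layers are the application layer (HTTP), the transport layer (TCP/UDP), the network layer (IP), and the link layer. The link layer imposes a maximum transmission unit (MTU). If an IP datagram is longer than the MTU, a router fragments it. Before fragmenting, the router checks the datagram's DF flag to see whether fragmentation is allowed; if it is not, the datagram is discarded, otherwise it is fragmented. Routers only fragment and never reassemble, and a fragment can itself be fragmented again. The fragments are reassembled only when they reach the destination host.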
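An IP datagram consists of a header and a data portion (payload), and every fragment carries its own copy of the header. Suppose the datagram is 3820 bytes long, with a 20-byte header and a 3800-byte payload, and the MTU is 1500 bytes. Each fragment can then carry at most 1500 - 20 = 1480 bytes of payload (1480 is a multiple of 8, as the offset field requires), so ceil(3800 / 1480) = 3 fragments are needed. Every fragment except the last is filled to the MTU, so the three fragments are 1500, 1500, and 860 bytes long.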
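The numbers in this example can be reproduced with a short Python sketch. The function fragment_plan is a hypothetical helper that only computes sizes, assuming every fragment carries a header of the same length (no options).

```python
from typing import List, Tuple


def fragment_plan(payload_len: int, header_len: int, mtu: int) -> List[Tuple[int, int, int]]:
    """Return (offset in 8-byte blocks, MF, total length) for each fragment."""
    room = mtu - header_len  # payload bytes that fit beside the header
    step = room // 8 * 8     # non-final fragments must end on an 8-byte boundary
    if step < 8:
        raise ValueError("MTU too small to carry one 8-byte block")
    plan = []
    start = 0
    while payload_len - start > room:
        plan.append((start // 8, 1, header_len + step))
        start += step
    plan.append((start // 8, 0, header_len + payload_len - start))
    return plan


plan = fragment_plan(3800, 20, 1500)
# plan == [(0, 1, 1500), (185, 1, 1500), (370, 0, 860)]
```

The offsets 0, 185, and 370 are in units of 8 bytes, so they correspond to payload positions 0, 1480, and 2960.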
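How does the destination host put the fragments of one datagram back together? It relies on three fields in the IP header: Identification, Flags, and Fragment Offset. The destination host reassembles fragments that share the same Identification value and the same source and destination IP addresses (RFC 791 also includes the protocol field in this key). Each fragment's header carries an offset field giving its position within the original datagram, and an MF flag indicating whether it is the last fragment.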
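During fragmentation, MF is set to 1 on every fragment except the last, and the last fragment keeps the MF value of the datagram being fragmented. The reason is that a datagram may be fragmented several times in transit, so the last piece produced by the current fragmentation is not necessarily the last fragment of the datagram the sender originally sent. Likewise, each new fragment's offset is computed by adding to the offset of the datagram being fragmented.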
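A small Python sketch shows how MF and the offset are inherited when a fragment is fragmented again. The function refragment and the 576-byte second-hop MTU are assumptions made for illustration, with a fixed 20-byte header.

```python
from typing import List, Tuple


def refragment(fo: int, mf: int, payload_len: int,
               header_len: int, mtu: int) -> List[Tuple[int, int, int]]:
    """Split a fragment again; return (offset in blocks, MF, payload length) triples."""
    room = mtu - header_len
    step = room // 8 * 8
    if step < 8:
        raise ValueError("MTU too small to carry one 8-byte block")
    pieces = []
    done = 0
    while payload_len - done > room:
        pieces.append((fo + done // 8, 1, step))  # not last: MF forced to 1
        done += step
    pieces.append((fo + done // 8, mf, payload_len - done))  # last piece inherits MF
    return pieces


middle = refragment(185, 1, 1480, 20, 576)
tail = refragment(370, 0, 840, 20, 576)
```

Here middle is a fragment that was not the last one of the original datagram, so all of its pieces carry MF = 1, while only the final piece of tail carries MF = 0.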
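Different fragments may take different paths through the network, so they do not necessarily reach the destination host in their original order. The destination host maintains a buffer for the payload. When a fragment arrives, the host reads its header, works out where the fragment belongs in the buffer, and copies its payload there. When the fragment with MF = 0 arrives, the host knows it is the last one and can derive the total length of the original datagram from its offset and length. The host also keeps a bit table that records which fragments have arrived; the bits for each arrived fragment are set to 1. Once all fragments have arrived, the data in the buffer is handed to the upper-layer protocol. If some fragments are still missing when the timer expires, the host discards all fragments of that datagram.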
The RFC 791 text quoted in this post is the original description. It is worth a close read and is a real help when programming.