Application Layer (一): overview, HTTP and socket

At the core of network application development is writing programs that run on different end systems and communicate with each other over the network.

Principles of Network Applications

Network Application Architectures

Before diving into software coding, you should have a broad architectural plan for your application. Keep in mind that an application’s architecture is distinctly different from the network architecture. There are two predominant architectural paradigms used in modern network applications: the client-server architecture or the peer-to-peer (P2P) architecture.

In a client-server architecture, there is an always-on host, called the server, which services requests from many other hosts, called clients. Note that with the client-server architecture, clients do not directly communicate with each other. Often in a client-server application, a single-server host is incapable of keeping up with all the requests from clients. For example, a popular social-networking site can quickly become overwhelmed if it has only one server handling all of its requests. For this reason, a data center, housing a large number of hosts, is often used to create a powerful virtual server.

One of the most compelling features of P2P architectures is their self-scalability.

Three major challenges:

ISP Friendly.
Security.
Incentives.

Processes Communicating

In the jargon of operating systems, it is not actually programs but processes that communicate. A process can be thought of as a program that is running within an end system. Processes on two different end systems communicate with each other by exchanging messages across the computer network.

In the context of a communication session between a pair of processes, the process that initiates the communication (that is, initially contacts the other process at the beginning of the session) is labeled as the client. The process that waits to be contacted to begin the session is the server.

A process sends messages into, and receives messages from, the network through a software interface called a socket. A process is analogous to a house and its socket is analogous to its door. When a process wants to send a message to another process on another host, it shoves the message out its door (socket).

A socket is the interface between the application layer and the transport layer within a host. It is also referred to as the Application Programming Interface (API) between the application and the network, since the socket is the programming interface with which network applications are built.

应用层和传输层之间通过 socket 接口建立联系。换句话说，应用层的应用通过 socket 接入整个因特网。

To identify the receiving process, two pieces of information need to be specified: (1) the address of the host and (2) an identifier that specifies the receiving process in the destination host.

In addition to knowing the address of the host to which a message is destined, the sending process must also identify the receiving process (more specifically, the receiving socket) running in the host. A destination port number serves this purpose.

Transport Services Available to Applications

Recall that a socket is the interface between the application process and the transport-layer protocol. We can broadly classify the possible services along four dimensions: reliable data transfer, throughput, timing, and security.

If a protocol provides such a guaranteed data delivery service, it is said to provide reliable data transfer.

Applications that have throughput requirements are said to be bandwidth-sensitive applications.

While bandwidth-sensitive applications have specific throughput requirements, elastic applications can make use of as much, or as little, throughput as happens to be available. Electronic mail, file transfer, and Web transfers are all elastic applications.

A transport-layer protocol can also provide timing guarantees.

Finally, a transport protocol can provide an application with one or more security services.

Transport Services Provided by the Internet

The Internet (and, more generally, TCP/IP networks) makes two transport protocols available to applications, UDP and TCP. one of the first decisions you have to make is whether to use UDP or TCP.

TCP Services

The TCP service model includes a connection-oriented service and a reliable data transfer service.

Connection-oriented service. TCP has the client and server exchange transport-layer control information with each other before the application-level messages begin to flow. This so-called handshaking procedure alerts the client and server, allowing them to prepare for an onslaught of packets. After the handshaking phase, a TCP connection is said to exist between the sockets of the two processes. The connection is a full-duplex connection in that the two processes can send messages to each other over the connection at the same time.

SECURING TCP

Neither TCP nor UDP provide any encryption—the data that the sending process passes into its socket is the same data that travels over the network to the destination process.

Because privacy and other security issues have become critical for many applications, the Internet community has developed an enhancement for TCP, called Secure Sockets Layer (SSL).

We emphasize that SSL is not a third Internet transport protocol, on the same level as TCP and UDP, but instead is an enhancement of TCP, with the enhancements being implemented in the application layer.

When an application uses SSL, the sending process passes cleartext data to the SSL socket; SSL in the sending host then encrypts the data and passes the encrypted data to the TCP socket. The encrypted data travels over the Internet to the TCP socket in the receiving process. The receiving socket passes the encrypted data to SSL, which decrypts the data. Finally, SSL passes the cleartext data through its SSL socket to the receiving process.

Reliable data transfer service. The communicating processes can rely on TCP to deliver all data sent without error and in the proper order. When one side of the application passes a stream of bytes into a socket, it can count on TCP to deliver the same stream of bytes to the receiving socket, with no missing or duplicate bytes.

TCP also includes a congestion-control mechanism.

UDP Services

UDP is a no-frills, lightweight transport protocol, providing minimal services. UDP is connectionless, so there is no handshaking before the two processes start to communicate. UDP provides an unreliable data transfer service—that is, when a process sends a message into a UDP socket, UDP provides no guarantee that the message will ever reach the receiving process. Furthermore, messages that do arrive at the receiving process may arrive out of order.

UDP does not include a congestion-control mechanism.

Services Not Provided by Internet Transport Protocols

We have organized transport protocol services along four dimensions: reliable data transfer, throughput, timing, and security.

In our brief description of TCP and UDP, conspicuously missing was any mention of throughput or timing guarantees—services not provided by today’s Internet transport protocols.

In summary, today’s Internet can often provide satisfactory service to time-sensitive applications, but it cannot provide any timing or throughput guarantees.

We see that email, remote terminal access, the Web, and file transfer all use TCP. These applications have chosen TCP primarily because TCP provides reliable data transfer, guaranteeing that all data will eventually get to its destination. Because Internet telephony applications (such as Skype) can often tolerate some loss but require a minimal rate to be effective, developers of Internet telephony applications usually prefer to run their applications over UDP, thereby circumventing TCP’s congestion control mechanism and packet over- heads. But because many firewalls are configured to block (most types of) UDP traffic, Internet telephony applications often are designed to use TCP as a backup if UDP communication fails.

Application-Layer Protocols

We have just learned that network processes communicate with each other by sending messages into sockets. But how are these messages structured? What are the meanings of the various fields in the messages? When do the processes send the messages? These questions bring us into the realm of application-layer protocols.

An application-layer protocol defines how an application’s processes, running on different end systems, pass messages to each other.

The types of messages exchanged, for example, request messages and response messages.
The syntax of the various message types.
The semantics of the fields.
Rules for determining when and how a process sends messages and responds to messages.

It is important to distinguish between network applications and application-layer protocols.

The Web and HTTP

Then, in the early 1990s, a major new application arrived on the scene—the World Wide Web [Berners-Lee 1994]. The Web was the first Internet application that caught the general public’s eye.

Perhaps what appeals the most to users is that the Web operates on demand. Hyperlinks and search engines help us navigate through an ocean of Web sites. Graphics stimulate our senses. Forms, JavaScript, Java applets, and many other devices enable us to interact with pages and sites.

And the Web serves as a platform for many killer applications emerging after 2003, including YouTube, Gmail, and Facebook.

Overview of HTTP

The HyperText Transfer Protocol (HTTP), the Web’s application-layer protocol, is at the heart of the Web.

HTTP is implemented in two programs: a client program and a server program.

HTTP defines the structure of these messages and how the client and server exchange the messages. Before explaining HTTP in detail, we should review some Web terminology.

It is important to note that the server sends requested files to clients without storing any state information about the client.

Because an HTTP server maintains no information about the clients, HTTP is said to be a stateless protocol.

A Web server is always on, with a fixed IP address, and it services requests from potentially millions of different browsers.

Non-Persistent and Persistent Connections

The default mode of HTTP uses persistent connections with pipelining.

HTTP Message Format

There are two types of HTTP messages, request messages and response messages.

The first line of an HTTP request message is called the request line; the subsequent lines are called the header lines. Let’s take a careful look at this response message. It has three sections: an initial status line, six header lines, and then the entity body.

The Last-Modified: header, which we will soon cover in more detail, is critical for object caching, both in the local client and in network cache servers (also known as proxy servers).

User-Server Interaction: Cookies

We mentioned above that an HTTP server is stateless. This simplifies server design and has permitted engineers to develop high-performance Web servers that can handle thousands of simultaneous TCP connections. However, it is often desirable for a Web site to identify users, either because the server wishes to restrict user access or because it wants to serve content as a function of the user identity. For these purposes, HTTP uses cookies.

Cookie technology has four components: (1) a cookie header line in the HTTP response message; (2) a cookie header line in the HTTP request message; (3) a cookie file kept on the user’s end system and managed by the user’s browser; and (4) a back-end database at the Web site.

Cookies can thus be used to create a user session layer on top of stateless HTTP.

Web Caching

A Web cache—also called a proxy server—is a network entity that satisfies HTTP requests on the behalf of an origin Web server.

Typically a Web cache is purchased and installed by an ISP.

Through the use of Content Distribution Networks (CDNs), Web caches are increasingly playing an important role in the Internet.

The Conditional GET

Although caching can reduce user-perceived response times, it introduces a new problem—the copy of an object residing in the cache may be stale. Fortunately, HTTP has a mechanism that allows a cache to verify that its objects are up to date. This mechanism is called the conditional GET.

(1) the request message uses the GET method and (2) the request message includes an If-Modified- Since: header line.

Note that this last response message has 304 Not Modified in the status line, which tells the cache that it can go ahead and forward its (the proxy cache’s) cached copy of the object to the requesting browser.

xxleyi / learning_list

Application Layer (一): overview, HTTP and socket #76

Principles of Network Applications

The Web and HTTP

Socket Programming: Creating Network Applications