leandromoreira / cdn-up-and-running

CDN Up and Running - Building a CDN from Scratch to Learn about CDN, Nginx, Lua, Prometheus, Grafana, Load balancing, and Containers.
BSD 3-Clause "New" or "Revised" License

Downside of Round-Robin load balancing, and cache-awareness #10

Closed: metalalive closed this 1 year ago

metalalive commented 1 year ago

Hi, could you please explain how the lack of cache-awareness leads to the downside of Round-Robin load balancing?

According to round-robin part of the article :

What's not good about it? It's not caching-aware, meaning multiple clients will face higher latencies because they're asking uncached servers.

Here is my understanding: when multiple clients ask for uncached content, the latency of the concurrent client requests comes from request coalescing (the Nginx directive proxy_cache_lock). Only one request is passed to the upstream server, and all the other client requests simply wait until the fresh content is cached. So this is probably not related to Round-Robin load balancing?

thanks for the reply.

leandromoreira commented 1 year ago

Hi @metalalive =) sure, I'll try to provide my reasoning behind that sentence:

Here is my understanding: when multiple clients ask for uncached content, the latency of the concurrent client requests comes from request coalescing (the Nginx directive proxy_cache_lock). Only one request is passed to the upstream server, and all the other client requests simply wait until the fresh content is cached.

Your understanding seems right to me :)! The main idea is to serve fast, save bandwidth, save storage, avoid extra hops, etc.
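As an aside, that coalescing behavior can be sketched in Python. This is a minimal model with hypothetical names, not Nginx's actual implementation of proxy_cache_lock: the first request for a key fetches from the origin while concurrent requests for the same key wait on the in-flight result.

```python
import threading

class Coalescer:
    """Collapse concurrent fetches for the same key into one origin call."""

    def __init__(self, fetch):
        self.fetch = fetch      # function that actually hits the origin
        self.cache = {}         # key -> cached value
        self.inflight = {}      # key -> Event signaling a pending fetch
        self.lock = threading.Lock()

    def get(self, key):
        while True:
            with self.lock:
                if key in self.cache:
                    return self.cache[key]
                ev = self.inflight.get(key)
                if ev is None:
                    # We are the first requester: mark the fetch in flight.
                    ev = self.inflight[key] = threading.Event()
                    leader = True
                else:
                    leader = False
            if leader:
                value = self.fetch(key)       # the single upstream request
                with self.lock:
                    self.cache[key] = value
                    del self.inflight[key]
                ev.set()                      # wake the waiting followers
                return value
            ev.wait()                         # followers block until cached
```

However many clients ask for the same uncached object on the same node, only one request reaches the origin; the rest just pay the wait.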

Think about a CDN with 100 nodes, where unpopular content video.mp4 is being accessed by 100 users. Since you're using RR as the load balancing policy, every SINGLE user will pay the latency, because each server does not have the content cached locally (that's the cache unawareness). On top of that, the disks/SSD storage will be filled with (not so hot) content that could be better distributed.
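A tiny simulation makes this concrete (Python, with hypothetical names): under RR every server takes its own cache miss for the same object, while a content-hash policy concentrates the object on one server.

```python
from itertools import cycle

SERVERS = ["server1", "server2", "server3"]

def simulate(requests, pick):
    """Route each requested object; count origin fetches (cache misses)."""
    caches = {s: set() for s in SERVERS}
    misses = 0
    for obj in requests:
        server = pick(obj)
        if obj not in caches[server]:
            misses += 1              # this user pays the origin latency
            caches[server].add(obj)
    return misses

# Round-robin: ignores the content entirely.
rr = cycle(SERVERS)
def pick_rr(obj):
    return next(rr)

# Cache-aware: hash the URL so the same object always lands on one server.
def pick_hash(obj):
    return SERVERS[hash(obj) % len(SERVERS)]

reqs = ["video.mp4"] * 3             # Alice, Bob, Carol ask for the same file
print(simulate(reqs, pick_rr))       # 3 misses: every server fetches it
print(simulate(reqs, pick_hash))     # 1 miss: only one server fetches it
```

With 100 nodes and 100 users, RR turns one unpopular object into up to 100 origin fetches and 100 cached copies.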

However, if you have hugely popular content with millions of users, RR might be good enough at the CDN level, assuming that the user was directed to the closest CDN PoP (see figure 1), like one in their ISP or city. There's even a better algorithm than RR alone with, I'd say, almost the same characteristics: the power of two random choices.
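The power of two random choices can be sketched in a few lines (Python, hypothetical names): sample two servers at random and send the request to the less loaded one, which keeps loads far more even than a single random pick.

```python
import random

def pick_two_choices(servers, load):
    """Power of two random choices: sample two servers uniformly at
    random and route to the less loaded of the pair."""
    a, b = random.sample(servers, 2)
    return a if load[a] <= load[b] else b

servers = ["s1", "s2", "s3", "s4"]
load = {s: 0 for s in servers}
for _ in range(10_000):
    chosen = pick_two_choices(servers, load)
    load[chosen] += 1
# The gap between the most and least loaded server stays tiny,
# without the balancer needing global knowledge of all loads.
print(max(load.values()) - min(load.values()))
```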

figure 1 (image)

The graph below shows that Alice, Bob, and Carol all paid the extra latency cost, and the CDN itself paid the cost of leaving the DC to fetch the source object (assuming a pull-only nature). One can argue that you could organize your nodes, within the CDN, to avoid some of these issues.

I believe that everything comes down to acting differently depending on the circumstance; offering fast access to popular content and to unpopular content demands different tactics. Given that the resources (storage, RAM, network, etc.) are limited and that content is not consumed uniformly, finding a single balancing policy that meets all requirements is challenging, if not impossible. Does that make sense?

```mermaid
flowchart LR
    A[Alice] -->|GET /video.mp4| RR{RR}
    RR -->|Sends Alice to| S1[Server1]
    B[Bob] -->|GET /video.mp4| RR
    RR -->|Sends Bob to| S2[Server2]
    C[Carol] -->|GET /video.mp4| RR
    RR -->|Sends Carol to| S3[Server3]
    S1 -->|Fetches from origin| O[Origin]
    S2 -->|Fetches from origin| O
    S3 -->|Fetches from origin| O
```

vs

```mermaid
flowchart LR
    A[Alice] -->|GET /video.mp4| CA{Caching-aware}
    CA -->|Sends Alice to| S1[Server1]
    B[Bob] -->|GET /video.mp4| CA
    CA -->|Sends Bob to| S1
    C[Carol] -->|GET /video1.mp4| CA
    CA -->|Sends Carol to| S1
    D[Dave] -->|GET /video2.mp4| CA
    CA -->|Sends Dave to| S2[Server2]
    S1 -->|Fetches from origin| O[Origin]
    S2 -->|Fetches from origin| O
```
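One common way to implement that caching-aware routing is consistent hashing (Nginx supports it via `hash $request_uri consistent`). A minimal Python sketch, with hypothetical names: each URL maps to one server, so its cache concentrates there, and adding or removing a server only remaps a small slice of the keys.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map each URL to one server via a hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        # Place vnodes points per server on the ring for smoother balance.
        self.ring = sorted(
            (self._h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def pick(self, url):
        # First ring point clockwise from the URL's hash owns the URL.
        i = bisect.bisect(self.keys, self._h(url)) % len(self.keys)
        return self.ring[i][1]

ring = ConsistentHashRing(["server1", "server2", "server3"])
# Every client asking for the same object is sent to the same server:
print(ring.pick("/video.mp4") == ring.pick("/video.mp4"))  # True
```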
metalalive commented 1 year ago

Can I view the blocks RR, Caching-aware, Server1, Server2 (in the diagrams above) as different nodes (like Nginx servers) in a CDN? So the CDN PoP (with a set of nodes) will face the caching issue you mentioned, when:

leandromoreira commented 1 year ago

All the drawings/graphics used were just to demonstrate abstract concepts. In reality, a user request might flow like this:

  1. A user requests content XYZ via API/DNS routing/Anycast
    • this part might work as the PoP load balancing
      • picking the closest and available PoP (set of nodes), considering resourcefulness, etc.
    • and also as the node balancing within the PoP
      • picking the best node, caching/resource aware
  2. API/DNS routing/Anycast returns/resolves to the closest NODE
  3. The user watches the stream / downloads the game, assets, etc.
  4. If the content is in the cache, the node just serves it; if not, the request goes through the tiered network of PoPs/nodes (you can also think of it as a "tree")
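Step 4 can be sketched as a toy model (Python, hypothetical names): an edge node checks its own cache, then fills from its parent tier, and only the top of the tree fetches from the origin.

```python
class Tier:
    """One level of a tiered cache tree: edge -> mid -> origin."""

    def __init__(self, name, parent=None):
        self.name, self.parent, self.cache = name, parent, set()

    def get(self, url, origin_hits):
        if url in self.cache:
            return f"HIT@{self.name}"
        if self.parent is not None:
            self.parent.get(url, origin_hits)   # fill from the tier above
        else:
            origin_hits.append(url)             # top tier goes to the origin
        self.cache.add(url)                     # cache on the way back down
        return f"MISS@{self.name}"

origin_hits = []
mid = Tier("mid-pop")
edge = Tier("edge-pop", parent=mid)
print(edge.get("/video.mp4", origin_hits))  # MISS@edge-pop
print(edge.get("/video.mp4", origin_hits))  # HIT@edge-pop
print(len(origin_hits))                     # 1: the origin was hit once
```

The point of the tiering is exactly that second line: once any tier holds the object, lower tiers (and other edges sharing the same parent) stop paying the origin trip.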


metalalive commented 1 year ago

Yes, in reality the PoP load balancing will do things more intelligently. Thanks for explaining!