nelson-yeh-fy / Adventure

0 stars 0 forks source link

SD Url shortener #26

Open sync-by-unito[bot] opened 4 years ago

sync-by-unito[bot] commented 4 years ago

I ask myself why do people care URL shortener if it's not just an interview question? the reason may includes these:

  1. More exciting extend system like a. Alias and url branding, b. UTM source tracking, c. integrate with Ad re-targeting.
  2. Present your knowledge about Hash vulnerability (MD5 collision), Base62, KGS, cache-aside,..etc.

(My google drawing): https://docs.google.com/drawings/d/1HZM267GFam1retVr4nbwdiLfsQTw8lguiJmPuaK0cqI/edit


Discussion/Presentation flow ( using acronym R-E-A-D-M-E ):

-R-E- (Requirement and Estimate) 5-10 mins

FRs - Prioritize and identify expectations

  1. Given a URL, system generates a short URL (7 digit) for it.
  2. The shortened URL must be unique.
  3. Redirect user to the original URL.
  4. User is able to setup expiration date.

NFRs - Latency&Throughput / Availability&Consistency / Read&Write ratio

  1. High Availability.
  2. Low Latency and Read most (1000:1)

-A-D- (API, Data modeling) 5-10 mins

Use case diagram, take FRs & NFRs into account API: POST/PUT/GET/DELETE /api/REST/1.0/user//shortURL { shorten_URL_decoded_id: int original_URL: varchar(512), created_at: datatime, expired_at: datatime } Data modeling discuss the flow among various components, a. shortened URL has no relationship with other URLs, using KV stores is fine. b. NFR (Low latency and Read most) => KV stores and KV cache. c. NFR (High availability) => Distributed KV stores Key: recordID(int), Value: originalURL(varchar(512), created_at: datetime, expired_at: datetime

-M-E- (Main system, Extended system) 10-20 mins

Algorithms to generate URL 1.Using MD5/SHA, and then Base62 encoding for displaying. Pros: not rely on a central server to calculate, less complexity; easier scaling. Cons: the same URL leads to the same MD5, reach a DB becomes a must to check it's existing or not. Cons: MD5 collision vulnerabilities, we can't use it as DB key because it's not unique. Base62 encoding to take first 8digits for displaying, leads to more possible collisions.

2.Using counter to generate keys, Cons: single point of failure in that counter service Cons: only RDBMS can guarantee it'll be unique, NoSQL's BASE can't guarantee, but RDBMS is hard to scale.

3.Creating KGS (Key Generation Service) and Base62 encoding. Note: using usual incremental DB keys, encode keys as shortened URL. Pros: No collision, Base62^7 represents 3.5 trillion URLs. (62^8=218 Tri) Cons: additional complexity & concurrency problem. Solve the concurrency problem by loading keys into service cache, and move them to 'used keys' immediately.

Extended services and cache and purge expired data

  1. Shard them by the range or by hash (more evenly),
  2. security about guessing the next shortened URL? seems not an issue because URL is supposed to be public. If need, we add random unique seed to make it un-predictable. **3. client-aside cache and purge expired data, worker service to delete and retrieve keys back.
  3. Analytics, cache to record hit count and batch sync to backend.** 5.How to prevent visiting restricted sites? a. maintaining a blacklist of hostnames in a KV store. b. scaling techniques like bloom filter.

Bloom filter https://llimllib.github.io/bloomfilter-tutorial/#:~:text=A%20Bloom%20filter%20is%20a,may%20be%20in%20the%20set.

Ref: https://www.educative.io/courses/grokking-the-system-design-interview/m2ygV4E81AR https://medium.com/swlh/how-to-build-a-tiny-url-service-that-scales-to-billions-f6fb5ea22e8c https://tianpan.co/notes/84-designing-a-url-shortener https://blog.codinghorror.com/url-shortening-hashes-in-practice/ https://buffer.com/library/url-shorteners/ https://www.youtube.com/watch?v=fMZMm_0ZhK4 Why URL Shortener: https://blog.rebrandly.com/utm-parameters-made-simple-guide-for-marketers/ https://buffer.com/library/utm-guide/

┆Issue is synchronized with this Trello card by Unito ┆Attachments: Tushar Roy - Design a service like TinyUrl | Tech Dummies - tinyurl | bitly system design | image.jpeg | Screenshot - 10_28_2020 , 4_17_01 PM.png