basics#6 - Database & Cache

sync-by-unito[bot] commented 4 years ago

Note to brush up on system design basics#6 - Database & Cache (know your Swiss Army knife)

There are too many resources on the internet to list, so here are just a few points:

1. ACID vs. BASE at scale (ask why, and what trade-offs each brings)
https://www.slideshare.net/jboner/scalability-availability-stability-patterns/65-When_isa_RDBMSnotgood_enough
https://www.youtube.com/watch?v=w95murBkYmU
https://db-engines.com/en/ranking

2. Consistent hashing in NoSQL (to make the stores distributed and highly available)
https://www.mikeperham.com/2009/01/14/consistent-hashing-in-memcache-client/
https://www.toptal.com/big-data/consistent-hashing
https://medium.com/@dgryski/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8
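To make the consistent hashing idea concrete, here is a minimal sketch of a hash ring with virtual nodes. The class name `HashRing`, the `vnodes` parameter, and the use of MD5 are illustrative assumptions, not taken from any of the linked articles or from a specific NoSQL client.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an assumption, chosen for stability)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring at `vnodes` pseudo-random positions.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # position already taken; the first owner keeps it
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # A key belongs to the first ring position clockwise from its hash.
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup("user:42"))
```

The point of the ring is that adding a node only takes keys away from existing nodes and hands them to the new one; keys never shuffle between the old nodes, which is what a plain `hash(key) % n` scheme cannot guarantee.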

3. Gossip or Paxos protocols (to maintain a distributed database's consistency and availability: gossip for membership and failure detection, Paxos for consensus)
http://highscalability.com/blog/2011/11/14/using-gossip-protocols-for-failure-detection-monitoring-mess.html?fbclid=IwAR2W_9PODC_cLXYLhpRefaOV3pIjUGcPPex59PttB82DP09V2DkEuVcK7dM
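As a rough illustration of the gossip side only (not Paxos), the sketch below simulates push-style rumor spreading: in each round, every node that already knows the rumor tells a few randomly chosen peers. The function name `gossip_rounds` and the `fanout` and `seed` parameters are hypothetical, and real systems add details this toy model ignores, such as heartbeats, version numbers, and failure suspicion.

```python
import random


def gossip_rounds(n_nodes: int, fanout: int = 2, seed: int = 0) -> int:
    """Return how many rounds it takes until every node has heard the rumor.

    Node 0 starts with the rumor. In each round, every informed node pushes
    it to `fanout` peers picked uniformly at random (possibly itself or an
    already informed node, in which case the message is simply wasted).
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        rounds += 1
        newly_informed = set()
        for _ in informed:
            peers = rng.sample(range(n_nodes), k=min(fanout, n_nodes))
            newly_informed.update(peers)
        informed |= newly_informed
    return rounds


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "nodes ->", gossip_rounds(n), "rounds")
```

The informed set can at most triple per round with a fanout of 2, so the round count grows roughly logarithmically with cluster size, which is why gossip scales well for spreading membership and failure information.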

4. Cache in a nutshell:
a. When to update: cache-aside, write-through, write-behind, refresh-ahead (predictive).
b. How to evict: LRU (expensive), FIFO (may evict popular entries), Clock (less expensive).
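Two of the update strategies listed above, cache-aside and write-through, can be sketched as follows. This is a minimal illustration in which a plain `dict` stands in for the database; the class names `CacheAsideStore` and `WriteThroughStore` are hypothetical, and invalidating on write is only one common way to do cache-aside.

```python
class CacheAsideStore:
    """Cache-aside: the application loads on a read miss and invalidates on write."""

    def __init__(self, db):
        self.db = db      # stand-in for the real database (a dict here)
        self.cache = {}

    def read(self, key):
        if key in self.cache:
            return self.cache[key]       # cache hit
        value = self.db.get(key)         # cache miss: go to the database
        if value is not None:
            self.cache[key] = value      # populate the cache for next time
        return value

    def write(self, key, value):
        self.db[key] = value             # the database is the source of truth
        self.cache.pop(key, None)        # drop the stale cached copy


class WriteThroughStore:
    """Write-through: every write updates the database and the cache together."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.db.get(key)

    def write(self, key, value):
        self.db[key] = value
        self.cache[key] = value          # cache is always fresh after a write


if __name__ == "__main__":
    store = CacheAsideStore({"a": 1})
    print(store.read("a"), store.cache)
```

Write-behind would differ from write-through only in deferring the database write (for example, by queueing it), which trades durability for lower write latency.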

https://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast
https://www.powershow.com/view/95163-NzkyO/4_4_20Page_20replacement_20algorithms_powerpoint_ppt_presentation#.XzbRxehKhPY
https://en.wikipedia.org/wiki/Cache_replacement_policies
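For the LRU policy mentioned above, a minimal sketch built on `collections.OrderedDict` looks like this. The class name `LRUCache` and its `capacity` parameter are illustrative assumptions; the bookkeeping on every read is what makes LRU comparatively expensive, and Clock approximates it with a single reference bit per entry instead.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical least-recently-used cache with a fixed capacity."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as a "use"
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


if __name__ == "__main__":
    cache = LRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")      # "a" is now the most recently used entry
    cache.put("c", 3)   # evicts "b"
    print(list(cache._items))
```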

Attachments: Why start with SQL? | Why might you need NoSQL?

sync-by-unito[bot] commented 4 years ago

➤ Nelson 3513 commented:

Although a relational database can handle almost all storage work, remember not to store blobs, such as photos, in a relational database, and choose the right database for each service. For example, read performance matters for a follower service, so it makes sense to use a key-value cache. Feeds are generated as time passes, so the timestamp index in HBase / Cassandra is a great fit for that use case. Users have relationships with other users or objects, so a relational database is our default choice for a user profile service.

Column-oriented Store

The abstraction of a column-oriented store is like a giant nested map: ColumnFamily<RowKey, Columns<Name, Value, Timestamp>>. The main reason to use a column-oriented store is that it is distributed, highly available, and optimized for writes.
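The nested-map abstraction above can be sketched in a few lines of in-memory Python. This models only the data shape, not distribution or durability; the class name `ColumnFamily`, its methods, and the last-write-wins rule on timestamps are assumptions made for illustration.

```python
import time
from collections import defaultdict


class ColumnFamily:
    """Toy model of ColumnFamily<RowKey, Columns<Name, Value, Timestamp>>."""

    def __init__(self):
        # row key -> column name -> (value, timestamp)
        self._rows = defaultdict(dict)

    def put(self, row_key, name, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        current = self._rows[row_key].get(name)
        # Assumed conflict rule: the write with the newest timestamp wins.
        if current is None or ts >= current[1]:
            self._rows[row_key][name] = (value, ts)

    def get(self, row_key, name):
        cell = self._rows.get(row_key, {}).get(name)
        return None if cell is None else cell[0]

    def row(self, row_key):
        """Return all columns of one row as a {name: value} dict."""
        return {n: v for n, (v, _ts) in self._rows.get(row_key, {}).items()}


if __name__ == "__main__":
    feed = ColumnFamily()
    feed.put("user:1", "post:100", "hello", timestamp=10)
    feed.put("user:1", "post:101", "world", timestamp=11)
    print(feed.row("user:1"))
```

Because each row can hold an arbitrary, sparse set of named columns, appending a new feed item is just one more `put` on the user's row, which fits the write-heavy use case described above.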

Out-of-the-box choices: Cassandra, HBase, Hypertable, Amazon SimpleDB, etc.

sync-by-unito[bot] commented 4 years ago

➤ Nelson 3513 commented:

Other references:
1. BigTable paper: https://static.googleusercontent.com/media/research.google.com/zh-TW//archive/bigtable-osdi06.pdf
2. AWS re:Invent 2018: Amazon DynamoDB Deep Dive: https://www.youtube.com/watch?v=HaEPXoXVf2k&t=16
3. Redis architecture: http://qnimate.com/overview-of-redis-architecture/

sync-by-unito[bot] commented 4 years ago

➤ Nelson 3513 commented:

Papers TBR:

1. BigTable: https://static.googleusercontent.com/....../bigtable......
2. Amazon Dynamo: http://courses.cse.tamu.edu/....../readings/dynamo-paper.pdf
3. Consistent Hashing: https://www.akamai.com/....../consistent-hashing-and......
Anything else?
