nehanims opened 4 weeks ago
A messaging system with fault tolerance (redundant nodes with duplicated data plus distributed consensus mechanisms; persistent event logs allow replay of events after a failure), distributed and scalable (add nodes to the cluster to scale), with fast reads and writes (sequential access to an append-only log). It also temporally decouples events/producers from their downstream processing/consumers: if you need further processing on past events for a topic, just add a new consumer and start it from the first event or from any offset within the topic. If the world is a state machine, this is a persistent log of its state transitions. Edit: I guess this is a design pattern! Kafka frees consumers from explicitly handling short-term backpressure (such as brief unexpected load spikes), although if producers consistently produce at a rate where the backlog grows continuously, some backpressure mechanism will still need to be implemented.
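The replay idea can be sketched in plain Python (no Kafka client here; `Topic` and `Consumer` are illustrative stand-ins for a topic partition and a consumer with its committed offset):

```python
class Topic:
    """A minimal append-only log: events are immutable and strictly ordered."""
    def __init__(self):
        self.log = []  # the persistent event log (in-memory for illustration)

    def append(self, event):
        self.log.append(event)
        return len(self.log) - 1  # the new event's offset


class Consumer:
    """Tracks its own offset, so a new consumer can replay the whole topic
    (offset 0) or start from any past offset."""
    def __init__(self, topic, offset=0):
        self.topic = topic
        self.offset = offset

    def poll(self):
        events = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return events


topic = Topic()
for transition in ["created", "paid", "shipped"]:
    topic.append(transition)

replayer = Consumer(topic)               # consumer added later: full replay
print(replayer.poll())                   # ['created', 'paid', 'shipped']
from_offset = Consumer(topic, offset=1)  # or replay from a chosen offset
print(from_offset.poll())                # ['paid', 'shipped']
```

Because the producer only appends and each consumer advances its own offset, the two sides never block each other; that is the temporal decoupling described above.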
The performance difference between sequential and random access in HDDs (Hard Disk Drives) and SSDs (Solid-State Drives) is quite significant, both within each type of storage and when comparing the two.
HDDs:

Sequential Access: Fast for a mechanical device (typically on the order of 100–200 MB/s), since the head reads contiguous sectors without repositioning.

Random Access: Slow; every access pays a seek plus rotational latency of several milliseconds, limiting the drive to a few hundred operations per second.

Comparison: Sequential throughput can be orders of magnitude higher than random throughput on the same drive.

SSDs:

Sequential Access: Typically around 500 MB/s over SATA and several GB/s over NVMe.

Random Access: Fast, since there are no moving parts; latencies are in the tens of microseconds, and drives sustain tens of thousands of IOPS or more.

Comparison: The gap between sequential and random access is far smaller than on HDDs, though sequential is still somewhat faster.
Overall, SSDs outperform HDDs in both sequential and random access scenarios, with the difference being most dramatic in random access.
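You can get a rough feel for the gap with a quick (non-rigorous) sketch; on a warm OS page cache both runs will be fast, so the difference shows up mainly on cold reads from the actual device:

```python
import os
import random
import tempfile
import time

block = 4096
nblocks = 4096  # 16 MiB scratch file

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    f.write(os.urandom(block * nblocks))

with open(path, "rb") as f:
    # Sequential: read blocks in file order.
    t0 = time.perf_counter()
    while f.read(block):
        pass
    seq = time.perf_counter() - t0

    # Random: seek to each block in shuffled order.
    order = list(range(nblocks))
    random.shuffle(order)
    t0 = time.perf_counter()
    for i in order:
        f.seek(i * block)
        f.read(block)
    rnd = time.perf_counter() - t0

os.remove(path)
print(f"sequential: {seq:.4f}s, random: {rnd:.4f}s")
```

This is why append-only logs (as in Kafka) deliberately turn every write into a sequential one.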
Additionally, persistent event-management systems are more fault tolerant and easier to debug. For example, if a consumer fails on a particular message, it can retry that message from its last committed offset when restarted.
Non-blocking reactive endpoints offer several advantages, especially in scenarios where handling a large number of concurrent requests efficiently is crucial. Here’s a breakdown of the benefits and some examples of popular libraries and frameworks:
High Concurrency Handling: A small number of event-loop threads can serve many in-flight requests, because threads are never parked waiting on I/O.

Better Resource Utilization: Fewer threads means less memory spent on stacks and less context-switching overhead.

Scalability: Throughput scales with available I/O capacity rather than with the size of a thread pool.

Improved Responsiveness: Requests do not queue behind blocked threads, so latency stays low under load.

Fault Tolerance and Resilience: Reactive libraries typically ship with timeouts, retries, fallbacks, and backpressure operators built in.
Java/Spring Ecosystem: Project Reactor and Spring WebFlux provide reactive types and non-blocking REST endpoints; RxJava is another popular option.

JavaScript/Node.js: Node.js is non-blocking by design, built around an event loop; Promises and `async`/`await` are the standard style for asynchronous code.

Python: `asyncio` provides `async` and `await` syntax for non-blocking code.

.NET: With `async` and `await`, ASP.NET Core can be used to create non-blocking APIs.

Common use cases:

High-Throughput APIs: Services like social media platforms, real-time analytics dashboards, or any service expected to handle a large number of concurrent API requests.
Streaming Data Applications: Applications processing streams of data, such as video streaming services, live sports updates, or stock market tickers.
Microservices Architectures: Systems composed of multiple microservices, where inter-service communication needs to be efficient and resilient.
IoT Applications: Applications that need to handle data from numerous devices in real-time, such as smart home systems or industrial IoT solutions.
Event-Driven Systems: Systems that react to a series of events, such as user actions in a web application, or events from a messaging system like Kafka.
Cloud-Native Applications: Applications designed to scale elastically in cloud environments, where efficient resource utilization is crucial.
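As a concrete illustration of the concurrency benefit, plain `asyncio` (no web framework) can interleave many handlers on a single thread; `asyncio.sleep` stands in for a non-blocking I/O call such as a DB query:

```python
import asyncio
import time


async def handle_request(i):
    # Simulated non-blocking I/O (a DB query or downstream HTTP call).
    await asyncio.sleep(0.1)
    return f"response {i}"


async def main():
    t0 = time.perf_counter()
    # 100 concurrent "requests" share one thread; the event loop
    # interleaves them while each awaits its I/O.
    results = await asyncio.gather(*(handle_request(i) for i in range(100)))
    return results, time.perf_counter() - t0


results, elapsed = asyncio.run(main())
print(f"{len(results)} requests in {elapsed:.2f}s")  # ~0.1s total, not ~10s
```

A blocking, one-thread-per-request design would need 100 threads (or ~10 seconds serially) to do the same work.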
Live Video Streaming: Many VoIP and video conferencing applications leverage UDP due to its lower overhead and tolerance of packet loss. Real-time communication benefits from UDP's reduced latency compared to TCP.

DNS: DNS (Domain Name System) queries typically use UDP for its fast and lightweight nature. Although DNS can also use TCP for large responses or zone transfers, most queries are handled via UDP.

Market Data Multicast: In low-latency trading, UDP is used for efficient market data delivery to multiple recipients simultaneously.

IoT: UDP is often used for communication between IoT devices, sending small packets of data.
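What these uses share is UDP's connectionless, fire-and-forget datagram model, visible directly in Python's `socket` API (loopback here, so the packet loss these applications tolerate will not actually occur):

```python
import socket

# "Server": bind a UDP socket to an ephemeral localhost port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
addr = server.getsockname()

# "Client": no handshake, no connection state; just send a datagram.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"sensor-reading:23.5", addr)

data, _ = server.recvfrom(1024)  # one recvfrom returns one whole datagram
print(data.decode())             # sensor-reading:23.5

client.close()
server.close()
```

Compare with TCP, where a three-way handshake, ordering, and retransmission all add latency before the first byte arrives.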
Imperative Programming: Describes a sequence of steps that change the program's state. Languages like C, C++, Java, Python (to an extent), and many others support imperative programming styles.

Declarative Programming: Emphasizes expressing logic and functionality without describing the control flow explicitly. Functional programming is a popular form of declarative programming.

Object-Oriented Programming (OOP): Revolves around the concept of objects, which encapsulate data (attributes) and behavior (methods or functions). Common object-oriented languages include Java, C++, Python, Ruby, and C#.

Aspect-Oriented Programming (AOP): Aims to modularize concerns that cut across multiple parts of a software system. AspectJ is one of the best-known AOP frameworks, extending Java with AOP capabilities.

Functional Programming (FP): Treats computation as the evaluation of mathematical functions and emphasizes immutable data and declarative expressions. Languages like Haskell, Lisp, and Erlang, and features of JavaScript, Python, and Scala, support functional programming.

Reactive Programming: Deals with asynchronous data streams and the propagation of changes. Event-driven applications and streaming data processing applications benefit from reactive programming.

Generic Programming: Aims at creating reusable, flexible, type-independent code by allowing algorithms and data structures to be written without specifying the types they operate on. Generic programming is used extensively in libraries and frameworks for data structures like lists, stacks, and queues, and algorithms like sorting and searching.

Concurrent Programming: Deals with executing multiple tasks or processes simultaneously, improving performance and resource utilization. It is used in multi-threaded servers, parallel processing, concurrent web servers, and high-performance computing.
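A tiny Python contrast between the first two paradigms above: the imperative version spells out control flow and mutates an accumulator, while the declarative/functional version states what to compute:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Imperative: explicit loop, branch, and mutable accumulator.
total = 0
for n in numbers:
    if n % 2 == 0:
        total += n * n

# Declarative/functional: a pure expression, no mutation.
functional_total = sum(n * n for n in numbers if n % 2 == 0)

print(total, functional_total)  # 56 56
```

Both compute the sum of squares of the even numbers; the difference is purely in how the intent is expressed.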
Why NOT to use GraphQL in production: it sounds like a headache to work with in terms of auth, BUT it may be a great tool for exposing the database to the frontend during development, after which only the useful APIs identified by the frontend get built as REST endpoints. So use GraphQL in the dev environment only, to allow quick feature testing without having to build each REST endpoint before it is finalized.
The main difference between ZooKeeper and KRaft mode in Kafka revolves around how Kafka manages metadata and consensus. In ZooKeeper mode, cluster metadata and controller election are delegated to an external ZooKeeper ensemble that must be deployed and operated alongside Kafka. In KRaft mode, Kafka's own controller nodes run a Raft-based quorum and store the metadata in an internal Kafka log, removing the separate ZooKeeper dependency entirely.
In summary, while the consensus mechanism is a significant part of the difference, KRaft mode also represents a shift towards a more integrated and simplified Kafka architecture, aiming to improve ease of use and scalability.
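Both approaches ultimately rest on majority-quorum consensus (ZooKeeper's ZAB protocol, Kafka's Raft). The core commit rule can be sketched in a few lines; this illustrates the quorum arithmetic only, not Kafka's actual implementation:

```python
def has_quorum(acks, cluster_size):
    """A proposal (e.g., a metadata update) commits only once a strict
    majority of nodes have acknowledged it; any two majorities overlap,
    which is what prevents split-brain."""
    return acks > cluster_size // 2


# With 5 controller nodes, 3 acks commit a change; 2 do not.
print(has_quorum(3, 5), has_quorum(2, 5))  # True False
```

The overlap property is also why such clusters are sized with odd node counts: a 5-node quorum tolerates 2 failures, while a 6-node quorum still only tolerates 2.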
Metadata belongs in some kind of relational DB, since you might need to query and manipulate it often, whereas the audio itself won't really change much. Relational databases are great for such operations, but LOB storage and varying-size data aren't efficient in them. S3 is designed to store large files of varying sizes, and it makes streaming easier in concrete ways: byte-range GET requests let clients fetch arbitrary chunks, multipart upload handles large files, and presigned URLs let clients stream directly from S3 without proxying through your servers. So access patterns and the size of the data are the main reasons to split the files and the metadata.
In relational databases, the efficiency of handling varying-sized columns (e.g., `VARCHAR`, `TEXT`, `BLOB`) can significantly impact both storage and performance. Here's how this works:

Fixed-Length Columns (e.g., `CHAR`): Every entry in a fixed-length column takes up the same amount of space, regardless of the actual data length. This can lead to inefficient use of space if the data varies widely in size, as all entries will reserve the maximum length, even if not fully utilized.

Variable-Length Columns (e.g., `VARCHAR`, `TEXT`, `BLOB`): These columns only use as much space as needed for the actual data plus a small overhead for storing the length of the data. This is more space-efficient when dealing with varying-sized data because only the necessary amount of storage is used.

Uniform data: `CHAR` or other fixed-length types can be beneficial for performance due to the predictability in storage and memory allocation.

Varying-sized data: `VARCHAR` or `TEXT` is generally more efficient because it avoids the wasted space associated with fixed-length columns.

Large objects: `BLOB` or `TEXT` types are appropriate. These are typically stored outside the main table space, with the table storing pointers, reducing the impact on row and page size.

By understanding how varying-sized columns affect storage and performance, you can make informed decisions about schema design in relational databases.
Client side caching for frequently requested data?
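One simple answer is a TTL cache on the client: serve repeated requests locally until the entry expires. A sketch (the `TTLCache` name and `fetch` callback are illustrative, not any particular library's API):

```python
import time


class TTLCache:
    """Client-side cache: answer repeated requests locally until the
    entry expires, instead of hitting the server every time."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key, fetch):
        value, expires = self.store.get(key, (None, 0.0))
        if time.monotonic() < expires:
            return value                  # cache hit: no network call
        value = fetch(key)                # cache miss: fetch and remember
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value


calls = []

def fake_fetch(key):
    calls.append(key)                     # stands in for a real HTTP request
    return f"payload for {key}"


cache = TTLCache(ttl_seconds=60)
cache.get("/users/42", fake_fetch)
cache.get("/users/42", fake_fetch)        # served from cache
print(len(calls))  # 1 -> only one real fetch for two requests
```

The TTL bounds staleness; for data that must never be stale, server-driven invalidation (ETags, cache-control headers) is the safer route.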
Create ADRs:
Adding design decision rationale to your project on GitHub is essential for maintaining clarity and providing context for future contributors. Here's how you can effectively document design decisions:
1. Create a Dedicated Documentation Section:

- Create a `docs/` folder in the root of your repository to store all documentation files.
- Within the `docs/` folder, create a `design-decisions/` directory where you can keep all design decision documents.

2. Use ADRs (Architecture Decision Records):
What Are ADRs? ADRs are short documents that capture an architectural decision, the context in which it was made, and its consequences. They provide a structured format to document design decisions.
Structure of an ADR:
Example ADR:
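ADRs typically follow Michael Nygard's widely used format: Title, Status, Context, Decision, Consequences. A minimal sketch (the decision shown is hypothetical):

```markdown
# ADR 0007: Use Kafka for inter-service messaging

## Status
Accepted

## Context
Services currently call each other synchronously over REST, coupling their
availability and making replay of past events impossible.

## Decision
Introduce Kafka as the messaging backbone; producers publish domain events,
and consumers process them asynchronously.

## Consequences
- Temporal decoupling and event replay from any offset.
- Independent scaling of producers and consumers.
- Added operational overhead of running a Kafka cluster.
```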
3. Link to ADRs in Your README or CONTRIBUTING File:

- In your README, link to the `docs/design-decisions/` directory.
- In your `CONTRIBUTING.md` file, add a section about the design decisions and how contributors should document any new decisions they make.

4. Maintain a Design Decision Log:
Keep an `index.md` or `README.md` within the `design-decisions/` folder that lists all the ADRs with links to each document.

5. Use GitHub Issues and Pull Requests:
6. Keep ADRs Up-to-Date:
By following this structured approach, you’ll have a clear and well-documented rationale for all major design decisions in your project, making it easier for current and future contributors to understand the architecture and contribute effectively.