nehanims / notes

Backend for voice-notes

Document Design Decisions #22

Open nehanims opened 4 weeks ago

nehanims commented 4 weeks ago

Create ADRs:

Adding design decision rationale to your project on GitHub is essential for maintaining clarity and providing context for future contributors. Here's how you can effectively document design decisions:

1. Create a Dedicated Documentation Section:

2. Use ADRs (Architecture Decision Records):

3. Link to ADRs in Your README or CONTRIBUTING File:

4. Maintain a Design Decision Log:

5. Use GitHub Issues and Pull Requests:

6. Keep ADRs Up-to-Date:

By following this structured approach, you’ll have a clear and well-documented rationale for all major design decisions in your project, making it easier for current and future contributors to understand the architecture and contribute effectively.

nehanims commented 4 weeks ago

Why use Kafka?

Kafka is a messaging system that is fault tolerant (redundant nodes hold replicated data and coordinate through a distributed consensus mechanism, and persistent event logs allow events to be replayed after a failure), distributed and scalable (add nodes to the cluster to scale), and fast for reads and writes (sequential access to an append-only log). It also decouples events and their producers in time from downstream processing by consumers: if you need further processing of past events for a topic, just add a new consumer, which can start from the first event or from any offset within the topic. If the world is a state machine, this is a persistent log of its state transitions. Edit: I guess this is a design pattern! Kafka also relieves consumers of explicitly handling short-term backpressure (such as short, unexpected load spikes). However, if producers consistently produce faster than consumers can process, so that the backlog grows continuously, some backpressure mechanism will still need to be implemented.
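The replay idea above can be sketched with a toy in-memory log. This is only an illustration under stated assumptions: `TopicLog`, `append`, and `read_from` are hypothetical names, not Kafka's API, and the event names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy in-memory stand-in for one topic partition (hypothetical, not Kafka's API)."""
    events: list = field(default_factory=list)

    def append(self, event):
        """Append-only write: returns the offset assigned to the event."""
        self.events.append(event)
        return len(self.events) - 1

    def read_from(self, offset):
        """Sequential read of every event at or after `offset`."""
        return self.events[offset:]


log = TopicLog()
for name in ["recorded", "uploaded", "transcribed"]:
    log.append(name)

# A consumer added later can still replay the topic from the first event...
replay_all = log.read_from(0)
# ...or start from any offset within it.
replay_tail = log.read_from(2)
```

Because producers only ever append and consumers only ever read forward from an offset, neither side needs to know when the other runs.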

Notes about Kafka's speed

The performance difference between sequential and random access in HDDs (Hard Disk Drives) and SSDs (Solid-State Drives) is quite significant, both within each type of storage and when comparing the two.

HDD (Hard Disk Drive):

SSD (Solid-State Drive):

HDD vs. SSD:

Overall, SSDs outperform HDDs in both sequential and random access scenarios, with the difference being most dramatic in random access.

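Additionally, systems built on a persistent event log are more fault tolerant and easier to debug. For example, if a consumer fails while processing a particular message, it resumes from its last committed offset when restarted, so the failed message is processed again (provided offsets are committed only after processing succeeds).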
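A minimal sketch of that restart behaviour, assuming the offset is advanced only after a message is handled successfully. The `consume` function and the sample events are hypothetical; a real Kafka client tracks committed offsets per consumer group rather than in a local variable.

```python
def consume(events, committed_offset, handler):
    """Process events starting at committed_offset and return the new offset.

    The offset only advances after the handler succeeds, so a failed event
    is retried on the next run.
    """
    offset = committed_offset
    for event in events[committed_offset:]:
        try:
            handler(event)
        except Exception:
            break  # stop here; the failed event stays uncommitted
        offset += 1
    return offset


events = ["a", "b", "bad", "d"]
processed = []
fail_once = {"bad"}  # events that fail on their first attempt only


def handler(event):
    if event in fail_once:
        fail_once.discard(event)
        raise ValueError(f"cannot process {event!r}")
    processed.append(event)


first_run = consume(events, 0, handler)           # stops at the failing event
second_run = consume(events, first_run, handler)  # "restart": retries it
```

The same message can be handled more than once under this scheme, so handlers should be safe to re-run.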

nehanims commented 4 weeks ago

Why use a fully reactive tech stack?

Non-blocking reactive endpoints offer several advantages, especially in scenarios where handling a large number of concurrent requests efficiently is crucial. Here’s a breakdown of the benefits and some examples of popular libraries and frameworks:

Advantages of Non-Blocking Reactive Endpoints

  1. High Concurrency Handling:

    • Reactive endpoints allow servers to handle many more concurrent connections compared to traditional blocking I/O. This is because threads are not blocked waiting for I/O operations (like reading from a database or network), allowing them to be reused for other tasks while waiting for responses.
  2. Better Resource Utilization:

    • By avoiding blocking, reactive systems can make more efficient use of CPU and memory. Since threads aren’t sitting idle waiting for I/O, the system can serve more requests with the same hardware.
  3. Scalability:

    • Reactive systems scale better with the number of requests, especially under high load. This is particularly beneficial in microservices architectures or cloud environments where services need to scale dynamically.
  4. Improved Responsiveness:

    • Applications can remain responsive under load because they can continue to process incoming requests without being bottlenecked by slow I/O operations.
  5. Fault Tolerance and Resilience:

    • Many reactive frameworks offer built-in patterns for handling failures, retries, and fallbacks, which contribute to building resilient systems.
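The concurrency and resource-utilization points above can be illustrated with a small sketch. Python's `asyncio` is used here purely for illustration and is not assumed to be this project's stack; `fake_io_call` is a hypothetical stand-in for a database or network call.

```python
import asyncio
import time


async def fake_io_call(request_id, delay=0.1):
    """Stand-in for a database or network call; awaiting frees the thread."""
    await asyncio.sleep(delay)
    return request_id


async def handle_all(n):
    # All n "requests" wait concurrently on a single thread.
    return await asyncio.gather(*(fake_io_call(i) for i in range(n)))


start = time.perf_counter()
results = asyncio.run(handle_all(50))
elapsed = time.perf_counter() - start
```

Handled one after another with blocking calls, the 50 waits would add up to about 5 seconds. Here they overlap, so the total should stay close to a single delay.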

Popular Libraries and Frameworks

  1. Java/Spring Ecosystem:

    • Project Reactor: The core library in the Spring ecosystem for building reactive applications. It’s used in Spring WebFlux, the reactive counterpart to Spring MVC.
    • RxJava: A popular library for composing asynchronous and event-based programs using observable sequences. Often used in Android development and other JVM-based projects.
    • Vert.x: A polyglot event-driven application framework that supports reactive programming. It’s particularly useful for building microservices and high-throughput web applications.
  2. JavaScript/Node.js:

    • Express.js with async/await: While not inherently reactive, Node.js uses a non-blocking event-driven architecture. Express.js, combined with async/await, can be used to create non-blocking APIs.
    • RxJS: A library for reactive programming using observables, extensively used in Angular applications for managing asynchronous operations.
  3. Python:

    • FastAPI: A modern, fast (high-performance) web framework for building APIs with Python. It is based on standard Python type hints and uses the async and await syntax for non-blocking code.
    • Tornado: A Python web framework and asynchronous networking library, originally developed at FriendFeed.
  4. .NET:

    • ASP.NET Core: With support for asynchronous programming using async and await, ASP.NET Core can be used to create non-blocking APIs.
    • Reactive Extensions (Rx.NET): A library for composing asynchronous and event-based programs using observable sequences in .NET.

Projects That Benefit from Non-Blocking Reactive Endpoints

  1. High-Throughput APIs: Services like social media platforms, real-time analytics dashboards, or any service expected to handle a large number of concurrent API requests.

  2. Streaming Data Applications: Applications processing streams of data, such as video streaming services, live sports updates, or stock market tickers.

  3. Microservices Architectures: Systems composed of multiple microservices, where inter-service communication needs to be efficient and resilient.

  4. IoT Applications: Applications that need to handle data from numerous devices in real-time, such as smart home systems or industrial IoT solutions.

  5. Event-Driven Systems: Systems that react to a series of events, such as user actions in a web application, or events from a messaging system like Kafka.

  6. Cloud-Native Applications: Applications designed to scale elastically in cloud environments, where efficient resource utilization is crucial.

nehanims commented 4 weeks ago

image

image source

nehanims commented 4 weeks ago

image image source

nehanims commented 4 weeks ago

image image image

Live Video Streaming

Many VoIP and video conferencing applications use UDP because of its lower overhead and because they can tolerate some packet loss. Real-time communication benefits from UDP's lower latency compared to TCP.

DNS

DNS (Domain Name System) queries typically use UDP because it is fast and lightweight. Although DNS can also use TCP for large responses or zone transfers, most queries are handled via UDP.

Market Data Multicast

In low-latency trading, UDP multicast is used to deliver market data efficiently to multiple recipients simultaneously.

IoT

UDP is often used by IoT devices to send small packets of data between devices.

source
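The low overhead mentioned above comes largely from UDP being connectionless. A minimal loopback sketch using Python's standard `socket` module is shown here; the payload is a made-up example, and real applications would need to handle lost or reordered datagrams themselves.

```python
import socket

# Receiver: bind to an ephemeral port on loopback (port 0 lets the OS choose).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
address = receiver.getsockname()

# Sender: no connect() or handshake; each sendto() is an independent datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"sensor-reading:21.5", address)

payload, peer = receiver.recvfrom(1024)

sender.close()
receiver.close()
```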

nehanims commented 4 weeks ago

image

Imperative Programming

Imperative programming describes a sequence of steps that change the program’s state. Languages like C, C++, Java, Python (to an extent), and many others support imperative programming styles.

Declarative Programming

Declarative programming emphasizes expressing logic and functionality without explicitly describing the control flow. Functional programming is a popular form of declarative programming.

Object-Oriented Programming (OOP)

Object-oriented programming (OOP) revolves around the concept of objects, which encapsulate data (attributes) and behavior (methods or functions). Common object-oriented programming languages include Java, C++, Python, Ruby, and C#.

Aspect-Oriented Programming (AOP)

Aspect-oriented programming (AOP) aims to modularize concerns that cut across multiple parts of a software system. AspectJ is one of the most well-known AOP frameworks; it extends Java with AOP capabilities.

Functional Programming

Functional programming (FP) treats computation as the evaluation of mathematical functions and emphasizes the use of immutable data and declarative expressions. Languages like Haskell, Lisp, and Erlang, along with some features of JavaScript, Python, and Scala, support functional programming paradigms.

Reactive Programming

Reactive programming deals with asynchronous data streams and the propagation of changes. Event-driven applications and streaming data processing applications benefit from reactive programming.

Generic Programming

Generic programming aims at creating reusable, flexible, and type-independent code by allowing algorithms and data structures to be written without specifying the types they will operate on. It is used extensively in libraries and frameworks to create data structures such as lists, stacks, and queues, and algorithms such as sorting and searching.

Concurrent Programming

Concurrent programming deals with the execution of multiple tasks or processes simultaneously, improving performance and resource utilization. It is used in various applications, including multi-threaded servers, parallel processing, concurrent web servers, and high-performance computing.
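As a small illustration of the imperative and declarative styles described above, here is the same computation written both ways. The `durations` list is hypothetical sample data.

```python
durations = [12, 95, 40, 7, 63]  # hypothetical note lengths in seconds

# Imperative: spell out each step and mutate state along the way.
total_imperative = 0
for d in durations:
    if d >= 30:
        total_imperative += d

# Declarative/functional: describe the result; no explicit loop state.
total_declarative = sum(d for d in durations if d >= 30)
```

Both forms compute the total length of notes lasting at least 30 seconds; they differ only in whether the control flow is written out.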

nehanims commented 3 weeks ago

Why NOT to use GraphQL in production: it sounds like a headache to work with in terms of auth. BUT it may be a great tool for exposing the database to the frontend during development, so that only the APIs the frontend actually needs get built as REST endpoints. So use GraphQL in the dev environment only, to allow quick feature testing without building each REST endpoint until it is finalized.

nehanims commented 3 weeks ago

Difference between Kafka KRaft mode and ZooKeeper mode

The main difference between ZooKeeper mode and KRaft mode in Kafka revolves around how Kafka manages metadata and consensus.

1. ZooKeeper Mode:

2. KRaft Mode:

Key Differences:

Other Considerations:

In summary, while the consensus mechanism is a significant part of the difference, KRaft mode also represents a shift towards a more integrated and simplified Kafka architecture, aiming to improve ease of use and scalability.

nehanims commented 2 weeks ago

Split the storage into Object Store for Files and Relational DB for Metadata

Metadata would live in some kind of relational DB, since you may need to query and manipulate it often, whereas the audio itself rarely changes. Relational databases are great for such operations, but they are not efficient at storing LOBs and data of widely varying size. S3 is designed to store large files of varying sizes, and it is designed to handle streaming etc. (what specific things make it easier?). So access patterns and data size are the main reasons to split the files from the metadata.
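A minimal sketch of this split, under loud assumptions: an in-memory dict stands in for the object store (S3 or similar), SQLite stands in for the relational DB, and the `recordings` table, its columns, and the helper functions are all hypothetical.

```python
import hashlib
import sqlite3

object_store = {}  # stand-in for an S3-like bucket: key -> bytes

db = sqlite3.connect(":memory:")  # stand-in for the relational metadata DB
db.execute(
    "CREATE TABLE recordings ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " size_bytes INTEGER NOT NULL,"
    " object_key TEXT NOT NULL)"
)


def save_recording(title, audio):
    """Store the bytes in the object store and only a pointer in the DB."""
    key = "audio/" + hashlib.sha256(audio).hexdigest()
    object_store[key] = audio
    cur = db.execute(
        "INSERT INTO recordings (title, size_bytes, object_key) VALUES (?, ?, ?)",
        (title, len(audio), key),
    )
    return cur.lastrowid


def load_audio(recording_id):
    """Look up the key in the DB, then fetch the bytes from the object store."""
    row = db.execute(
        "SELECT object_key FROM recordings WHERE id = ?", (recording_id,)
    ).fetchone()
    return object_store[row[0]]


rec_id = save_recording("standup notes", b"\x00\x01fake-audio-bytes")
# Metadata queries never touch the audio payload.
titles = [r[0] for r in db.execute("SELECT title FROM recordings WHERE size_bytes < 1000")]
```

The database rows stay small and uniform, so metadata queries remain cheap regardless of how large the audio files are.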

In relational databases, the efficiency of handling varying-sized columns (e.g., VARCHAR, TEXT, BLOB) can significantly impact both storage and performance. Here’s how this works:

1. Storage Efficiency:

2. Performance Considerations:

3. Choosing the Right Type:

4. Best Practices:

By understanding how varying-sized columns affect storage and performance, you can make informed decisions about schema design in relational databases.

nehanims commented 2 weeks ago

Client-side caching for frequently requested data?