milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
28.39k stars, 2.73k forks

[Enhancement]: Support Streaming Service in Milvus #33285

Open chyezh opened 1 month ago

chyezh commented 1 month ago

Motivation

Architecture

image

The following changes will be made:

Components Responsibility

Goals

RoadMap

Streaming Service Implementation

Use Streaming Service To Produce

Use Streaming Service To Consume In Query

Use Streaming Service To Consume In Flush

Rolling upgrade

Incoming

Limitation

xiaofan-luan commented 1 month ago

What about naming it streaming service?

jaime0815 commented 2 weeks ago

Streaming Service Upgrading In Milvus 2.5

Dependency Specification

Version 2.4 relies on the pub/sub capability of MQ for both reading and writing paths to support data persistence and querying of streaming data respectively.

Write path image

Read path image

TimeTick lifetime image

In version 2.5, the pub/sub API is provided by StreamNode, and MQ reading and writing are encapsulated within StreamNode.

Write path image

Read path image

TimeTick lifetime image

The significant changes in dependency order between versions 2.4 and 2.5 mean that the existing upgrade plan cannot meet the requirements.
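The dependency change described above can be illustrated with a small sketch. It is not Milvus code: `InMemoryMQ`, `StreamNode`, `produce`, `consume`, and the channel name `pchannel-0` are hypothetical names chosen for illustration. It only shows the shape of the change: in the 2.4 style the caller talks to the MQ directly, while in the 2.5 style the MQ is hidden behind the stream node.

```python
from collections import defaultdict


class InMemoryMQ:
    """Toy stand-in for the message queue (hypothetical, not a Milvus API)."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, msg):
        # Append the message and return its offset within the topic.
        self._topics[topic].append(msg)
        return len(self._topics[topic]) - 1

    def read_from(self, topic, offset):
        # Return every message at or after the given offset.
        return list(self._topics[topic][offset:])


class StreamNode:
    """Hypothetical 2.5-style facade: callers never touch the MQ directly."""

    def __init__(self, mq):
        self._mq = mq

    def produce(self, pchannel, msg):
        return self._mq.publish(pchannel, msg)

    def consume(self, pchannel, offset=0):
        return self._mq.read_from(pchannel, offset)


mq = InMemoryMQ()

# 2.4 style: the component reads and writes the MQ itself.
mq.publish("pchannel-0", {"op": "insert", "id": 1})
rows_24 = mq.read_from("pchannel-0", 0)

# 2.5 style: the same MQ is reached only through the stream node.
node = StreamNode(mq)
node.produce("pchannel-0", {"op": "insert", "id": 2})
rows_25 = node.consume("pchannel-0", offset=1)
```

Because every other component now depends on the stream node rather than on the MQ, the order in which components can be restarted during an upgrade changes as well.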

Upgrade plan

[Plan 1] Upgrade with downtime

  1. Stop writing on the client side.
  2. Execute flushAll to trigger flushing all data from MQ to disk.
  3. Stop the 2.4 version cluster.
  4. Start the 2.5 version cluster.
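As a rough sketch of Plan 1, the four steps can be run as a strictly ordered sequence that stops at the first failure. Everything here is hypothetical scaffolding: `FakeCluster` and its methods are placeholders for whatever tooling actually performs each step, not Milvus or client APIs.

```python
def upgrade_with_downtime(cluster):
    """Run the Plan 1 steps in order; abort on the first step that fails."""
    steps = [
        ("stop client writes", cluster.stop_client_writes),
        ("flushAll", cluster.flush_all),
        ("stop 2.4 cluster", cluster.stop_old_cluster),
        ("start 2.5 cluster", cluster.start_new_cluster),
    ]
    done = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"step failed: {name}; completed: {done}")
        done.append(name)
    return done


class FakeCluster:
    """Hypothetical stand-in whose steps always succeed."""

    def stop_client_writes(self):
        return True

    def flush_all(self):
        return True

    def stop_old_cluster(self):
        return True

    def start_new_cluster(self):
        return True


plan1_steps = upgrade_with_downtime(FakeCluster())
```

The ordering matters: flushing only after writes have stopped is what guarantees nothing is left in the MQ when the 2.4 cluster goes down.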

[Plan 2] Upgrade with no downtime

  1. Upgrade MixCoord (RootCoord, QueryCoord, DataCoord, and StreamCoord). At this point:

    • In the 2.5 version, RootCoord still needs to execute the TimeTick logic.
    • After the upgrade, the read and write paths are unchanged compared to the 2.4 version.
  2. Stop DataNode. The flush process is terminated, but Proxy can still accept all requests.
  3. Start StreamNode. Each pchannel is allocated to a stream node and is ready for subscription by the stream node client at this point.
  4. Upgrade QueryNode. The new QueryNode subscribes to vchannels with the stream node client, while the old QueryNode continues to consume the stream from the MQ client.
  5. Upgrade Proxy. Once all proxies are upgraded:

    • Stop sending TT logic on the RootCoord.
    • Enable the insertion of data by the stream node client on the Proxy.
  6. Stop IndexNode.
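The Proxy step above has a gate: the write path only changes once every proxy has been upgraded. A minimal sketch of that gate, assuming a hypothetical `WritePathState` record and proxy versions reported as plain strings (neither is taken from the Milvus code base):

```python
from dataclasses import dataclass


@dataclass
class WritePathState:
    # Hypothetical flags describing who drives the write path.
    rootcoord_sends_timetick: bool = True
    proxy_uses_stream_client: bool = False


def maybe_switch_write_path(proxy_versions, state):
    """Flip the write path only once every proxy reports a 2.5.x version."""
    if proxy_versions and all(v.startswith("2.5") for v in proxy_versions):
        state.rootcoord_sends_timetick = False
        state.proxy_uses_stream_client = True
    return state


# Mid-upgrade: one proxy is still on 2.4, so nothing changes.
mixed = maybe_switch_write_path(["2.5.0", "2.4.3"], WritePathState())

# All proxies upgraded: RootCoord stops TT and proxies use the stream client.
switched = maybe_switch_write_path(["2.5.0", "2.5.0"], WritePathState())
```

Keeping both flags in one place makes it harder to end up in a state where RootCoord has stopped sending TimeTick while some proxy still writes through the old path.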

Pros and cons:

Version 2.5 serves as a transitional version to 3.0. For now we can take Plan 2 to ensure a smooth upgrade from 2.4, and do the code cleanup after upgrading to 3.0.

xiaofan-luan commented 2 weeks ago

Making sure the upgrade goes smoothly is very important.

To simplify the work we need to do, maybe we can keep the delegator in QueryNode and not move it to the streaming service.

One problem is how many stream nodes are needed for the upgrade, and what size they should be.

To upgrade smoothly, the streaming node needs to assign timestamps and try to merge data from the TT stream and the Proxy inserts.
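One possible reading of this, as a toy sketch only: the streaming node stamps incoming Proxy inserts with its own increasing timestamps, then merges them with the already-timestamped TT stream into a single ordered sequence. `TimestampAllocator`, `stamp`, `merge_streams`, and the tuple layout `(timestamp, source, payload)` are assumptions made for illustration, not the actual design.

```python
import heapq
import itertools


class TimestampAllocator:
    """Hypothetical allocator that hands out strictly increasing integers."""

    def __init__(self, start=0):
        self._counter = itertools.count(start)

    def next(self):
        return next(self._counter)


def stamp(inserts, allocator):
    """Attach a freshly allocated timestamp to each Proxy insert payload."""
    return [(allocator.next(), "proxy", payload) for payload in inserts]


def merge_streams(tt_stream, stamped_inserts):
    """Merge two timestamp-sorted streams into one timestamp-sorted list."""
    return list(heapq.merge(tt_stream, stamped_inserts, key=lambda m: m[0]))


tt_stream = [(100, "tt", "tick"), (103, "tt", "tick")]
stamped = stamp(["row-a", "row-b"], TimestampAllocator(start=101))
merged = merge_streams(tt_stream, stamped)
```

`heapq.merge` assumes each input is already sorted by timestamp, which holds here because the allocator only ever counts upward.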

In 2.5 we can still keep the delegator in QueryNode, and move it to the streaming node in 3.0.