milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
30.36k stars, 2.91k forks

[Feature]: Support Dynamic plugin engine #19772

Open xiaofan-luan opened 2 years ago

xiaofan-luan commented 2 years ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Sometimes we need to run executable Go scripts and plugins, for example to calculate a function or run some UDFs.

Describe the solution you'd like.

https://github.com/traefik can help parse and run some code dynamically.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

jiaoew1991 commented 2 years ago

Embedding a Go interpreter in Milvus may not be safe. JVM languages can use separate classloaders to isolate plugin logic, but that kind of isolation is hard to achieve in natively compiled code. If we support a plugin engine and let plugin code access variables inside Milvus, that is dangerous: a bug in a plugin could affect the stability of Milvus.

We could consider starting plugins in forked processes, with Milvus communicating with them through RPC.
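To make the RPC idea concrete, here is a minimal sketch using Go's standard `net/rpc` package. The names `SumPlugin`, `Args`, `Reply`, and `callPlugin` are hypothetical and not part of Milvus. To keep the example self-contained, the plugin is served over an in-memory `net.Pipe` inside one process; in a real setup the serving side would presumably run in a forked process and the connection would be a Unix socket or TCP.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Args and Reply are hypothetical message types for this sketch.
type Args struct {
	Values []float64
}

type Reply struct {
	Sum float64
}

// SumPlugin is a hypothetical plugin. In a real deployment it would live
// in its own process so that a crash cannot take the host down.
type SumPlugin struct{}

// Eval has the shape net/rpc requires: exported method, two arguments
// (the second a pointer), and an error return.
func (p *SumPlugin) Eval(args Args, reply *Reply) error {
	for _, v := range args.Values {
		reply.Sum += v
	}
	return nil
}

// callPlugin serves the plugin on one end of an in-memory pipe and calls
// it from the other end, standing in for host-to-plugin communication.
func callPlugin(values []float64) (float64, error) {
	hostConn, pluginConn := net.Pipe()

	server := rpc.NewServer()
	if err := server.RegisterName("Plugin", &SumPlugin{}); err != nil {
		return 0, err
	}
	go server.ServeConn(pluginConn)

	client := rpc.NewClient(hostConn)
	defer client.Close()

	var reply Reply
	if err := client.Call("Plugin.Eval", Args{Values: values}, &reply); err != nil {
		return 0, err
	}
	return reply.Sum, nil
}

func main() {
	sum, err := callPlugin([]float64{1, 2, 3.5})
	if err != nil {
		panic(err)
	}
	fmt.Println(sum)
}
```

The host only ever sees serialized arguments and replies, so the plugin cannot touch variables inside the host; the price is one serialization round trip per call.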

xiaofan-luan commented 2 years ago

From your description, https://github.com/hashicorp/go-plugin might be what we are looking for. However, the plugin will be used on the RPC and query path. Might a remote Go plugin be too slow there?

wayblink commented 2 years ago

OLAP engines support UDFs and have started to support remote UDFs in recent years, e.g. Presto: https://github.com/prestodb/presto/issues/14053. ByteDance has also developed a remote UDF server for its cloud products.

Remote UDFs are safe, flexible, and support multiple languages, but they need extra maintenance and are somewhat slower.

I think the design can support both native (in-process) and remote UDFs, for different scenarios.

xiaofan-luan commented 2 years ago

It depends on the batch size with which we call the UDF server. For RPC-level verification, the plugin has to be fast. If we apply the UDF to segment results or batched results, a UDF server is the better fit.

Actually, I'm thinking of introducing another UDF server to do some ranking work after vector retrieval.
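A rough sketch of what such a ranking step could look like, assuming the UDF scores a whole batch of hits in a single call so the per-call overhead is paid once per batch rather than once per hit. `Hit`, `BatchScorer`, `boostScorer`, and `rerank` are hypothetical names; `boostScorer` is an in-process stand-in for a remote UDF server.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical search result: an entity ID and its similarity score.
type Hit struct {
	ID    int64
	Score float64
}

// BatchScorer is a hypothetical UDF interface that scores a batch in one call.
type BatchScorer interface {
	ScoreBatch(hits []Hit) ([]float64, error)
}

// boostScorer stands in for a remote UDF server. It adds a per-ID boost
// to the original score and counts how many calls it received.
type boostScorer struct {
	calls int
	boost map[int64]float64
}

func (b *boostScorer) ScoreBatch(hits []Hit) ([]float64, error) {
	b.calls++
	out := make([]float64, len(hits))
	for i, h := range hits {
		out[i] = h.Score + b.boost[h.ID]
	}
	return out, nil
}

// rerank rescores all hits with one UDF call and returns a new slice
// sorted by the new score, highest first. The input slice is not modified.
func rerank(s BatchScorer, hits []Hit) ([]Hit, error) {
	scores, err := s.ScoreBatch(hits)
	if err != nil {
		return nil, err
	}
	if len(scores) != len(hits) {
		return nil, fmt.Errorf("scorer returned %d scores for %d hits", len(scores), len(hits))
	}
	out := make([]Hit, len(hits))
	for i, h := range hits {
		out[i] = Hit{ID: h.ID, Score: scores[i]}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out, nil
}

func main() {
	scorer := &boostScorer{boost: map[int64]float64{3: 0.5}}
	hits := []Hit{{ID: 1, Score: 0.9}, {ID: 2, Score: 0.8}, {ID: 3, Score: 0.7}}

	ranked, err := rerank(scorer, hits)
	if err != nil {
		panic(err)
	}
	for _, h := range ranked {
		fmt.Println(h.ID)
	}
	fmt.Println("UDF calls:", scorer.calls)
}
```

Here the boost moves ID 3 ahead of IDs 1 and 2, and the scorer is called once for the whole batch.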