infiniflow / infinity

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text
https://infiniflow.org
Apache License 2.0
2.7k stars · 275 forks

ROADMAP 2024 #338

Open writinwaters opened 11 months ago

writinwaters commented 11 months ago

v0.6.0 planning

Core

Tools

v0.5.0

Core:

Tools

v0.4.0

Core:

Integration

API

Tools

v0.3.0

Core:

v0.2.0

v0.1.0

Backlog

Core

Integration

Tools

yuzhichang commented 11 months ago

CI improvements: post Infinity's logs when a CI run fails, use Ubuntu 20.04 as the base of the dev image, and add fuzz testing of Infinity.
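
A minimal sketch of what "post logs on CI failure" might look like as a GitHub Actions step. The log path and artifact name are hypothetical; the actual workflow layout would differ:

```yaml
- name: Upload infinity logs on failure
  if: failure()                    # runs only when an earlier step failed
  uses: actions/upload-artifact@v4
  with:
    name: infinity-logs
    path: /var/infinity/log/       # hypothetical log directory
```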

cjkbjhb commented 11 months ago

Secordary index on structured data type. ---> Secondary index on structured data types.

There is a spelling error here.

JinHai-CN commented 11 months ago
  • Secondary

Fixed, thank you.

yuzhichang commented 10 months ago

Compatibility testing

Image tags for reference:

- centos: 7, 8 (https://hub.docker.com/_/centos/)
- ubuntu: 20.04, 22.04, 24.04 (https://hub.docker.com/_/ubuntu, https://releases.ubuntu.com/)
- debian: 8, 9, 10, 11, 12 (https://hub.docker.com/_/debian, https://www.debian.org/releases/)
- opensuse/leap: 15.0, 15.1, 15.2, 15.3, 15.4, 15.5 (https://hub.docker.com/r/opensuse/leap)
- openeuler/openeuler: 20.03, 22.03 (https://hub.docker.com/r/openeuler/openeuler)
- openanolis/anolisos: 8.6, 23 (https://hub.docker.com/r/openanolis/anolisos)
- openkylin/openkylin: 1.0 (https://hub.docker.com/r/openkylin/openkylin)

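
A sketch of how this matrix could drive automated smoke tests. The image names and tags are taken from the list above; the test script name is a hypothetical placeholder, not an actual file in the repo:

```python
# Generate one `docker run` smoke-test command per (image, tag) pair
# in the compatibility matrix above.
MATRIX = {
    "centos": ["7", "8"],
    "ubuntu": ["20.04", "22.04", "24.04"],
    "debian": ["8", "9", "10", "11", "12"],
    "opensuse/leap": ["15.0", "15.1", "15.2", "15.3", "15.4", "15.5"],
    "openeuler/openeuler": ["20.03", "22.03"],
    "openanolis/anolisos": ["8.6", "23"],
    "openkylin/openkylin": ["1.0"],
}

def smoke_test_commands(test_cmd="./scripts/compat_smoke.sh"):
    """Yield a `docker run` command per image:tag (test_cmd is hypothetical)."""
    for image, tags in MATRIX.items():
        for tag in tags:
            yield f"docker run --rm -v $PWD:/infinity {image}:{tag} {test_cmd}"

cmds = list(smoke_test_commands())
```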
Kelvinyu1117 commented 10 months ago

I would like to contribute to this project, which issue would be a good start?

JinHai-CN commented 10 months ago

@Kelvinyu1117 We do have a couple of issues that might work for contributors new to this project.

  1. Add minmax information to blocks/segments in the current datastore. This information is primarily used for data filtering. (#448)
  2. Implement a bloomfilter for the blocks/segments to enhance point queries. (#467)
  3. Currently, query results are stored in memory in a columnar format, but the client expects them in Apache Arrow format. The conversion is currently performed in the Python client, which hurts performance, so we plan to convert the results to Apache Arrow format on the server side before sending them to the client.
  4. There are several optimizer rules to implement, such as constant folding and simplification of arithmetic expressions, which are not yet on the roadmap. Feel free to work on them if interested.
  5. We have additional, more complicated tasks not listed here. For instance, the current executor runs one thread per CPU. We're considering using coroutines to improve efficiency, but we don't have a solid solution yet. If you have experience in this area, you are very welcome to propose a solution.
  6. If you'd rather not contribute C++ code, there is also unimplemented Python code, such as test cases and the Python SDK API.
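
For item 1, a minimal Python sketch of the zone-map idea behind minmax filtering: each block stores the min and max of a column, so a filter like `x > v` can skip a block whose max fails the predicate without scanning its rows. All names are illustrative, not Infinity's actual API:

```python
# Zone-map (minmax) pruning sketch: per-block min/max statistics
# let a scan skip or wholesale-accept blocks before reading rows.
class BlockStats:
    def __init__(self, values):
        self.values = list(values)
        self.min = min(self.values)
        self.max = max(self.values)

def scan_greater_than(blocks, v):
    """Return all rows with value > v, pruning blocks via minmax stats."""
    out = []
    for b in blocks:
        if b.max <= v:              # whole block fails the predicate: skip
            continue
        if b.min > v:               # whole block passes: take it wholesale
            out.extend(b.values)
            continue
        out.extend(x for x in b.values if x > v)  # partial overlap: scan rows
    return out
```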
abdullah-alnahas commented 6 months ago

Your work is exceptional! I would like to propose that, considering the current landscape, incorporating binary quantization and ColBERT-like ranking would be crucial for any vector database. Apologies for commenting on the road map issue instead of creating a separate feature request.
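
For context, a minimal Python sketch of sign-bit binary quantization: each float dimension becomes one bit, and similarity reduces to a Hamming distance on the packed bits. Illustrative only; a production vector database would pack bits per machine word and rerank the Hamming candidates with full-precision vectors:

```python
# Binary quantization sketch: 1 bit per dimension (sign bit),
# with Hamming distance as the coarse similarity measure.
def quantize(vec):
    """Pack a float vector into an int, one bit per dimension (1 if >= 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x >= 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two packed vectors."""
    return bin(a ^ b).count("1")

q = quantize([0.3, -1.2, 0.0, 2.5])   # bits 0, 2, 3 set -> 0b1101
```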

JinHai-CN commented 6 months ago

> Your work is exceptional! I would like to propose that, considering the current landscape, incorporating binary quantization and ColBERT-like ranking would be crucial for any vector database. Apologies for commenting on the road map issue instead of creating a separate feature request.

Nice, we will put this request into the v0.2.0 release.

niebayes commented 6 months ago

@JinHai-CN Hi, I have experience developing a database with Arrow. Is the issue about converting query results to Arrow format still open? I'd like to take it.

JinHai-CN commented 6 months ago

@niebayes #1198 is created; we can discuss the requirements in that issue.