lancedb / lancedb

Developer-friendly, serverless vector database for AI applications. Easily add long-term memory to your LLM apps!
https://lancedb.github.io/lancedb/
Apache License 2.0

[RUST] Really need documents and examples for Rustlang #855

Closed zhyang-liu closed 7 months ago

zhyang-liu commented 7 months ago

Description

As a newbie to the Rust language, I found that there are almost no vector database libraries for it. Luckily I found this one, but it is hard to read the source code and get the code to run.

Could you please provide some examples or make the documentation clearer?

Thank you for your help, and good luck! By the way, I'm reading the Python documentation to see if it offers any hints.

Thank you again!

Link

No response

eddyxu commented 7 months ago

@zhyang-liu we are refactoring the Rust API right now: https://github.com/lancedb/lancedb/pull/853

If you check out the source and run cargo doc, you will find better documentation starting to come along.

zhyang-liu commented 7 months ago

Thank you very much! After running cargo doc I got a manual that is different from the one at https://docs.rs/vectordb/0.4.3/vectordb/

But after I copied the example code that creates the table, the program did not compile. As a beginner I don't know what to do; could you please give me a hint?

Here is the code and the compiler error:

// main.rs

use std::sync::Arc;

use futures::executor::block_on;

async fn my_test() {
    use arrow_schema::{DataType, Field, Schema};
    use arrow_array::{RecordBatch, RecordBatchIterator};
    use arrow_array::{FixedSizeListArray, Int32Array};
    use arrow_array::types::Float32Type;
    use vectordb::Database;

    // use vectordb::{connection::{Database, Connection}, WriteMode};
    //                ^~~~~~~~~~
    //                 Error occurred here, connection is not found in crate.

    let db = Database::connect("data/sample-lancedb").await.unwrap();

    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int32, false),
        Field::new("vector", DataType::FixedSizeList(
            Arc::new(Field::new("item", DataType::Float32, true)), 128), true),
    ]));

    // Create a RecordBatch stream.
    let batches = RecordBatchIterator::new(vec![
        RecordBatch::try_new(schema.clone(),
                             vec![
                                 Arc::new(Int32Array::from_iter_values(0..10)),
                                 Arc::new(FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
                                     (0..10).map(|_| Some(vec![Some(1.0); 128])), 128)),
                             ]).unwrap()
    ].into_iter().map(Ok), schema.clone());

    db.create_table("my_table", Box::new(batches), None).await.unwrap();
}

fn main() {
    block_on(my_test());
}

And here is the compiler's report:

error[E0277]: the trait bound `RecordBatchIterator<std::iter::Map<std::vec::IntoIter<RecordBatch>, fn(RecordBatch) -> Result<RecordBatch, ArrowError> {Result::<RecordBatch, ArrowError>::Ok}>>: arrow_array::record_batch::RecordBatchReader` is not satisfied
   --> src\main.rs:29:33
    |
29  |     db.create_table("my_table", Box::new(batches), None).await.unwrap();
    |        ------------             ^^^^^^^^^^^^^^^^^ the trait `arrow_array::record_batch::RecordBatchReader` is not implemented for `RecordBatchIterator<std::iter::Map<std::vec::IntoIter<RecordBatch>, fn(RecordBatch) -> Result<RecordBatch, ArrowError> {Result::<RecordBatch, ArrowError>::Ok}>>`
    |        |
    |        required by a bound introduced by this call
    |
    = help: the following other types implement trait `arrow_array::record_batch::RecordBatchReader`:
              arrow_csv::reader::BufReader<R>
              arrow_ipc::reader::FileReader<R>
              arrow_ipc::reader::StreamReader<R>
              Box<R>
              arrow_json::reader::Reader<R>
              arrow_array::record_batch::RecordBatchIterator<I>
    = note: required for `Box<RecordBatchIterator<Map<IntoIter<RecordBatch>, fn(RecordBatch) -> ... {Result::<..., ...>::Ok}>>>` to implement `arrow_array::record_batch::RecordBatchReader`
    = note: the full type name has been written to 'C:\Users\brigh\RustroverProjects\learning-diesel\target\debug\deps\learning_diesel-4abcc3be430b150f.long-type-3888891328171522581.txt'
note: required by a bound in `Database::create_table`
   --> C:\Users\brigh\.cargo\registry\src\mirrors.sjtug.sjtu.edu.cn-be2141875385cea5\vectordb-0.4.3\src\database.rs:192:23
    |
189 |     pub async fn create_table(
    |                  ------------ required by a bound in this associated function
...
192 |         batches: impl RecordBatchReader + Send + 'static,
    |                       ^^^^^^^^^^^^^^^^^ required by this bound in `Database::create_table`

eddyxu commented 7 months ago

This new code has not been released yet. But you are correct that I should display more code in the example:

https://github.com/lancedb/lancedb/blob/7a89b5ec68cb9cbffe8449ba607cfcdb46e852a1/rust/vectordb/src/lib.rs#L99-L100

I had hidden those lines to make the example look shorter and easier to read.

These new interfaces will be released in the next release (0.4.4).

zhyang-liu commented 7 months ago

Thanks a lot! After updating to v0.4.5, the problem is solved.

zhyang-liu commented 7 months ago

After re-implementing this on my other computer, I found the root cause of the type error:

29  |     db.create_table("my_table", Box::new(batches), None).await.unwrap();
    |        ------------             ^^^^^^^^^^^^^^^^^ the trait `arrow_array::record_batch::RecordBatchReader` is not implemented for `RecordBatchIterator<std::iter::Map<std::vec::IntoIter<RecordBatch>, fn(RecordBatch) -> Result<RecordBatch, ArrowError> {Result::<RecordBatch, ArrowError>::Ok}>>`

This happens because the [dependencies] block in Cargo.toml must use arrow-schema = "49" and arrow-array = "49" instead of the 50 mentioned in

https://github.com/lancedb/lancedb/blob/7a89b5ec68cb9cbffe8449ba607cfcdb46e852a1/rust/vectordb/src/lib.rs#L36-L39

Replace them with:

[dependencies]
vectordb = "0.4"
arrow-schema = "49"
arrow-array = "49"

and it will work fine.
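
For readers wondering why a version mismatch produces a "trait is not implemented" error even though the help text lists RecordBatchIterator<I> as an implementor: Cargo treats arrow-array 49 and 50 as two separate crates, so each one has its own RecordBatchReader trait, and a type that implements one does not implement the other. The sketch below imitates that situation with two modules standing in for the two releases. All names in it (arrow_v49, arrow_v50, V49Iterator, describe, and this create_table) are hypothetical stand-ins for illustration, not the real arrow or vectordb APIs.

```rust
// Two modules stand in for two semver-incompatible releases of one crate.
// Hypothetical names: in the thread these would be arrow-array 49 and 50.
mod arrow_v49 {
    pub trait RecordBatchReader {
        fn describe(&self) -> String;
    }
}

mod arrow_v50 {
    // Same name and same shape, but a distinct trait as far as the compiler cares.
    pub trait RecordBatchReader {
        fn describe(&self) -> String;
    }

    pub struct RecordBatchIterator;

    impl RecordBatchReader for RecordBatchIterator {
        fn describe(&self) -> String {
            "reader built against v50".to_string()
        }
    }
}

// Stand-in for a library function compiled against the v49 trait.
fn create_table(reader: impl arrow_v49::RecordBatchReader) -> String {
    reader.describe()
}

// A reader built against the version the library expects.
struct V49Iterator;

impl arrow_v49::RecordBatchReader for V49Iterator {
    fn describe(&self) -> String {
        "reader built against v49".to_string()
    }
}

fn main() {
    // Matching versions: this compiles and runs.
    println!("{}", create_table(V49Iterator));

    // Mismatched versions: uncommenting the next line fails with error[E0277],
    // because the v50 iterator only implements the v50 trait.
    // create_table(arrow_v50::RecordBatchIterator);

    // The v50 type is still perfectly usable through its own trait.
    let v50 = arrow_v50::RecordBatchIterator;
    println!("{}", arrow_v50::RecordBatchReader::describe(&v50));
}
```

In a real project, running cargo tree -d lists crates that appear in the dependency graph in more than one version, which is a quick way to spot this kind of mismatch.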