jerryyifei closed this 11 months ago
Likely related: when indexing without an accelerator, I get a similar index-out-of-range error. My guess is that it is related to the "KMeans: cluster 186 is empty" warning, but how do I make sure there are no empty clusters?
[2023-11-26T10:11:45Z WARN lance_linalg::kmeans] KMeans: cluster 186 is empty
0%| | 0/1000 [00:00<?, ?it/s]thread 'lance-cpu' panicked at /home/runner/work/lance/lance/rust/lance-index/src/vector/pq.rs:116:14:
range end index 229376 out of range for slice of length 228480
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'lance-cpu' panicked at /home/runner/work/lance/lance/rust/lance-index/src/vector/pq.rs:116:14:
range end index 229376 out of range for slice of length 228480
thread 'lance-cpu' panicked at /home/runner/work/lance/lance/rust/lance-index/src/vector/pq.rs:116:14:
range end index 229376 out of range for slice of length 228480
thread 'lance_background_thread' panicked at /home/runner/work/lance/lance/rust/lance/src/utils/tokio.rs:30:24:
called `Result::unwrap()` on an `Err` value: RecvError(())
thread 'lance-cpu' panicked at /home/runner/work/lance/lance/rust/lance-index/src/vector/pq.rs:116:14:
range end index 229376 out of range for slice of length 228480
thread 'lance-cpu' panicked at /home/runner/work/lance/lance/rust/lance-index/src/vector/pq.rs:116:14:
range end index 229376 out of range for slice of length 228480
thread 'lance-cpu' panicked at /home/runner/work/lance/lance/rust/lance-index/src/vector/pq.rs:116:14:
range end index 229376 out of range for slice of length 228480
thread 'lance-cpu' panicked at /home/runner/work/lance/lance/rust/lance-index/src/vector/pq.rs:116:14:
range end index 229376 out of range for slice of length 228480
thread 'lance_background_thread' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/task/core.rs:375:22:
JoinHandle polled after completion
thread 'lance_background_thread' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/task/core.rs:375:22:
JoinHandle polled after completion
0%| | 0/1000 [00:00<?, ?it/s]
thread 'lance-cpu' panicked at /home/runner/work/lance/lance/rust/lance-index/src/vector/pq.rs:116:14:
range end index 229376 out of range for slice of length 228480
Traceback (most recent call last):
  File "/home/ubuntu/benchmarking/benchmark.py", line 219, in <module>
    eval_lance(query, ground_truth, 5, "cosine")
  File "/home/ubuntu/benchmarking/benchmark.py", line 146, in eval_lance
    res = tbl.search(list(q)).metric(metric).limit(k).refine_factor(ref_factor).to_list()
  File "/home/ubuntu/.pyenv/versions/rag/lib/python3.10/site-packages/lancedb/query.py", line 217, in to_list
    return self.to_arrow().to_pylist()
  File "/home/ubuntu/.pyenv/versions/rag/lib/python3.10/site-packages/lancedb/query.py", line 409, in to_arrow
    return self._table._execute_query(query)
  File "/home/ubuntu/.pyenv/versions/rag/lib/python3.10/site-packages/lancedb/table.py", line 970, in _execute_query
    return ds.to_table(
  File "/home/ubuntu/.pyenv/versions/rag/lib/python3.10/site-packages/lance/dataset.py", line 321, in to_table
    ).to_table()
  File "/home/ubuntu/.pyenv/versions/rag/lib/python3.10/site-packages/lance/dataset.py", line 1595, in to_table
    return self.to_reader().read_all()
  File "pyarrow/ipc.pxi", line 757, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: Io error: Execution error: External error: Execution error: ExecNode(Take): thread panicked: task 3247216 panicked
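A quick sanity check on the two numbers in the panic message may help when reading it. The sketch below only does arithmetic on values copied from the log; the interpretation (a flat buffer of equally sized rows, one row per cluster) is an assumption and is not confirmed anywhere in this thread.

```python
# Numbers copied from the panic message above.
range_end = 229376   # end index the code tried to slice to
slice_len = 228480   # actual length of the slice

# Assumption (not confirmed by the thread): the buffer is a flat array
# made of equally sized rows, so the shortfall would be one row width.
missing = range_end - slice_len


def rows(total, row_width):
    """Return how many full rows of row_width fit in total, or None if uneven."""
    quotient, remainder = divmod(total, row_width)
    return quotient if remainder == 0 else None


expected_rows = rows(range_end, missing)
actual_rows = rows(slice_len, missing)
print(missing, expected_rows, actual_rows)  # prints: 896 256 255
```

Under that assumption the slice holds 255 rows of 896 values where 256 were expected, so exactly one row is missing. That would be consistent with the guess that the single empty cluster from the warning is involved, but it is only a reading of the numbers.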
@wjones127 thanks. When will it be merged into the main branch? Or how can I fix it in the current version?
@amulil this was merged to main and we'll probably cut a new release including this fix within a week.
Python: 3.8.9 pylance: 0.8.17
  File "create_index.py", line 6, in <module>
    full_ds.create_index(
  File "/usr/local/lib/python3.8/dist-packages/lance/dataset.py", line 992, in create_index
    ivf_centroids = train_ivf_centroids_on_accelerator(
  File "/usr/local/lib/python3.8/dist-packages/lance/vector.py", line 173, in train_ivf_centroids_on_accelerator
    kmeans.fit(ds)
  File "/usr/local/lib/python3.8/dist-packages/lance/torch/kmeans.py", line 135, in fit
    self.total_distance = self._fit_once(
  File "/usr/local/lib/python3.8/dist-packages/lance/torch/kmeans.py", line 181, in _fit_once
    for idx, chunk in enumerate(data):
  File "/usr/local/lib/python3.8/dist-packages/lance/torch/data.py", line 122, in __iter__
    for batch in stream:
  File "/usr/local/lib/python3.8/dist-packages/lance/cache.py", line 68, in __iter__
    for batch in self.stream:
  File "/usr/local/lib/python3.8/dist-packages/lance/sampler.py", line 134, in maybe_sample
    yield from _efficient_sample(dataset, n, columns, batch_size, max_takes)
  File "/usr/local/lib/python3.8/dist-packages/lance/sampler.py", line 74, in _efficient_sample
    dataset.take(
  File "/usr/local/lib/python3.8/dist-packages/lance/dataset.py", line 452, in take
    return pa.Table.from_batches([self._ds.take(indices, columns)])
OSError: Invalid user input: Row index 26514819 is beyond the range of the dataset., /home/runner/work/lance/lance/rust/lance/src/dataset.rs:892:31
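For anyone debugging the same error, a bounds check on the indices before they reach `take` can show which ones are out of range. This is a minimal sketch in plain Python: `out_of_range` is a hypothetical helper, not part of lance, and the row count used in the example is made up for illustration (in practice it would come from the dataset itself).

```python
def out_of_range(indices, num_rows):
    """Return the indices that fall outside the valid range [0, num_rows)."""
    return [i for i in indices if i < 0 or i >= num_rows]


# Hypothetical row count, chosen only for illustration; the real dataset
# size is not stated in this thread.
num_rows = 26_000_000

# 26514819 is the row index reported in the error above.
sampled = [0, 1024, 26514819]
print(out_of_range(sampled, num_rows))  # prints: [26514819]
```

If the check reports indices at or beyond the row count, the sampler is producing row indices larger than the dataset, which is what the error message describes.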