lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.97k stars, 224 forks

lance.torch.data.LanceDataset(to_tensor_fn=...) typehint inconsistent with usage. #3129

Open pimdh opened 6 days ago

pimdh commented 6 days ago

Hi, thanks for creating this amazing library!

There seems to be a small issue with the to_tensor_fn kwarg of the lance.torch.data.LanceDataset initializer. Its type hint is:

to_tensor_fn: Optional[
    callable[[pa.RecordBatch], Union[dict[str, torch.Tensor], torch.Tensor]]
] = None,

However, it is called here with an additional kwarg:

batch = self._to_tensor_fn(batch, hf_converter=self._hf_converter)
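To make the mismatch concrete, here is a minimal sketch using only the standard library. Everything in it is a stand-in rather than lance code: the dict replaces a pa.RecordBatch, and `ToTensorFn`, `follows_type_hint`, `follows_call_site`, and `call_like_lance` are hypothetical names chosen for illustration. It shows that a function written strictly against the type hint raises a TypeError when invoked the way the call site does, and one possible way (a `typing.Protocol`) to describe the signature the call site actually expects.

```python
from typing import Any, Optional, Protocol


class ToTensorFn(Protocol):
    """Hypothetical protocol matching the call site, not the current hint."""

    def __call__(self, batch: Any, *, hf_converter: Optional[Any] = None) -> Any: ...


def follows_type_hint(batch):
    # Written against the documented hint: one positional argument only.
    return batch


def follows_call_site(batch, *, hf_converter=None):
    # Written against the call site: also accepts the hf_converter keyword.
    return batch


def call_like_lance(to_tensor_fn, batch, hf_converter=None):
    # Mirrors the quoted call: self._to_tensor_fn(batch, hf_converter=...)
    return to_tensor_fn(batch, hf_converter=hf_converter)


batch = {"x": [1, 2, 3]}  # stand-in for a pa.RecordBatch (assumption)

try:
    call_like_lance(follows_type_hint, batch)
    hint_only_works = True
except TypeError:
    # The hint-conforming function rejects the unexpected keyword argument.
    hint_only_works = False

call_site_result = call_like_lance(follows_call_site, batch)
```

In other words, a user who follows the type hint literally gets a function that fails at iteration time, so either the hint or the call would need to change. Which of the two is intended is for the maintainers to decide.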