h2oai / datatable

A Python package for manipulating 2-dimensional tabular data structures
https://datatable.readthedocs.io
Mozilla Public License 2.0
1.81k stars 155 forks

Consider adding support for embeddings #3462

Open st-pasha opened 1 year ago

st-pasha commented 1 year ago

LLMs are in fashion now, so why not add support for them?

oleksiyskononenko commented 1 year ago

Can’t the existing array type be used for that?

st-pasha commented 1 year ago

You'd want an array of fixed size, kind of like a mathematical vector. It might be pretty similar to the existing array type in terms of implementation, though.

oleksiyskononenko commented 1 year ago

Yeah, but it could probably be an array[float, N] type for fixed-length vectors and just array[float] for arbitrary lengths.
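To make the distinction concrete, here is a rough plain-Python sketch of how a column of float lists could be classified as fixed-length or arbitrary-length. It does not use datatable at all: `fixed_width` and `describe_column` are hypothetical helper names, and the returned strings only mirror the `array[float, N]` / `array[float]` notation from this comment, not an existing datatable type.

```python
from typing import Optional, Sequence


def fixed_width(rows: Sequence[Optional[Sequence[float]]]) -> Optional[int]:
    """Return the shared length N if all non-missing rows have it, else None."""
    # Missing values (None) are skipped; they do not constrain the width.
    lengths = {len(row) for row in rows if row is not None}
    if len(lengths) == 1:
        return lengths.pop()
    return None


def describe_column(rows: Sequence[Optional[Sequence[float]]]) -> str:
    """Hypothetical type label, using the notation proposed in this thread."""
    n = fixed_width(rows)
    return f"array[float, {n}]" if n is not None else "array[float]"


# Every non-missing row has 3 elements, so this could be a fixed-length column.
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], None]

# Rows of different lengths can only be an arbitrary-length column.
ragged = [[1.0], [2.0, 3.0], []]

print(describe_column(embeddings))
print(describe_column(ragged))
```

One possible benefit of the fixed-length variant is that N values per row could be stored contiguously without per-row offsets, though whether that fits datatable's internals is a question for the implementation.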

oleksiyskononenko commented 1 year ago

I'm not an expert on LLMs. Could you please point me to some information on the data types they use?

st-pasha commented 1 year ago

At the most basic level, an embedding is just a vector of floats of a fixed length. For example, see here: https://www.pinecone.io/learn/vector-database/
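As a small illustration of why the fixed length matters, here is a plain-Python sketch of cosine similarity, a common way to compare two embeddings. The vectors are made-up toy values rather than the output of any real model, and `cosine_similarity` is just an illustrative helper, not part of datatable.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of the same length."""
    if len(a) != len(b):
        # Comparing embeddings only makes sense when their lengths match.
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional "embeddings" (assumed values, for illustration only).
doc_a = [0.9, 0.1, 0.4, 0.0]
doc_b = [0.8, 0.2, 0.5, 0.1]
doc_c = [0.0, 0.9, 0.1, 0.7]

# doc_a points in nearly the same direction as doc_b, but not doc_c.
print(cosine_similarity(doc_a, doc_b) > cosine_similarity(doc_a, doc_c))
```

A column type that guarantees a single length for every row would let this kind of comparison skip the per-pair length check.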