Open bubl-ai opened 2 days ago
Yes, it is expected for `llama-index-embeddings-huggingface` to take up a significant amount of space (around ~12 GB), due to the size of the models it uses. To install a slimmer version inside a Docker container, you can opt for a smaller model. For example, you can use the `jinaai/jina-embeddings-v2-small-en` model instead of a larger one. Here is how you can specify the smaller model in your setup:
```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# small model
model_name = "jinaai/jina-embeddings-v2-small-en"

# download model locally
embed_model = HuggingFaceEmbedding(
    model_name=model_name, trust_remote_code=True
)
```
This should help reduce the overall size of the installation [1].
To continue talking to Dosu, mention @dosu.
@dosu that was not my question. My problem is not with loading a model, it is with installing it using `pip install`.
To reduce the size of the `llama-index-embeddings-huggingface` package when installing it via pip, you can use quantized embedding models optimized for Intel processors. This approach leverages the Optimum-Intel library to load smaller and faster quantized models with minimal accuracy loss.
First, install the necessary dependencies:
```bash
pip install optimum[exporters] optimum-intel neural-compressor intel_extension_for_pytorch
```
Then, you can load and use the quantized models with the `IntelEmbedding` class:
```python
%pip install llama-index-embeddings-huggingface-optimum-intel

from llama_index.embeddings.huggingface_optimum_intel import IntelEmbedding

embed_model = IntelEmbedding("Intel/bge-small-en-v1.5-rag-int8-static")
embeddings = embed_model.get_text_embedding("Hello World!")

print(len(embeddings))
print(embeddings[:5])
```
Additionally, for a more selective installation to reduce the overall package size, you can install only the necessary components:
```bash
pip install llama-index-core llama-index-readers-file llama-index-llms-ollama llama-index-embeddings-huggingface
```
This approach helps in reducing the package size and improving performance [1][2].
Probably due to the torch dependencies? No way around that.
I would use a Docker image that has torch installed as the base. Or use multi-stage builds. Or limit yourself to one `RUN` in your Dockerfile (all common strategies to reduce size).
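A minimal sketch of the multi-stage approach mentioned above (base images and the exact package list are illustrative, not a tested recipe):

```dockerfile
# Stage 1: install the heavy dependencies into an isolated virtualenv
FROM python:3.11-slim AS builder
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# a single RUN with --no-cache-dir avoids baking pip's download cache into a layer
RUN pip install --no-cache-dir llama-index-core llama-index-embeddings-huggingface

# Stage 2: copy only the installed site-packages into a clean runtime image,
# leaving build-time artifacts behind in the discarded builder stage
FROM python:3.11-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
```

Note that multi-stage builds help most when the builder stage accumulates caches and build tools; they won't shrink the installed packages themselves.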
@logan-markewich I already tried most of those approaches. The size just explodes any time I include `llama-index-embeddings-huggingface` in the equation.
I don't mind the build taking longer... what is important to me is the final size, as I am pushing that image to a remote registry and using it there, and the size just makes it difficult to work with.
I can't really suggest more than what I've mentioned above -- the only reason the image would be large is either due to installing torch (and associated cuda dependencies), which you can solve by using a PyTorch+CUDA base image, or by using multistage builds
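If GPU inference isn't needed, one thing worth trying (an assumption on my part, not something llama-index documents) is installing the CPU-only PyTorch wheels first, so that installing the embeddings package doesn't pull in the much larger CUDA-enabled torch build:

```bash
# install CPU-only torch from the official PyTorch CPU wheel index
pip install torch --index-url https://download.pytorch.org/whl/cpu
# then install the embeddings package; pip reuses the already-installed torch
pip install llama-index-embeddings-huggingface
```

The CUDA libraries bundled with the default torch wheels account for several GB on their own, so this tends to be the single biggest lever on image size.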
Question Validation
Question
I am working inside a Docker container with llama_index. I was noticing that my image was significantly big, so today I took some time to check the reason for that.
My analysis shows that `llama-index-embeddings-huggingface` is taking ~12 GB, so I am wondering if that is intended. What could I do to install a slimmer version of it?