TypeError: INSTRUCTOR._load_sbert_model() got an unexpected keyword argument 'token'

juerware commented 3 months ago

This is the command line launched:

python generate.py \
    --base_model=h2oai/h2ogpt-4096-llama2-7b-chat --prompt_type=llama2 \
    --use_gpu_id=True --gpu_id=-1 --max_seq_len=8192 \
    --user_path=/opt/myDocuments/arsys.es/all/ --langchain_mode='UserData' --max_quality=True \
    --add_chat_history_to_context=True --keep_sources_in_context=True --enable_ocr=True --enable_doctr=True \
    --answer_with_sources=True --show_link_in_sources=True --append_sources_to_chat=True \
    --hf_embedding_model="hkunlp/instructor-base" \
    --memory_restriction_level=0 --score_model=None --verbose=True --debug=True

This is the error found:

Traceback (most recent call last):
  File "/root/REPOSITORIES/aramirez/void_h2ogtp/generate.py", line 20, in <module>
    entrypoint_main()
  File "/root/REPOSITORIES/aramirez/void_h2ogtp/generate.py", line 16, in entrypoint_main
    H2O_Fire(main)
  File "/root/REPOSITORIES/aramirez/void_h2ogtp/src/utils.py", line 75, in H2O_Fire
    fire.Fire(component=component, command=args)
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/REPOSITORIES/aramirez/void_h2ogtp/src/gen.py", line 2015, in main
    model=get_embedding(use_openai_embedding, hf_embedding_model=hf_embedding_model,
  File "/root/REPOSITORIES/aramirez/void_h2ogtp/src/gpt_langchain.py", line 556, in get_embedding
    embedding = HuggingFaceInstructEmbeddings(model_name=hf_embedding_model,
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 167, in __init__
    self.client = INSTRUCTOR(
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 287, in __init__
    modules = self._load_sbert_model(
TypeError: INSTRUCTOR._load_sbert_model() got an unexpected keyword argument 'token'

System:

OS: Ubuntu 22.04 LTS, fully updated.
Commit: 7435b4bc (the latest at the time this issue was written)
Environment fully updated and clean after running the script bash docs/linux_install_full.sh

Observations:
This only happens with embedding models based on BERT.
Embedding models such as "intfloat/multilingual-e5-small" do not report any problem.

Thanks for everything.

pseudotensor commented 3 months ago

It seems the dependencies that should be installed are not.

requirements_optional_langchain.txt has:

sentence_transformers>=3.0.1
InstructorEmbedding @ https://h2o-release.s3.amazonaws.com/h2ogpt/InstructorEmbedding-1.0.1-py3-none-any.whl
sentence_transformers_old @ https://h2o-release.s3.amazonaws.com/h2ogpt/sentence_transformers_old-2.2.2-py3-none-any.whl

For me there is no failure, because $HOME/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/InstructorEmbedding/instructor.py references only:

from sentence_transformers_old import SentenceTransformer
from sentence_transformers_old.models import Transformer

Can you check your file and see what it shows? I presume it imports the normal packages rather than the "old" ones, which would explain the failure.
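
One way to run that check is to list which sentence_transformers packages the file imports. This is a minimal sketch: `sbert_imports` is a hypothetical helper name (not part of either package), and the path is the one quoted above, which may differ on your machine.

```python
import ast
from pathlib import Path


def sbert_imports(source: str) -> set:
    """Return the top-level packages named sentence_transformers* that `source` imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top.startswith("sentence_transformers"):
                found.add(top)
    return found


if __name__ == "__main__":
    # Assumed location, taken from the comment above; adjust to your environment.
    path = (Path.home() / "miniconda3/envs/h2ogpt/lib/python3.10/site-packages"
            / "InstructorEmbedding/instructor.py")
    if path.exists():
        print(sbert_imports(path.read_text()))
    else:
        print(f"not found: {path}")
```

A result containing only sentence_transformers_old matches the working install described above. A result containing sentence_transformers would suggest the file comes from a different InstructorEmbedding build.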

However, that doesn't explain why you have the wrong packages. I just redid the install and see the correct packages being used.

You should be able to do just:

from InstructorEmbedding import INSTRUCTOR

in Python and it shouldn't fail. So this is not related to h2oGPT itself, just to those two packages.
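
That import check can be wrapped so a failure is reported rather than raised. This is a sketch only: `try_import` is a hypothetical helper, not something h2oGPT provides. Note that the TypeError in the traceback is raised when INSTRUCTOR is constructed, so a passing import may not rule out the problem by itself.

```python
import importlib
from typing import Optional, Tuple


def try_import(module_name: str, attr: Optional[str] = None) -> Tuple[bool, Optional[str]]:
    """Import a module (and optionally look up one attribute); return (ok, error message)."""
    try:
        module = importlib.import_module(module_name)
        if attr is not None:
            getattr(module, attr)
    except Exception as exc:
        # Report the exception type and message instead of raising.
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


if __name__ == "__main__":
    # Equivalent to: from InstructorEmbedding import INSTRUCTOR
    print(try_import("InstructorEmbedding", "INSTRUCTOR"))
```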

In a clean Docker image I also see this works fine.

pseudotensor commented 3 months ago

677bf0817d3e342ffc32a31a94cf63cc05096660 794ec254460a0c38a2e3ae3e4437f5dc0f695a09

pseudotensor commented 3 months ago

Building a new image with the above fix to see if the install order helps. I saw in Jenkins that an earlier requirements.txt file triggered the InstructorEmbedding install, so the entry needs to come early and not as late as the langchain requirements file.

pseudotensor commented 3 months ago

Seems to work. Thanks for reporting!

juerware commented 3 months ago

It's working now, thanks.

h2oai / h2ogpt

TypeError: INSTRUCTOR._load_sbert_model() got an unexpected keyword argument 'token' #1787
