h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0

Feature request - to add support for nomic-ai/nomic-embed-text-v1 embeddings. #1418

Open slavag opened 7 months ago

slavag commented 7 months ago

Hi, can you please add support for nomic-ai/nomic-embed-text-v1? Trying to run it as is, I get the following errors: H2OOCRLoader: unknown architecture 'crnn_efficientnetv2_mV2'

And ValueError: Loading nomic-ai/nomic-embed-text-v1 requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error.

Thanks

pseudotensor commented 7 months ago

Probably bge-m3 is better. Did you try that one? It also has a long context and is much smaller than other LLM-based models.

pseudotensor commented 7 months ago

H2OOCRLoader: unknown architecture 'crnn_efficientnetv2_mV2' sounds like DocTR is not installed correctly.

pseudotensor commented 7 months ago

For nomic, it requires an unreleased sentence transformers 2.4.0dev. Attempting to pip install that dev version leads to failures in their API, so nothing is usable.

Once sentence transformers 2.4.0 is released without bugs, we can upgrade h2oGPT and pass the required trust_remote_code option; that option does not exist in prior sentence transformers releases.
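Once a sentence-transformers release accepts the option, forwarding it could be gated on the installed version. A minimal sketch of that idea, assuming the 2.4.0 threshold mentioned above (the helper names are hypothetical, not h2oGPT's actual code):

```python
# Hypothetical helpers: only forward trust_remote_code when the installed
# sentence-transformers is new enough to accept it (>= 2.4.0 per this thread).
def supports_trust_remote_code(st_version: str) -> bool:
    major, minor = (int(p) for p in st_version.split(".")[:2])
    return (major, minor) >= (2, 4)

def embedder_kwargs(st_version: str, model_name: str) -> dict:
    kwargs = {"model_name_or_path": model_name}
    if supports_trust_remote_code(st_version):
        # needed for nomic-ai/nomic-embed-text-v1, which ships custom model code
        kwargs["trust_remote_code"] = True
    return kwargs
```

The resulting dict would then be passed to the SentenceTransformer constructor; on older versions the option is simply omitted rather than raising a TypeError.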

slavag commented 7 months ago

@pseudotensor No, I didn't try bge-m3; I will. Is the usage just to specify BAAI/bge-m3, or do I need to install anything? As for DocTR, I checked everything according to their GitHub and don't see anything missing, yet I still get: H2OOCRLoader: unknown architecture 'crnn_efficientnetv2_mV2' Thanks

slavag commented 7 months ago

@pseudotensor Wanted to try bge, but it failed (apparently regardless of the embeddings model). A few months ago this was working fine, when I did the ingestion with other embeddings (same dataset):

python src/make_db.py --hf_embedding_model=BAAI/bge-m3 --chunk_size=8192 --user_path=/Users/slava/Documents/Development/private/ZendDeskTicketsNew -collection_name=ZenDeskTicketsWithDocsBGE
 59%|███████████████████████████████████████████████████████████████████████████████████████████████████████▉                                                                         | 27/46 [00:07<00:05,  3.41it/s]H2OOCRLoader: unknown architecture 'crnn_efficientnetv2_mV2'
[... many more progress-bar lines, each ending in the same "H2OOCRLoader: unknown architecture 'crnn_efficientnetv2_mV2'" message, trimmed ...]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 41067/41067 [00:15<00:00, 2675.56it/s]
Exceptions: 0/92227 []
[1]    41576 killed     python3 src/make_db.py --hf_embedding_model=BAAI/bge-m3 --chunk_size=8192  
/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown                      
  warnings.warn('resource_tracker: There appear to be %d '
/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/joblib/externals/loky/backend/resource_tracker.py:314: UserWarning: resource_tracker: There appear to be 1 leaked folder objects to clean up at shutdown
  warnings.warn(
/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/joblib/externals/loky/backend/resource_tracker.py:330: UserWarning: resource_tracker: /var/folders/z1/qsct20p17nxdfhjlp29yr6r40000gn/T/joblib_memmapping_folder_41576_f980b37c109a4082a6f2c1759202b4c9_31ee9ea798ce4943b8c16dcd8596d055: FileNotFoundError(2, 'No such file or directory')
  warnings.warn(f"resource_tracker: {name}: {e!r}")

No other info is available; the process just terminated in the middle.

slavag commented 7 months ago

@pseudotensor Hi, can you please advise on the issue I mentioned above with make_db ? Thanks a lot !!!

pseudotensor commented 7 months ago

For DocTR you can't use their repo; it has to be installed from our fork, as described in readme_linux.md or its linux_install.sh.

The missing architecture suggests the original DocTR repo is being used.

slavag commented 7 months ago

@pseudotensor Thanks. What about make_db crashing in the first minutes of execution, without much info?

Exceptions: 0/92227 []
[1]    41576 killed     python3 src/make_db.py --hf_embedding_model=BAAI/bge-m3 --chunk_size=8192  

Thanks.

pseudotensor commented 7 months ago

Looks like system OOM. You can check sudo dmesg -T to see if OOM Killer hit.
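On Linux the kernel log records OOM-killer kills; a minimal sketch of the check, run here against sample log lines since real dmesg output needs root (and macOS logs OOM events differently):

```shell
# Scan kernel-log text for OOM-killer activity. The sample lines stand in for
# real `sudo dmesg -T` output; the grep pattern is what to look for.
dmesg_sample='[Mon Feb 12 10:00:01 2024] Out of memory: Killed process 41576 (python3)
[Mon Feb 12 10:00:02 2024] oom_reaper: reaped process 41576'
printf '%s\n' "$dmesg_sample" | grep -ic 'killed process'
```

A non-zero count means the kernel killed the process for memory pressure rather than the process crashing on its own.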

slavag commented 7 months ago

@pseudotensor Indeed OOM, but I don't know why this started happening: the process's memory reached 86GB, while my Mac has 32GB plus swap. In the past I was able to create a db from those files (tried now with the default embeddings).

Please advise. Thanks

pseudotensor commented 7 months ago

Maybe one of the parsers went nuts; e.g. tesseract may have a bug. On gpt.h2o.ai I had one case where memory peaked at 512GB.

Are you able to see from the verbose logging which document might have been an issue?

slavag commented 7 months ago

@pseudotensor No, I don't see it, and now I have only text files (removed the PDFs) and still get the same issue.

slavag commented 7 months ago

@pseudotensor It seems the issue happens after the parsing:

.....
DONE Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket54810.txt
Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket72169.txt
DONE Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket72169.txt
Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket119236.txt
DONE Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket119236.txt
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 42124/42128 [00:26<00:00, 1642.83it/s]Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket73277.txt
DONE Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket73277.txt
Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket35222.txt
DONE Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket35222.txt
Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket87490.txt
DONE Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket87490.txt
Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket63064.txt
DONE Ingesting file: /Users/slava/Documents/Development/private/ZendDeskTickets/ticket63064.txt
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 42128/42128 [00:26<00:00, 1576.36it/s]
0it [00:00, ?it/s]
END consuming path_or_paths=/Users/slava/Documents/Development/private/ZendDeskTickets url=None text=None
Exceptions: 0/498289 []
Loading and updating db
Found 498289 new sources (0 have no hash in original source, so have to reprocess for migration to sources with hash)
Removing 0 duplicate files from db because ingesting those as new documents
Existing db, adding to db_dir_ZenDeskTicketsWithDocsBGE
[1]    91113 killed     python3 src/make_db.py  -collection_name=ZenDeskTicketsWithDocsBGE  --n_jobs=
/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown                      
  warnings.warn('resource_tracker: There appear to be %d '

slavag commented 7 months ago

Tried to run make_db with a memory profiler (memray). First, I don't have PDFs, but it looks like the code somehow still uses DocTR and others, and then I see a huge allocation in transformers.models.xlm_roberta. Have a look at the screenshot.

[screenshot: memray memory profile]

Summary of allocations

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Location                                                                                                           ┃        <Total Memory> ┃        Total Memory % ┃            Own Memory ┃          Own Memory % ┃      Allocation Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ _PyEval_Vector at <unknown>                                                                                        │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677454 │
│ _run_tracker at /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/memray/commands/run.py            │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677474 │
│ run_path at <frozen runpy>                                                                                         │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677470 │
│ PyObject_Vectorcall at <unknown>                                                                                   │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677457 │
│ cfunction_vectorcall_FASTCALL_KEYWORDS at <unknown>                                                                │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677442 │
│ PyEval_EvalCode at <unknown>                                                                                       │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677434 │
│ builtin_exec at <unknown>                                                                                          │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677434 │
│ _run_code at <frozen runpy>                                                                                        │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677425 │
│ _run_module_code at <frozen runpy>                                                                                 │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677425 │
│ <module> at src/make_db.py                                                                                         │             170.146GB │               100.00% │                0.000B │                 0.00% │               1677423 │
│ _PyObject_MakeTpCall at <unknown>                                                                                  │             168.601GB │                99.09% │                0.000B │                 0.00% │               1288653 │
│ H2O_Fire at /Users/slava/Documents/Development/private/AI/h2ogpt/src/utils.py                                      │             137.791GB │                80.98% │                0.000B │                 0.00% │               1450699 │
│ Fire at /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/fire/core.py                              │             137.791GB │                80.98% │                0.000B │                 0.00% │               1450694 │
│ _Fire at /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/fire/core.py                             │             137.791GB │                80.98% │                0.000B │                 0.00% │               1450689 │
│ _CallAndUpdateTrace at /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/fire/core.py               │             137.791GB │                80.98% │                0.000B │                 0.00% │               1450688 │
│ make_db_main at src/make_db.py                                                                                     │             137.791GB │                80.98% │                0.000B │                 0.00% │               1450681 │
│ _PyVectorcall_Call at <unknown>                                                                                    │             137.671GB │                80.91% │                0.000B │                 0.00% │               1402343 │
│ method_vectorcall at <unknown>                                                                                     │             136.676GB │                80.33% │                0.000B │                 0.00% │               1254230 │
│ create_or_update_db at /Users/slava/Documents/Development/private/AI/h2ogpt/src/gpt_langchain.py                   │             136.557GB │                80.26% │                0.000B │                 0.00% │               1211409 │
│ get_db at /Users/slava/Documents/Development/private/AI/h2ogpt/src/gpt_langchain.py                                │             136.557GB │                80.26% │                0.000B │                 0.00% │               1211407 │
│ _PyObject_FastCallDictTstate at <unknown>                                                                          │             136.520GB │                80.24% │                0.000B │                 0.00% │               1219361 │
│ _PyObject_Call_Prepend at <unknown>                                                                                │             136.499GB │                80.22% │                0.000B │                 0.00% │               1199368 │
│ _PyObject_Call at <unknown>                                                                                        │             136.447GB │                80.19% │                0.000B │                 0.00% │               1174211 │
│ cfunction_call at <unknown>                                                                                        │             136.258GB │                80.08% │                0.000B │                 0.00% │                 77360 │
│ c10::DefaultCPUAllocator::allocate(unsigned long) const at <unknown>                                               │             136.205GB │                80.05% │                0.000B │                 0.00% │                   508 │
│ at::TensorBase at::detail::_empty_generic<long long>(c10::ArrayRef<long long>, c10::Allocator*,                    │             136.205GB │                80.05% │                0.000B │                 0.00% │                  1531 │
│ c10::DispatchKeySet, c10::ScalarType, std::__1::optional<c10::MemoryFormat>) at <unknown>                          │                       │                       │                       │                       │                       │
│ c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl>>            │             136.205GB │                80.05% │                0.000B │                 0.00% │                  1014 │
│ c10::intrusive_ptr<c10::StorageImpl,                                                                               │                       │                       │                       │                       │                       │
│ c10::detail::intrusive_target_default_null_type<c10::StorageImpl>>::make<c10::StorageImpl::use_byte_size_t,        │                       │                       │                       │                       │                       │
│ unsigned long&, c10::Allocator*&, bool>(c10::StorageImpl::use_byte_size_t&&, unsigned long&, c10::Allocator*&,     │                       │                       │                       │                       │                       │
│ bool&&) at <unknown>                                                                                               │                       │                       │                       │                       │                       │
│ c10::StorageImpl::StorageImpl(c10::StorageImpl::use_byte_size_t, c10::SymInt const&, c10::Allocator*, bool) at     │             136.205GB │                80.05% │                0.000B │                 0.00% │                   506 │
│ <unknown>                                                                                                          │                       │                       │                       │                       │                       │
│ add_to_db at /Users/slava/Documents/Development/private/AI/h2ogpt/src/gpt_langchain.py                             │             134.109GB │                78.82% │                0.000B │                 0.00% │                 37446 │
│ add_documents at /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/langchain_core/vectorstores.py   │             134.104GB │                78.82% │                0.000B │                 0.00% │                 37402 │
│ add_texts at                                                                                                       │             134.102GB │                78.82% │                0.000B │                 0.00% │                 37399 │
│ /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/langchain_community/vectorstores/chroma.py        │                       │                       │                       │                       │                       │
│ embed_documents at                                                                                                 │             134.093GB │                78.81% │                0.000B │                 0.00% │                 37552 │
│ /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/langchain_community/embeddings/huggingface.py     │                       │                       │                       │                       │                       │
│ slot_tp_call at <unknown>                                                                                          │             134.010GB │                78.76% │                0.000B │                 0.00% │                  2414 │
│ encode at                                                                                                          │             134.007GB │                78.76% │                0.000B │                 0.00% │                   231 │
│ /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py      │                       │                       │                       │                       │                       │
│ _wrapped_call_impl at /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/torch/nn/modules/module.py  │             134.001GB │                78.76% │                0.000B │                 0.00% │                   125 │
│ forward at /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/torch/nn/modules/container.py          │             134.001GB │                78.76% │                0.000B │                 0.00% │                   125 │
│ _call_impl at /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/torch/nn/modules/module.py          │             134.001GB │                78.76% │                0.000B │                 0.00% │                   124 │
│ forward at                                                                                                         │             134.001GB │                78.76% │                0.000B │                 0.00% │                   112 │
│ /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/sentence_transformers/models/Transformer.py       │                       │                       │                       │                       │                       │
│ forward at                                                                                                         │             134.001GB │                78.76% │                0.000B │                 0.00% │                   111 │
│ /Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_rob… │                       │                       │                       │                       │                       │
│ torch::autograd::THPVariable_matmul(_object*, _object*, _object*) at <unknown>                                     │             130.000GB │                76.40% │                0.000B │                 0.00% │                    22 │
│ at::native::_matmul_impl(at::Tensor&, at::Tensor const&, at::Tensor const&) at <unknown>                           │             130.000GB │                76.40% │                0.000B │                 0.00% │                    20 │
│ at::native::matmul(at::Tensor const&, at::Tensor const&) at <unknown>                                              │             130.000GB │                76.40% │                0.000B │                 0.00% │                    20 │
│ at::_ops::matmul::call(at::Tensor const&, at::Tensor const&) at <unknown>                                          │             130.000GB │                76.40% │                0.000B │                 0.00% │                    20 │
│ c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPoint… │             128.000GB │                75.23% │                0.000B │                 0.00% │                    12 │
│ (at::Tensor const&, at::Tensor const&), &at::(anonymous namespace)::wrapper_CPU_bmm(at::Tensor const&, at::Tensor  │                       │                       │                       │                       │                       │
│ const&)>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&>>, at::Tensor (at::Tensor │                       │                       │                       │                       │                       │
│ const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) │                       │                       │                       │                       │                       │
│ at <unknown>                                                                                                       │                       │                       │                       │                       │                       │
│ c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPoint… │             128.000GB │                75.23% │                0.000B │                 0.00% │                    12 │
│ (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&), &torch::autograd::VariableType::(anonymous            │                       │                       │                       │                       │                       │
│ namespace)::bmm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)>, at::Tensor,                           │                       │                       │                       │                       │                       │
│ c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&>>, at::Tensor              │                       │                       │                       │                       │                       │
│ (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet,      │                       │                       │                       │                       │                       │
│ at::Tensor const&, at::Tensor const&) at <unknown>                                                                 │                       │                       │                       │                       │                       │
│ at::_ops::bmm::call(at::Tensor const&, at::Tensor const&) at <unknown>                                             │             128.000GB │                75.23% │                0.000B │                 0.00% │                    12 │
│ at::(anonymous namespace)::structured_bmm_out_cpu_functional::set_output_raw_strided(long long, c10::ArrayRef<long │             128.000GB │                75.23% │                0.000B │                 0.00% │                     4 │
│ long>, c10::ArrayRef<long long>, c10::TensorOptions, c10::ArrayRef<at::Dimname>) at <unknown>                      │                       │                       │                       │                       │                       │
│ void at::meta::common_checks_baddbmm_bmm<at::meta::structured_bmm>(at::meta::structured_bmm&, at::Tensor const&,   │             128.000GB │                75.23% │                0.000B │                 0.00% │                     4 │
│ at::Tensor const&, c10::Scalar const&, c10::Scalar const&, bool, std::__1::optional<at::Tensor> const&) at         │                       │                       │                       │                       │                       │
│ <unknown>                                                                                                          │                       │                       │                       │                       │                       │
│ at::meta::structured_bmm::meta(at::Tensor const&, at::Tensor const&) at <unknown>                                  │             128.000GB │                75.23% │                0.000B │                 0.00% │                     4 │
│ _find_and_load at <frozen importlib._bootstrap>                                                                    │              32.409GB │                19.05% │                0.000B │                 0.00% │                241978 │
│ _find_and_load_unlocked at <frozen importlib._bootstrap>                                                           │              32.409GB │                19.05% │                0.000B │                 0.00% │                241976 │
│ _load_unlocked at <frozen importlib._bootstrap>                                                                    │              32.409GB │                19.05% │                0.000B │                 0.00% │                241770 │
│ exec_module at <frozen importlib._bootstrap_external>                                                              │              32.409GB │                19.05% │                0.000B │                 0.00% │                241768 │
│ object_vacall at <unknown>                                                                                         │              32.406GB │                19.05% │                0.000B │                 0.00% │                239971 │
│ PyObject_CallMethodObjArgs at <unknown>                                                                            │              32.406GB │                19.05% │                0.000B │                 0.00% │                239971 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────┴───────────────────────┴───────────────────────┴───────────────────────┴───────────────────────┘
🥇 Top 5 largest allocating locations (by size):
    - forward:/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/transformers/models/xlm_roberta/modeling_xlm_roberta.py:237 -> 130.002GB
    - init_lib:/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/pypdfium2/_library_scope.py:25 -> 32.000GB
    - hash_file:/Users/slava/Documents/Development/private/AI/h2ogpt/src/utils.py:1124 -> 5.313GB
    - filter:/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/fnmatch.py:56 -> 4.468GB
    - load_file:/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/safetensors/torch.py:308 -> 4.231GB

Can you please try to create any large db with the default embedding model and check whether it works on your end? Thanks

pseudotensor commented 7 months ago

Nice tool. It looks as if the Chroma team changed something; maybe the batch size is larger now and they send an arbitrarily large batch to the embedding model.

Can you add some prints to this code?

https://github.com/h2oai/h2ogpt/blob/190310d4df3db457b481759f90a346591eaf6491/src/gpt_langchain.py#L318-L346

Specifically, the max_batch_size can be printed. Maybe it's crazy large.
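A minimal way to surface that value at the point referenced above (hypothetical snippet, only loosely mirroring the linked code; the function name and client argument are assumptions):

```python
# Hypothetical debug helper: print the batch limit the Chroma client reports
# before documents are handed to the embedder, to see if it is "crazy large".
def report_batch_limit(client) -> int:
    max_batch_size = getattr(client, "max_batch_size", None)
    print(f"max_batch_size: {max_batch_size}")
    return max_batch_size
```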

slavag commented 7 months ago

@pseudotensor Max batch size 83333, coming from max_batch_size attr.

pseudotensor commented 7 months ago

Try the latest changes. 83333 is very large; I made the max 4096. Or you can control it via the env var CHROMA_MAX_BATCH_SIZE.
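The described fix can be sketched as follows; the names are assumptions for illustration, not h2oGPT's exact code, but the behavior matches the comment: a default cap of 4096, overridable via CHROMA_MAX_BATCH_SIZE, applied before documents go to the vector store.

```python
import os

# Cap the batch limit Chroma reports at 4096 by default, overridable via the
# CHROMA_MAX_BATCH_SIZE environment variable (sketch of the described fix).
def chroma_batch_cap(reported_max: int) -> int:
    cap = int(os.getenv("CHROMA_MAX_BATCH_SIZE", "4096"))
    return max(1, min(reported_max, cap))

# Feed documents to the vector store in batches of at most the capped size.
def capped_batches(items, reported_max):
    size = chroma_batch_cap(reported_max)
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

With the reported 83333 and, say, a 10,000-document list, this yields batches of 4096, 4096, and 1808 instead of one giant batch hitting the embedding model at once.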

slavag commented 7 months ago

@pseudotensor Much better, thanks. BTW, with bge-m3 on a MacBook Pro 32GB M1 Max, batch size > 4 fails with not enough memory. On Linux with an NVIDIA A10G (24GB), 4096 also failed; it works with 1 or 2.

Also, maybe it's a good idea to add a device option to make_db, as on Mac it can use Metal.

pseudotensor commented 7 months ago

4096 is on the high end; yes, it can be made smaller as required. On CPU I'd expect it to work pretty well, but the issue is that bge-m3 has an 8k context, so it uses a lot more memory than its size suggests when chunks are large.

I think the issue is that for summarization purposes we double the chunks, and there's no limit on their size, so that might be hitting the bge-m3 model hard since it'll take the full 8k.

One will have to tell the model to truncate at (say) smaller token counts, or (yes) limit the batch size.
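The truncation idea can be sketched like this; the function name is illustrative and whitespace splitting stands in for the model's real tokenizer. Capping each chunk well below bge-m3's 8k-token context bounds per-batch memory at the cost of dropping the tail of oversized chunks.

```python
# Illustrative sketch, not h2oGPT's code: cap each chunk at a token budget
# before embedding, so doubled summarization chunks can't each expand to the
# model's full 8k context.
def truncate_tokens(text: str, max_tokens: int = 512) -> str:
    tokens = text.split()  # crude stand-in for the embedder's tokenizer
    return " ".join(tokens[:max_tokens])
```

With sentence-transformers, a similar cap is normally applied through the model's max_seq_length attribute rather than by pre-truncating text.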