onlyphantom / llm-python

Large Language Models (LLMs) tutorials & sample scripts, ft. langchain, openai, llamaindex, gpt, chromadb & pinecone
https://www.youtube.com/playlist?list=PLXsFtK46HZxUQERRbOmuGoqbMD-KWLkOS
MIT License
673 stars, 264 forks

Query execution hangs for 07_custom.py #4

Open aschroede opened 1 year ago

aschroede commented 1 year ago

Hi there, I was trying to get the 07_custom.py program to run with the facebook/opt-iml-1.3b model. I can see that it loads the cache correctly, and I added enough print statements to confirm that it also gets the LLMPredictor, creates the service context, and loads the index from disk. However, when it tries to call execute_query, the program seemingly hangs. I can see my RAM usage spike for an extended period, but no matter how long I wait (around 20 minutes), I don't get a response from the model.

Note that I am running with an AMD GPU, so when creating the pipeline I removed the CUDA device specification because, as far as I can tell, CUDA is not supported on AMD GPUs. Do I need a more powerful computer or CUDA to run this?

Here are my specifications:

OS: Windows 11
Processor: AMD Ryzen 7 5800H with Radeon Graphics, 3.20 GHz
Installed RAM: 16.0 GB (13.9 GB usable)
Device ID: XXXXXXXXXXXXXX
Product ID: 00342-20715-34612-AAOEM
System type: 64-bit operating system, x64-based processor
GPU 0: AMD Radeon RX 6600M
GPU 1: AMD Radeon(TM) Graphics
Pen and touch: Pen support

Thanks for your help!
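
For anyone reproducing this without an NVIDIA card: rather than deleting the device argument outright, one option is to choose the device at runtime and fall back to CPU explicitly, so the log shows where the model actually runs. The sketch below is an assumption about how that could look, not the code in 07_custom.py. The helper name `pick_pipeline_device` is hypothetical, and it only relies on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_pipeline_device() -> str:
    """Return a device string for a Hugging Face pipeline, falling back to CPU.

    Hypothetical helper for illustration; not part of the original script.
    """
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return "cpu"

    # True on NVIDIA CUDA builds (ROCm builds of PyTorch also report through this API).
    if torch.cuda.is_available():
        return "cuda:0"

    # Apple Silicon backend; guarded with getattr because older PyTorch lacks it.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"Running the model on: {pick_pipeline_device()}")
```

Recent versions of transformers accept such a string as the `device` argument of `pipeline(...)`; older ones expect an integer index, with -1 meaning CPU. If this prints `cpu`, a 1.3B-parameter model will still run, but generation can be very slow.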

asharda commented 1 year ago

I'm having a similar issue, but on a MacBook Pro M1. It took 2808 seconds to return the query results. Here is the output:

Loading local cache of model
About to load from disk!
About to Execute query!
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 8165 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens
[2804.93106914 seconds]: f([]) ->

System specifications: MacBook Pro, Apple M1 Pro, 16 GB, 240 GB of disk storage free. I also checked Activity Monitor while the application was running: System 5%, User 20%, Idle 75%.
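
To put the numbers from that log in perspective, here is a rough back-of-the-envelope calculation. It uses only the 8165 tokens and 2804.93 seconds reported above, plus the nominal 1.3B parameter count of the facebook/opt-iml-1.3b model mentioned in the first comment. Two assumptions to keep in mind: the token total most likely combines prompt and completion tokens, so the rate is an aggregate rather than pure generation speed, and the memory figure covers the weights only, not activations or the index.

```python
def tokens_per_second(total_tokens: int, seconds: float) -> float:
    """Aggregate throughput: all tokens counted for the query divided by wall time."""
    return total_tokens / seconds


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in gigabytes."""
    return n_params * bytes_per_param / 1e9


# Figures taken from the log output in this thread.
rate = tokens_per_second(8165, 2804.93106914)

# Nominal parameter count for a 1.3B model (an approximation, not a measured value).
fp32_gb = weight_memory_gb(1.3e9, 4)  # 4 bytes per parameter in float32
fp16_gb = weight_memory_gb(1.3e9, 2)  # 2 bytes per parameter in float16

if __name__ == "__main__":
    print(f"Aggregate throughput: {rate:.2f} tokens/s")
    print(f"Weights in float32: about {fp32_gb:.1f} GB")
    print(f"Weights in float16: about {fp16_gb:.1f} GB")
```

Under these assumptions the query processed roughly 3 tokens per second, and the float32 weights alone take about 5 GB, which fits in 16 GB of RAM but leaves less headroom than it might seem once the OS and other processes are counted.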

onlyphantom commented 1 year ago

Thank you both for opening the issue. I’m getting help from some of my team members who have similar hardware specs to try and isolate the issue a bit further to see if I can reproduce.

That 2804 seconds is inexplicable even if you take away CUDA optimizations 😭

loneovais2020 commented 1 year ago

TypeError: CallbackManager.configure() got an unexpected keyword argument 'inheritable_tags'
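
An "unexpected keyword argument" error like this one often points to a version mismatch between installed packages (for example, one library calling into an older or newer release of another than it expects), though that is an assumption here since no traceback was posted. A small sketch for collecting the installed versions to include in a report, using only the standard library; the package list is a guess at what is relevant to this repository:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of relevant packages; adjust to match your environment.
    for name, version in installed_versions(
        ["llama-index", "langchain", "transformers", "torch"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

Posting this output together with the full traceback makes it much easier to tell which pair of packages is out of sync.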