jncraton / languagemodels

Explore large language models in 512MB of RAM
https://jncraton.github.io/languagemodels/
MIT License

lm.extract_answer method is failing #26

Closed TheChoicists closed 1 year ago

TheChoicists commented 1 year ago

Hello Team,

When I execute lm.store_doc(), the code runs to completion and the get_model() method has no issues. When I execute lm.extract_answer(), I noticed that get_model(), unlike in the case above, executes the lines of Python below:

```python
elif not tokenizer_only:
    # Make sure the model is reloaded if we've unloaded it
    try:
        modelcache[model_name][1].load_model()
    except AttributeError:
        # Encoder-only models can't be unloaded in ctranslate2
        pass
```

This then results in the following error being returned:

```
Exception has occurred: AttributeError
'NoneType' object has no attribute 'generate_batch'
  File "C:\Users\ejmar\Documents\Eternev\ai_agent\languagemodels\languagemodels\inference.py", line 154, in generate_instruct
    results = model.generate_batch(
  File "C:\Users\ejmar\Documents\Eternev\ai_agent\languagemodels\languagemodels\__init__.py", line 189, in extract_answer
    return generate_instruct(f"{context}\n\n{question}")
  File "C:\Users\ejmar\Documents\Eternev\ai_agent\languagemodels\test.py", line 9, in <module>
    answer = lm.extract_answer(question=prompt, context=context)
AttributeError: 'NoneType' object has no attribute 'generate_batch'
```
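Roughly what my test.py is doing (the real context and prompt strings are placeholders here):

```python
import languagemodels as lm

context = "..."  # placeholder for my actual document text
prompt = "..."   # placeholder for my actual question

lm.store_doc(context)  # completes with no issues
answer = lm.extract_answer(question=prompt, context=context)  # raises the AttributeError above
```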

I cloned your repo and ran `pip install -r requirements.txt`.

Would you happen to know what the issue is? I have been trying to debug for the past couple of hours, and all I can see is that the get_model method is returning None for the model, which I assume is the issue.

I am working within the test.py file and I have included the project folder below. I would be super appreciative if you could help me debug since I am a huge fan of this project!

languagemodels.zip

TheChoicists commented 1 year ago

I also want to add that when I run the assistant.py file within the examples directory, it works as expected, but only after I run pip install languagemodels. I believe this means it is using the files stored within the site-packages directory, and those are the same files as the ones in the languagemodels directory inside the zip I attached, which confuses me.
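For what it's worth, this is a standard way to check which copy of the package Python is actually importing (plain Python, nothing specific to this library):

```python
import languagemodels

# Prints the path of the imported copy, e.g. a site-packages install
# versus a local clone earlier on sys.path
print(languagemodels.__file__)
```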

TheChoicists commented 1 year ago

Also, when I copied the files from the site-packages directory over into my languagemodels folder, I can now execute lm.do or lm.extract_answer without issue, which confuses me, but then again it is also late.

jncraton commented 1 year ago

Thanks for letting me know about this. It looks like the main branch may not have been in a good state. I've merged several fixes from the dev branch just now, so hopefully these problems will be resolved for you. Let me know if you are still experiencing these issues after pulling down the updated code.

TheChoicists commented 1 year ago

Thank you for the quick turnaround! I tried cloning the repo today, and when I tried to execute assistant.py I got the following error: [screenshot: CUDA/cuBLAS initialization error]

Is this expected?

jncraton commented 1 year ago

That is not expected. It looks like the backend is detecting that CUDA is available via cuBLAS, but it isn't being initialized properly. I don't have a Windows environment handy at the moment, so I don't have an easy way to troubleshoot this with you, unfortunately.

I suspect that removing device="auto" from initialize_model in models.py will resolve this for you (by disabling CUDA).
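Something like this is what I have in mind (a sketch; the exact call inside initialize_model may differ, and model_path here is a stand-in for whatever path the library resolves):

```python
import ctranslate2

model_path = "path/to/converted/model"  # stand-in path

# Before (assumed): device="auto" lets ctranslate2 pick CUDA when it detects it
# model = ctranslate2.Translator(model_path, device="auto")

# After: passing device="cpu" (ctranslate2's default) forces CPU inference
model = ctranslate2.Translator(model_path, device="cpu")
```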

jncraton commented 1 year ago

I've changed the default behavior to use the CPU unless CUDA detection is requested by setting lm.config["device"] = "auto". This should hopefully resolve issues like this.
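With that change, the CPU is used unless you opt in explicitly, e.g.:

```python
import languagemodels as lm

# Defaults to CPU inference now
print(lm.do("What color is the sky?"))

# Opt back in to CUDA detection
lm.config["device"] = "auto"
print(lm.do("What color is the sky?"))
```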

TheChoicists commented 1 year ago

Still not working, but I think it is specific to my environment, so it is my burden. Thank you so much! Are there any plans to implement larger models within this project? :)

TheChoicists commented 1 year ago

I noticed that the library is saying that cublas64_11.dll is not found or cannot be loaded, but when I checked my path variables I found that I have cublas64_12.dll installed instead.
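For anyone hitting the same thing, here is a quick way to confirm which cuBLAS DLL is actually loadable from the current PATH (plain ctypes, not part of this library):

```python
import ctypes

# Try to load each cuBLAS DLL by name from the Windows DLL search path
for name in ("cublas64_11.dll", "cublas64_12.dll"):
    try:
        ctypes.CDLL(name)
        print(f"{name}: loadable")
    except OSError:
        print(f"{name}: not found")
```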

You had mentioned earlier that I could resolve this by removing device="auto" from the initialize_model method, and that solved my issue! I really appreciate all the help so far! I am loving this project so much and learning more and more each day!

I just wanted to quickly resolve this so I can begin to play around with lm.code!

Still interested in knowing if there are plans for supporting larger models or if there was a templated way to integrate larger models into this project?

jncraton commented 1 year ago

> You had mentioned earlier that I could resolve this by removing auto from the initialize_model method, but what exactly would I need to change?

I recently changed this behavior in the most recent release. If you pull down the most recent version it should now default to using your CPU for inference.

> Are there any plans to implement larger models within this project? :)

I'm not opposed to supporting larger models, but they're not a priority for me personally. I'm using this in a classroom where not everyone will have access to a powerful NVIDIA GPU, and larger models like Llama-2 are still painfully slow on CPU.

TheChoicists commented 1 year ago

Got it! I tried pulling in the latest changes and we are all systems go! As for larger-model support, that makes sense. I am currently unblocked! Thank you for all of your help!