Closed athul-22 closed 3 months ago
Hi there,
I think this could be related to the NumPy 2.0 release last night, which is causing problems for many people since it makes breaking changes that affect various libraries. Until the NumPy team has a fix, could you try
pip install "numpy<2.0"
to see if it fixes the problem? In addition, I will also look into making updates specifically for NumPy 2.0 in the next few days.
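If it's unclear whether the pin took effect, a quick sanity check from inside the notebook is to inspect the installed version (this snippet is just an illustration, not from the book):

```python
import numpy as np

# Check which NumPy major version the notebook actually imports
major = int(np.__version__.split(".")[0])
print(f"NumPy {np.__version__}")
if major >= 2:
    print("NumPy 2.x detected; try: pip install 'numpy<2.0'")
```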
Thank you sir, I tried it and I'm getting this output:
Requirement already satisfied: numpy<2.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (1.26.4)
I think the problem is something else. I tried ChatGPT, Claude AI, and Gemini, and it's still not working.
Can you do a !pip list
in a code cell to see which package versions you are using?
As an idea, you could try this code for the assign()
function:
def assign(left, right):
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
    return torch.nn.Parameter(torch.tensor(right).clone().detach())
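For reference, the suggested assign() can be exercised on a toy layer; the Linear layer and zero-filled array below are made up for illustration and are not part of the book's code:

```python
import numpy as np
import torch

def assign(left, right):
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
    # clone().detach() yields a fresh tensor with no autograd history
    return torch.nn.Parameter(torch.tensor(right).clone().detach())

# Toy usage: copy a NumPy weight array into an existing Parameter of the same shape
layer = torch.nn.Linear(3, 2, bias=False)
new_weights = np.zeros((2, 3), dtype=np.float32)
layer.weight = assign(layer.weight, new_weights)
print(tuple(layer.weight.shape))  # (2, 3)
```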
Thanks for reporting. And hm, that's odd. Have you tried running the chapter 5 notebook here exactly as-is, or did you make any modifications? I am asking because it is working for me on both macOS and Linux.
Could you run this notebook to make sure all packages are up to date: https://github.com/rasbt/LLMs-from-scratch/blob/main/setup/02_installing-python-libraries/python_environment_check.ipynb
If you could try changing
def assign(left, right):
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
    return torch.nn.Parameter(torch.tensor(right))
to
def assign(left, right):
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
    return torch.nn.Parameter(torch.tensor(right).clone().detach())  # Changed
as suggested by @d-kleine above, let us know if it fixes it. If yes, I am happy to add this modification to the notebook and book.
The following packages need to be installed: numpy==1.26.0 matplotlib==3.8.2 jupyterlab==4.0.6 tensorflow==2.15.0 torch==2.2.1 tqdm==4.66.1 tiktoken==0.5.1
and getting this error
AttributeError                            Traceback (most recent call last)
Cell In[176], line 64
     60 gpt.final_norm.shift = assign(gpt.final_norm.shift, params["b"])
     61 gpt.out_head.weight = assign(gpt.out_head.weight, params["wte"])
---> 64 load_weights_into_gpt(gpt, params)
     65 gpt.to(device)

Cell In[176], line 11
      7 for b in range(len(params["blocks"])):
      8     q_w, k_w, v_w = np.split(
      9         (params["blocks"][b]["attn"]["c_attn"])["w"], 3, axis=-1)
     10     gpt.trf_blocks[b].att.W_query.weight = assign(
---> 11         gpt.trf_blocks[b].att.W_query.weight, q_w.T)
     12     gpt.trf_blocks[b].att.W_key.weight = assign(
     13         gpt.trf_blocks[b].att.W_key.weight, k_w.T)
     14     gpt.trf_blocks[b].att.W_value.weight = assign(
     15         gpt.trf_blocks[b].att.W_value.weight, v_w.T)

File ~/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1688, in Module.__getattr__(self, name)
   1686 if name in modules:
   1687     return modules[name]
-> 1688 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

AttributeError: 'MultiHeadAttention' object has no attribute 'W_query'
@rasbt @d-kleine Thanks a lot, I got stuck here.
The following packages need to be installed:
I recommend installing all the required packages by executing
pip install -r requirements.txt
in the main folder of this repository.
The second error looks like maybe you didn't run all the code from top to bottom in the notebook, but I am not 100% sure. Can you run it as follows and see if the issue disappears:
Yes, I tried installing it again, but I am getting Requirement already satisfied:
I tried uploading the whole file to ChatGPT and asking, but I'm still getting the error.
I'll download and try your code.
Thanks a lot
@rasbt I think it would be good to have issue templates, e.g. for bug reports, feature requests, etc. That would help to somewhat "standardize" issues by asking users questions like "what's your OS?", "which Python version are you using?", "have you checked that your package versions match the ones provided in the requirements.txt in the root dir?", "how can the described bug be reproduced?", "what have you already tried to resolve the issue?", etc.
Yes, I agree. I added two templates. I don't want to overcomplicate it to keep it quick and easy to report an issue.
Looks good, thanks! 👍🏻 What do you think about adding a computing environment question, like
Where do you run your code?
- Locally (on my computer)
- Cloud (Google Colab, AWS, Azure, GCP, Lightning Studio 😉 )
Since some people had issues with Colab, yeah, that's a good idea.
Locally (on my computer), on macOS.
Does the issue still persist when you run the notebook from top to bottom?
https://github.com/athul-22/LLM-from-scratch/blob/MAIN/main.ipynb
This is my whole repo. Everything is working except the last code cell. Please help, I've tried multiple approaches and I'm stuck here.
Does the issue still persist when you run the notebook from top to bottom?
Yes sir, only that cell.
I can't see the code; please change the repo to be public (it seems like you cloned the repo; it would have been better if you had forked it in this case). And please pull the latest commits.
Sorry, please check now. Thanks, it's just that last part that gets an error, showing that tensor sizes 256 and 1024 do not match.
Thanks for sharing. It looks like there's lots of duplicated code in this file and it is a bit hard to see what's going on. I think the code section you provide should be fine if you executed the cells in order, but it looks like there was maybe a jump here:
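For context on the 256 vs. 1024 mismatch: the chapter 5 notebook enlarges the model's context length from the 256 tokens used during training to the 1,024 tokens the pretrained GPT-2 weights expect, so skipping that cell produces exactly this kind of size error. A rough sketch of that update (the dictionary names follow the book's notebook, but treat this as an illustration rather than the exact cell):

```python
# Config used while training the small model in earlier sections
GPT_CONFIG_124M = {
    "vocab_size": 50257,
    "context_length": 256,   # shorter context used during training
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False,
}

# Before loading the pretrained GPT-2 weights, the notebook widens the
# context length (and enables qkv_bias) to match the checkpoint
NEW_CONFIG = GPT_CONFIG_124M.copy()
NEW_CONFIG.update({"context_length": 1024, "qkv_bias": True})
print(NEW_CONFIG["context_length"])  # 1024
```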
The other issue I am seeing is that you replaced the Multihead attention module:
This is generally fine and you'd be able to train the model this way. But if you do that, you will have to update the weight loading code, because the names of the weight variables will be different. E.g., even if you fix the other issue in your code you will get another issue here:
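One quick way to see the naming problem is to print the parameter names a module actually exposes; the weight-loading code can only reference names that exist. The snippet below uses PyTorch's built-in MultiheadAttention purely as a stand-in for a swapped-in implementation, to show that its names differ from the book's W_query/W_key/W_value attributes:

```python
import torch

# Stand-in attention module: its parameter names differ from the book's
# MultiHeadAttention class, which is why the loading code raises AttributeError
mha = torch.nn.MultiheadAttention(embed_dim=8, num_heads=2)
names = [name for name, _ in mha.named_parameters()]
print(names)  # e.g. ['in_proj_weight', 'in_proj_bias', 'out_proj.weight', 'out_proj.bias']
```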
Of course you could do it, but it would be a lot of extra work, so I recommend using the MHA class I am using in the book.
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/ch05.ipynb