rasbt / LLMs-from-scratch

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
https://www.manning.com/books/build-a-large-language-model-from-scratch

load_weights_into_gpt getting error in Ch. 5 #215

Closed athul-22 closed 3 months ago

athul-22 commented 3 months ago

https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/ch05.ipynb

import numpy as np

def load_weights_into_gpt(gpt, params):
    gpt.pos_emb.weight = assign(gpt.pos_emb.weight, params['wpe'])
    gpt.tok_emb.weight = assign(gpt.tok_emb.weight, params['wte'])

    for b in range(len(params["blocks"])):
        q_w, k_w, v_w = np.split(
            (params["blocks"][b]["attn"]["c_attn"])["w"], 3, axis=-1)
        gpt.trf_blocks[b].att.W_query.weight = assign(
            gpt.trf_blocks[b].att.W_query.weight, q_w.T)
        gpt.trf_blocks[b].att.W_key.weight = assign(
            gpt.trf_blocks[b].att.W_key.weight, k_w.T)
        gpt.trf_blocks[b].att.W_value.weight = assign(
            gpt.trf_blocks[b].att.W_value.weight, v_w.T)

        q_b, k_b, v_b = np.split(
            (params["blocks"][b]["attn"]["c_attn"])["b"], 3, axis=-1)
        gpt.trf_blocks[b].att.W_query.bias = assign(
            gpt.trf_blocks[b].att.W_query.bias, q_b)
        gpt.trf_blocks[b].att.W_key.bias = assign(
            gpt.trf_blocks[b].att.W_key.bias, k_b)
        gpt.trf_blocks[b].att.W_value.bias = assign(
            gpt.trf_blocks[b].att.W_value.bias, v_b)

        gpt.trf_blocks[b].att.out_proj.weight = assign(
            gpt.trf_blocks[b].att.out_proj.weight, 
            params["blocks"][b]["attn"]["c_proj"]["w"].T)
        gpt.trf_blocks[b].att.out_proj.bias = assign(
            gpt.trf_blocks[b].att.out_proj.bias, 
            params["blocks"][b]["attn"]["c_proj"]["b"])

        gpt.trf_blocks[b].ff.layers[0].weight = assign(
            gpt.trf_blocks[b].ff.layers[0].weight, 
            params["blocks"][b]["mlp"]["c_fc"]["w"].T)
        gpt.trf_blocks[b].ff.layers[0].bias = assign(
            gpt.trf_blocks[b].ff.layers[0].bias, 
            params["blocks"][b]["mlp"]["c_fc"]["b"])
        gpt.trf_blocks[b].ff.layers[2].weight = assign(
            gpt.trf_blocks[b].ff.layers[2].weight, 
            params["blocks"][b]["mlp"]["c_proj"]["w"].T)
        gpt.trf_blocks[b].ff.layers[2].bias = assign(
            gpt.trf_blocks[b].ff.layers[2].bias, 
            params["blocks"][b]["mlp"]["c_proj"]["b"])

        gpt.trf_blocks[b].norm1.scale = assign(
            gpt.trf_blocks[b].norm1.scale, 
            params["blocks"][b]["ln_1"]["g"])
        gpt.trf_blocks[b].norm1.shift = assign(
            gpt.trf_blocks[b].norm1.shift, 
            params["blocks"][b]["ln_1"]["b"])
        gpt.trf_blocks[b].norm2.scale = assign(
            gpt.trf_blocks[b].norm2.scale, 
            params["blocks"][b]["ln_2"]["g"])
        gpt.trf_blocks[b].norm2.shift = assign(
            gpt.trf_blocks[b].norm2.shift, 
            params["blocks"][b]["ln_2"]["b"])

    gpt.final_norm.scale = assign(gpt.final_norm.scale, params["g"])
    gpt.final_norm.shift = assign(gpt.final_norm.shift, params["b"])
    gpt.out_head.weight = assign(gpt.out_head.weight, params["wte"])

load_weights_into_gpt(gpt, params)
gpt.to(device);
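For context, the `np.split` call in the loader divides GPT-2's fused QKV weight matrix (`c_attn`) into three equal chunks along the last axis. A minimal sketch with dummy shapes (the 768 here is just the GPT-2 124M embedding size, used for illustration):

```python
import numpy as np

# GPT-2 stores Q, K, V fused in one c_attn matrix of shape (emb_dim, 3 * emb_dim)
emb_dim = 768  # GPT-2 124M embedding size
c_attn_w = np.zeros((emb_dim, 3 * emb_dim))

# Split into three (emb_dim, emb_dim) chunks along the last axis
q_w, k_w, v_w = np.split(c_attn_w, 3, axis=-1)
print(q_w.shape, k_w.shape, v_w.shape)  # (768, 768) each
```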

Error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[232], line 64
     60     gpt.final_norm.shift = assign(gpt.final_norm.shift, params["b"])
     61     gpt.out_head.weight = assign(gpt.out_head.weight, params["wte"])
---> 64 load_weights_into_gpt(gpt, params)
     65 gpt.to(device)

Cell In[232], line 4
      3 def load_weights_into_gpt(gpt, params):
----> 4     gpt.pos_emb.weight = assign(gpt.pos_emb.weight, params['wpe'])
      5     gpt.tok_emb.weight = assign(gpt.tok_emb.weight, params['wte'])
      7     for b in range(len(params["blocks"])):

TypeError: 'ellipsis' object is not subscriptable
rasbt commented 3 months ago

Hi there,

I think this could be related to the NumPy 2.0 release last night, which is causing problems for many people since it introduces breaking changes that affect various libraries. Until the NumPy team has a fix, could you try

pip install "numpy<2.0"

to see if it fixes the problem? In addition, I will look into making updates specifically for NumPy 2.0 in the next few days.
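A quick way to confirm which NumPy version the notebook kernel actually picked up (the kernel's environment can differ from the shell where `pip` was run):

```python
import numpy as np

# The kernel's NumPy, which may differ from what `pip` reports in a shell
major_version = int(np.__version__.split(".")[0])
print(np.__version__, "major:", major_version)
```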

athul-22 commented 3 months ago

Thank you sir, I tried that.

I am getting this output:

Requirement already satisfied: numpy<2.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (1.26.4)

I think the problem is something else. I tried ChatGPT, Claude AI, and Gemini, and it is still not working.

d-kleine commented 3 months ago

Can you do a !pip list in a code cell to see which package versions you are using?

As an idea, you could try this code for the assign() function:

def assign(left, right):
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
    return torch.nn.Parameter(torch.tensor(right).clone().detach())
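To illustrate the shape guard on its own: the check only compares `.shape`, so the same pattern can be sketched with plain NumPy arrays (a hypothetical standalone sketch, not the book's code, which returns a `torch.nn.Parameter`):

```python
import numpy as np

def check_shape(left, right):
    # Same guard as assign(), minus the torch dependency:
    # refuse to copy weights whose shapes disagree.
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
    return np.array(right)

ok = check_shape(np.zeros((2, 3)), np.ones((2, 3)))  # shapes match, returns a copy
try:
    check_shape(np.zeros((2, 3)), np.ones((3, 2)))   # shapes differ, raises
except ValueError as e:
    print(e)
```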
rasbt commented 3 months ago

Thanks for reporting. Hm, that's odd. Did you run the chapter 5 notebook here exactly as-is, or did you make any modifications? I am asking because it is working for me on both macOS and Linux.

Could you run this notebook to make sure all packages are up to date: https://github.com/rasbt/LLMs-from-scratch/blob/main/setup/02_installing-python-libraries/python_environment_check.ipynb

rasbt commented 3 months ago

If you could try changing

def assign(left, right):
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
    return torch.nn.Parameter(torch.tensor(right))

to

def assign(left, right):
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
    return torch.nn.Parameter(torch.tensor(right).clone().detach()) # Changed

as suggested by @d-kleine above, let us know if it fixes it. If yes, I am happy to add this modification to the notebook and book.

athul-22 commented 3 months ago

The following packages need to be installed: numpy==1.26.0 matplotlib==3.8.2 jupyterlab==4.0.6 tensorflow==2.15.0 torch==2.2.1 tqdm==4.66.1 tiktoken==0.5.1

and I am getting this error:


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[176], line 64
     60     gpt.final_norm.shift = assign(gpt.final_norm.shift, params["b"])
     61     gpt.out_head.weight = assign(gpt.out_head.weight, params["wte"])
---> 64 load_weights_into_gpt(gpt, params)
     65 gpt.to(device)

Cell In[176], line 11
      7 for b in range(len(params["blocks"])):
      8     q_w, k_w, v_w = np.split(
      9         (params["blocks"][b]["attn"]["c_attn"])["w"], 3, axis=-1)
     10     gpt.trf_blocks[b].att.W_query.weight = assign(
---> 11         gpt.trf_blocks[b].att.W_query.weight, q_w.T)
     12     gpt.trf_blocks[b].att.W_key.weight = assign(
     13         gpt.trf_blocks[b].att.W_key.weight, k_w.T)
     14     gpt.trf_blocks[b].att.W_value.weight = assign(
     15         gpt.trf_blocks[b].att.W_value.weight, v_w.T)

File ~/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1688, in Module.__getattr__(self, name)
   1686 if name in modules:
   1687     return modules[name]
-> 1688 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

AttributeError: 'MultiHeadAttention' object has no attribute 'W_query'

@rasbt @d-kleine Thanks a lot, I am stuck here.
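One way to debug this kind of AttributeError is to list the attribute names the module actually has; with a real PyTorch module you could inspect `dict(gpt.trf_blocks[0].att.named_parameters()).keys()`. The idea can be sketched without torch using a stand-in class (hypothetical, for illustration only):

```python
# Stand-in for the real nn.Module, which raises AttributeError for
# unknown names; listing the actual attributes reveals typos or
# renamed weights (e.g. W_query vs. some other name).
class MultiHeadAttention:
    def __init__(self):
        self.W_query = None
        self.W_key = None
        self.W_value = None
        self.out_proj = None

att = MultiHeadAttention()
print(sorted(vars(att)))  # ['W_key', 'W_query', 'W_value', 'out_proj']
```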

rasbt commented 3 months ago

The following packages need to be installed:

I recommend installing all the required packages by executing

pip install -r requirements.txt

in the main folder of this repository.

The second error looks like you may not have run all the code in the notebook from top to bottom, but I am not 100% sure. Can you run it as follows and see if the issue disappears:

[Screenshot: 2024-06-17 at 8:22:49 PM]
athul-22 commented 3 months ago

Yes, I tried installing it again, but I am getting "Requirement already satisfied".

I tried uploading the whole file to ChatGPT and asking, but I am still getting the error.

I'll download and try your code.

Thanks a lot.

d-kleine commented 3 months ago

@rasbt I think it would be good to have issue templates, e.g. for bug reports, feature requests, etc. That would help to somewhat "standardize" issues by asking users questions like "What's your OS?", "Which Python version are you using?", "Have you checked that your package versions match the ones in the requirements.txt in the root dir?", "How can the described bug be reproduced?", "What have you already tried to resolve the issue?", etc.

rasbt commented 3 months ago

Yes, I agree. I added two templates. I don't want to overcomplicate them, though, to keep it quick and easy to report an issue.

d-kleine commented 3 months ago

Looks good, thanks! 👍🏻 What do you think about adding a computing environment question, like

Where do you run your code?

  • Locally (on my computer)
  • Cloud (Google Colab, AWS, Azure, GCP, Lightning Studio 😉 )
rasbt commented 3 months ago

Since some people had issues with Colab, yeah, that's a good idea.

athul-22 commented 3 months ago

Looks good, thanks! 👍🏻 What do you think about adding a computing environment question, like

Where do you run your code?

  • Locally (on my computer)
  • Cloud (Google Colab, AWS, Azure, GCP, Lightning Studio 😉 )

Locally (on my computer) Macos

rasbt commented 3 months ago

Does the issue still persist when you run the notebook from top to bottom?

athul-22 commented 3 months ago

https://github.com/athul-22/LLM-from-scratch/blob/MAIN/main.ipynb

This is my whole repo; everything is working except the last code cell. Please help, I have tried multiple approaches and I am stuck here.

athul-22 commented 3 months ago

Does the issue still persist when you ran the notebook from top to bottom?

Yes sir, only that cell.

d-kleine commented 3 months ago

https://github.com/athul-22/LLM-from-scratch/blob/MAIN/main.ipynb

This is my whole repo; everything is working except the last code cell. Please help, I have tried multiple approaches and I am stuck here.

I can't see the code; please change the repo to be public (it seems like you have cloned the repo; it would have been better to fork it in this case). And please pull the latest commits.

athul-22 commented 3 months ago

Sorry, please check now. Thanks. Just that last part is getting an error, showing that tensor sizes 256 and 1024 do not match.

rasbt commented 3 months ago

Thanks for sharing. It looks like there's a lot of duplicated code in this file, and it is a bit hard to see what's going on. I think the code section you provided should be fine if you executed the cells in order, but it looks like there was maybe a jump here:

[Screenshot: 2024-06-21 at 7:35:33 AM]

The other issue I am seeing is that you replaced the MultiHeadAttention module:

[Screenshot: 2024-06-21 at 7:36:52 AM]

This is generally fine, and you'd be able to train the model this way. But if you do that, you will have to update the weight-loading code, because the names of the weight variables will be different. E.g., even if you fix the other issue in your code, you will get another error here:

[Screenshot: 2024-06-21 at 7:38:08 AM]

Of course you could do that, but it will be a lot of extra work, so I recommend using the MHA class I am using in the book.
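Regarding the earlier 256-vs-1024 size mismatch: that usually means the model was instantiated with the shortened training context length rather than GPT-2's pretrained 1,024 positions. A hedged sketch of the kind of config update chapter 5 applies before loading the pretrained weights (dict and key names assumed from the book; adjust to your notebook):

```python
# Assumed config dict from the book's chapter 5; only the keys shown
# here are relevant to the size mismatch.
GPT_CONFIG_124M = {
    "vocab_size": 50257,
    "context_length": 256,   # shortened for training in the chapter
    "qkv_bias": False,
}

NEW_CONFIG = GPT_CONFIG_124M.copy()
NEW_CONFIG.update({
    "context_length": 1024,  # GPT-2's pretrained position-embedding size
    "qkv_bias": True,        # OpenAI's GPT-2 weights include QKV biases
})
print(NEW_CONFIG["context_length"])  # 1024
```

Instantiating the model with `NEW_CONFIG` (rather than the training config) should make the position-embedding shapes line up with the checkpoint.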