meta-llama / codellama

Inference code for CodeLlama models

Running CodeLlama-13B on a single GPU #77

Open manoj21192 opened 1 year ago

manoj21192 commented 1 year ago

In the README file, it is mentioned that to run the 13B model, the MP value should be 2. I have only 1 GPU; is there a way to run this model on a single GPU? (I am fine if efficiency is lost; what I care about right now is being able to run the 13B model.)

GaganHonor commented 1 year ago
accumulation_steps = 2  # Number of mini-batches over which to accumulate gradients

for batch_index, batch in enumerate(data_loader):
    # Forward pass and compute the loss (calling the module invokes forward())
    loss = model(batch)

    # Scale the loss so the accumulated gradient matches a larger batch
    loss = loss / accumulation_steps

    # Backward pass: gradients accumulate in the parameters' .grad buffers
    loss.backward()

    if (batch_index + 1) % accumulation_steps == 0:
        # Update the model's parameters once per accumulation window
        optimizer.step()
        optimizer.zero_grad()

By using gradient accumulation, you can effectively simulate a larger batch size while training on a single GPU.
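
For context, the loop above assumes a model, an optimizer, and a data loader are already defined. A minimal, self-contained sketch (with a toy model and random data standing in for the real ones, purely as an assumption for illustration) might look like this:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real model, optimizer, and data loader.
inputs = torch.randn(64, 16)
targets = torch.randn(64, 1)
data_loader = DataLoader(TensorDataset(inputs, targets), batch_size=8)

model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

accumulation_steps = 2
optimizer.zero_grad()
for batch_index, (x, y) in enumerate(data_loader):
    loss = criterion(model(x), y) / accumulation_steps  # scale for accumulation
    loss.backward()                                     # gradients accumulate in .grad
    if (batch_index + 1) % accumulation_steps == 0:
        optimizer.step()        # apply the accumulated update
        optimizer.zero_grad()   # clear for the next accumulation window

The effective batch size is the per-step batch size times accumulation_steps (here 8 × 2 = 16).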

manoj21192 commented 1 year ago

@GaganHonor : A few doubts, though I understand the above code:

  1. Since I have only 1 GPU, do I need to set accumulation_steps = 1 for the 13B model, whose MP value is 2?
  2. In which file do the above changes need to be made?

GaganHonor commented 1 year ago

@GaganHonor : A few doubts, though I understand the above code:

  1. Since I have only 1 GPU, do I need to set accumulation_steps = 1 for the 13B model, whose MP value is 2?
  2. In which file do the above changes need to be made?

Try it and test it (personal opinion), and it's in the config.

manoj21192 commented 1 year ago

@GaganHonor : A few doubts, though I understand the above code:

  1. Since I have only 1 GPU, do I need to set accumulation_steps = 1 for the 13B model, whose MP value is 2?
  2. In which file do the above changes need to be made?

Try it and test it (personal opinion), and it's in the config.

I am sorry, but I am unable to find any .py file which contains the above code. I have cloned the repository from GitHub; could you please let me know the name of the Python file where this code is present? I can't find any config.py file. I know the above code must be running somewhere in the backend, but I am unable to locate it to make the changes needed to run the 13B model on a single GPU.

GaganHonor commented 1 year ago

SOURCE : 13B MODEL GENERATED THIS ANSWER FOR YOU 💀 If you are unable to locate the accumulation_steps variable in the codebase, you can try the following steps to find it:

  1. Search for the variable: Use your text editor's search functionality to search for accumulation_steps within the codebase. This will help you locate where it is defined and used.
  2. Check related files: Look for files or modules that are related to model training or optimization. Common names for such files include train.py, model.py, or files that contain functions related to training or optimization.
  3. Look for the model training loop: The accumulation_steps variable is typically used within a loop that iterates over the dataset batches for training. Look for a loop that iterates over the data loader or dataset and performs the forward pass, loss computation, backward pass, and parameter update steps.
  4. Consult documentation or code comments: If you are working with a codebase that has documentation or code comments, check whether there are any references to or explanations of accumulation_steps.
  5. Seek assistance from the code author or community: If you are still unable to find the accumulation_steps variable, consider reaching out to the code author or the community associated with the codebase. They may be able to provide specific guidance or point you to the relevant code section. (See the search sketch after this list.)

Remember to adapt the modifications to the appropriate location once you find the accumulation_steps variable in the codebase.
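
If your editor's search is inconvenient, a quick way to run the same search from Python over your local clone might look like this (the "codellama" path below is an assumption; adjust it to wherever you cloned the repo):

import pathlib

# Search every .py file under the cloned repo for a given identifier.
repo = pathlib.Path("codellama")
needle = "accumulation_steps"

for path in sorted(repo.rglob("*.py")):
    for line_number, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if needle in line:
            print(f"{path}:{line_number}: {line.strip()}")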

geromepamintuan commented 1 year ago

https://download2.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoicG1jeTAzeW9qYXYzOHdodGtzYXRjaWhwIiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQyLmxsYW1hbWV0YS5uZXRcLyoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2OTQwNzY5MDB9fX1dfQ__&Signature=c0CaG3Ph%7EGra7rdugunQaTLGh9d2MjpcUsg%7E7gNLeMuz94p%7EYeC4wKHC0nWM-S5SLaXCNP85cGavjI1VDvpCrtdKHhDWifaVJuJYr1XrU1oP1aSlMw0auEfO2ZLxQ2IgIwaKgcrcgwWrUvylJyThEQCUQNVqk5fp466hHj%7EfM%7EG1AbXFrsgh5LNw3m81zkCeloWC7isnSGwqUpSofUrQVFdsPRab55dIsMxTiX9r3gtpRnb9hbN%7E7YHFwI2I4hAg51iFASEqbpQP8p9ckzEaYupO93Ico8CCXS%7EQpxqcF860LxYgAgYL%7EPur8E9Msez0P30bFF8RVttCLL9D7O7wCA__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=208515475329671

manoj21192 commented 1 year ago

https://download2.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoicG1jeTAzeW9qYXYzOHdodGtzYXRjaWhwIiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQyLmxsYW1hbWV0YS5uZXRcLyoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2OTQwNzY5MDB9fX1dfQ__&Signature=c0CaG3Ph%7EGra7rdugunQaTLGh9d2MjpcUsg%7E7gNLeMuz94p%7EYeC4wKHC0nWM-S5SLaXCNP85cGavjI1VDvpCrtdKHhDWifaVJuJYr1XrU1oP1aSlMw0auEfO2ZLxQ2IgIwaKgcrcgwWrUvylJyThEQCUQNVqk5fp466hHj%7EfM%7EG1AbXFrsgh5LNw3m81zkCeloWC7isnSGwqUpSofUrQVFdsPRab55dIsMxTiX9r3gtpRnb9hbN%7E7YHFwI2I4hAg51iFASEqbpQP8p9ckzEaYupO93Ico8CCXS%7EQpxqcF860LxYgAgYL%7EPur8E9Msez0P30bFF8RVttCLL9D7O7wCA__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=208515475329671

I have already downloaded all the models; I didn't understand how that's going to resolve my query.

DyeKuu commented 1 year ago

You may want to take a look at https://github.com/facebookresearch/codellama/issues/82 for quantization if the use case is inference only.

If you can run batch_size = 1, then as discussed above in https://github.com/facebookresearch/codellama/issues/77#issuecomment-1703664262, gradient accumulation could help you simulate large-batch training.

If you cannot even run batch_size = 1, then the only way I can think of is to do CPU offloading (a pretty naive form of pipeline parallelism) for part of the model, but I presume it requires a lot of heavy lifting.
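
For the inference-only route, here is a minimal sketch of 8-bit loading via the Hugging Face stack. This assumes the converted codellama/CodeLlama-13b-Instruct-hf checkpoint on the Hub plus the transformers, accelerate, and bitsandbytes packages; it is not part of this repository's own loader:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id of the converted 13B Instruct weights (not part of this repo).
model_id = "codellama/CodeLlama-13b-Instruct-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # 8-bit weights via bitsandbytes so the model fits in ~24 GB
    device_map="auto",   # let accelerate place layers (and offload to CPU if needed)
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))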

manoj21192 commented 1 year ago

You may want to take a look at #82 for quantization if the use case is inference only.

If you can run batch_size = 1, then as discussed above in #77 (comment), gradient accumulation could help you simulate large-batch training.

If you cannot even run batch_size = 1, then the only way I can think of is to do CPU offloading (a pretty naive form of pipeline parallelism) for part of the model, but I presume it requires a lot of heavy lifting.

I am unable to find the file where the code for gradient accumulation is written. Can you tell me the name of the file?

DragonAngel1st commented 1 year ago

I'm not too sure what you guys/girls are trying to explain with your solutions, but shouldn't we be able to run CodeLlama-13b-Instruct on an NVIDIA RTX 4090 with 24 GB of GPU RAM? The model should fit in its memory. Also, GH, you are talking about the config file for the PyTorch framework, not the CodeLlama codebase from this git repository. Why would we want to change it in the PyTorch library files? Is there not a way to configure this with a parameter we're missing in the Llama.build() function?

From my point of view, GH, you seem to know a lot of torch stuff, but have you tried the default example_instructions.py file by running it with the torchrun wrapper? That's the program we are trying to run and modify, not the defaults in the framework.
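
For reference, a rough sketch of the path that example_instructions.py exercises (normally launched via torchrun). The paths, the model_parallel_size argument, and the idea that this would work on one GPU are assumptions on my part; the 13B weights ship as two MP shards, so this may still fail unless the shards are merged or the model is quantized first:

from llama import Llama

# Assumed checkpoint layout; adjust paths to your local download.
generator = Llama.build(
    ckpt_dir="CodeLlama-13b-Instruct/",
    tokenizer_path="CodeLlama-13b-Instruct/tokenizer.model",
    max_seq_len=512,
    max_batch_size=1,
    model_parallel_size=1,  # assumption: force a single model-parallel rank
)

instructions = [
    [{"role": "user", "content": "Write a Python function that reverses a string."}]
]
results = generator.chat_completion(instructions, max_gen_len=128, temperature=0.2, top_p=0.95)
print(results[0]["generation"]["content"])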