Running llama-2-7b timeout in Google Colab

alucard001 commented 1 year ago

Here is the Gist: https://gist.github.com/alucard001/ed115328a82865961d020d46387cfd47

As you can see, after installing PyTorch and running the example command, it runs for about 3:30 and then the child process is stopped.

GPU version is attached in Gist for reference.

Is it a memory problem? Any other insight is appreciated.

Thank you very much in advance, and thanks to FBR for the great work.
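
On the memory question above, a rough back-of-the-envelope estimate can help decide whether memory is a plausible cause. The sketch below only counts the bytes needed to hold the weights, assuming a nominal 7 billion parameters; it ignores activations, the KV cache, and any temporary copies made while loading. The function name and the parameter count are illustrative assumptions, and the thread does not confirm that the process was actually killed for running out of memory.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: a nominal 7e9 parameters; the exact count differs slightly.
GIB = 1024 ** 3


def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Return the weight footprint in GiB for a given precision."""
    return n_params * bytes_per_param / GIB


N_PARAMS = 7e9

# Bytes per parameter for a few common precisions.
PRECISIONS = [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]

for name, nbytes in PRECISIONS:
    print(f"{name:>10}: {weight_memory_gib(N_PARAMS, nbytes):5.1f} GiB")
```

Comparing these figures against the system RAM and GPU memory that the Colab runtime reports gives a first indication of whether the weights fit at a given precision.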

attaullah commented 1 year ago

Same issue here.

phosseini commented 1 year ago

I could run it on Google Colab Pro+ with the high-memory option and an A100 GPU, but as you can see it's pretty slow:

> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 401.99 seconds
I believe the meaning of life is
> to be happy. I believe we are all born with the potential to be happy. The meaning of life is to be happy, but the way to get there is not always easy.
The meaning of life is to be happy. It is not always easy to be happy, but it is possible. I believe that

==================================

Simply put, the theory of relativity states that 
> 1) time, space, and mass are relative, and 2) the speed of light is constant, regardless of the relative motion of the observer.
Let’s look at the first point first.
Relative Time and Space
The theory of relativity is built on the idea that time and space are relative

==================================

A brief message congratulating the team on the launch:

        Hi everyone,

        I just 
> wanted to say a big congratulations to the team on the launch of the new website.

        I think it looks fantastic and I'm sure it'll be a huge success.

        Please let me know if you need anything else from me.

        Best,

==================================

Translate English to French:

        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush girafe => girafe peluche
        cheese =>
> fromage
        fish => poisson
        giraffe => girafe
        elephant => éléphant
        cat => chat
        giraffe => girafe
        elephant => éléphant
        cat => chat
        giraffe => gira

==================================

alucard001 commented 1 year ago

Thanks. Would you mind sharing your Colab notebook and file structure? I think I got some of the config wrong and would like to know how you set up your configuration. Thank you.

RonanKMcGovern commented 1 year ago

It should help to use a sharded and quantised version of the model, such as: https://huggingface.co/Trelis/Llama-2-7b-chat-hf-sharded-bf16

There's a notebook there too for inference which includes quantisation.
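
For readers who want a starting point without opening the notebook, here is a minimal sketch of loading the sharded checkpoint linked above with 4-bit quantisation through the Hugging Face `transformers` library. It is not the notebook's code. It assumes `torch`, `transformers`, `accelerate`, and `bitsandbytes` are installed and that a CUDA GPU is available; the helper names `load_quantised_model` and `complete` are hypothetical, and the quantisation settings are one reasonable choice rather than a recommendation from the thread.

```python
# Sketch only: load a sharded checkpoint in 4-bit to reduce memory use.
# Assumes torch, transformers, accelerate and bitsandbytes are installed
# and that a CUDA GPU is available. Helper names are hypothetical.
MODEL_ID = "Trelis/Llama-2-7b-chat-hf-sharded-bf16"  # repo linked above


def load_quantised_model(model_id: str = MODEL_ID):
    """Return (tokenizer, model) with the weights quantised to 4-bit on load."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        # float16 is chosen here because not every Colab GPU supports bfloat16.
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the layers on the GPU
    )
    return tokenizer, model


def complete(prompt: str, tokenizer, model, max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

In a notebook this would be used as `tokenizer, model = load_quantised_model()` followed by `print(complete("I believe the meaning of life is", tokenizer, model))`. Sharding keeps each checkpoint file small, so peak RAM during loading stays lower than with a single large file, while quantisation reduces the memory the weights occupy once loaded.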

meta-llama / llama

Running llama-2-7b timeout in Google Colab #496