stanford-crfm / BioMedLM


Can it be fine-tuned on smaller GPUs? #8

Open anyili opened 1 year ago

anyili commented 1 year ago

Hi, could the model be fine-tuned on just a few smaller GPUs, like 4 A40s with 48 GB of memory each? I am trying to use DeepSpeed, but I still get OOM errors. Thanks.
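For anyone hitting the same OOM: a DeepSpeed ZeRO stage-3 config with CPU offload, plus gradient checkpointing and a micro-batch of 1, is the usual way to fit a 2.7B-parameter model onto 48 GB cards. The sketch below is untested and assumes the Hugging Face Trainer path; the hyperparameters are placeholders, not the repo's actual settings.

```python
from transformers import TrainingArguments

# Hypothetical ZeRO-3 config with CPU offload; values set to "auto" are
# filled in by the HF/DeepSpeed integration from TrainingArguments.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # keep the micro-batch tiny
    gradient_accumulation_steps=16,  # recover an effective batch size
    gradient_checkpointing=True,     # trade compute for activation memory
    fp16=True,
    deepspeed=ds_config,             # the Trainer accepts a dict or a JSON path
)
```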

J38 commented 1 year ago

I will try to fine-tune with some smaller-scale resources and let you know what I see.

I think running with Flash Attention will help a lot with GPU memory issues ...

anyili commented 1 year ago

Thanks. I did manage to fine-tune successfully using DeepSpeed with far fewer resources.

anyili commented 1 year ago

BTW, I turned on Flash Attention with --use_flash True and got a runtime exception: RuntimeError: Expected is_sm80 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

Any ideas?
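For later readers: is_sm80 in that error refers to NVIDIA compute capability 8.0, i.e. A100-class hardware, while an A40 reports sm_86. As far as I can tell, the FlashAttention kernels of that era supported the backward pass for head dimensions above 64 only on sm80 GPUs, which would explain the failure on A40s. A quick way to check what your cards report:

```python
# Illustrative check of GPU compute capability; no BioMedLM code involved.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    # A100 reports (8, 0) -> sm_80; A40 reports (8, 6) -> sm_86.
    print(f"GPU {i}: {name}, sm_{major}{minor}")
```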

anyili commented 1 year ago

It seems flash_attn only supports head dimensions that are a multiple of 8, and pubmedgpt has 20 heads.
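For what it's worth, the multiple-of-8 constraint in flash_attn applies to the per-head dimension rather than the head count, and the per-head dimension here is 2560 / 20 = 128. A small sketch to verify this from the config, assuming the published stanford-crfm/BioMedLM checkpoint:

```python
# Sketch: read the per-head dimension off the published GPT-2-style config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("stanford-crfm/BioMedLM")
head_dim = cfg.n_embd // cfg.n_head  # 2560 // 20 = 128, a multiple of 8
print(f"heads={cfg.n_head}, head_dim={head_dim}")
```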

J38 commented 1 year ago

We trained the model with Flash Attention so it should definitely work ... I will get a working example going and get back to you with what I did ...

J38 commented 1 year ago

I've fine-tuned it with Flash Attention before ...

anyili commented 1 year ago

Btw, do you mind sharing your fine-tuned model? Thanks.

J38 commented 1 year ago

As compute resources become available we should fine-tune some models and release new fine-tuned versions! Just to provide an update: the NIH has asked us to rename the model, since they hold the trademark on "PubMed" and OpenAI is trademarking "GPT" ... so from now on the model is BioMedLM!

guathwa commented 1 year ago

Hi anyili, I am trying to fine-tune on a seqcls task with DeepSpeed, but I encountered the error "RuntimeError: a leaf Variable that requires grad is being used in an in-place operation." I have just raised this as an issue for help.

I saw that you managed to fine-tune with DeepSpeed successfully. Are you able to share how you did it? Thanks!
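Not a diagnosis of that particular stack trace, but as a general note, this RuntimeError means a leaf tensor with requires_grad=True (typically an nn.Parameter) was modified in place while autograd was tracking it; wrapping the offending mutation in torch.no_grad() is the usual fix. A minimal reproduction:

```python
import torch

w = torch.nn.Parameter(torch.zeros(4))  # leaf tensor with requires_grad=True

# w += 1.0  # RuntimeError: a leaf Variable that requires grad is being used
#           # in an in-place operation.

with torch.no_grad():
    w += 1.0  # fine: autograd does not track mutations inside no_grad()
```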

zhengbiqing commented 10 months ago

@anyili can you tell me how to fine-tune seqcls with DeepSpeed?

And has anyone fine-tuned seqcls successfully with --use_flash?
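For anyone arriving here later, a bare-bones seqcls fine-tune via the Hugging Face Trainer might look like the untested sketch below. The inline toy dataset, num_labels, and hyperparameters are placeholders rather than the repo's actual run_seqcls settings, and a DeepSpeed config would be a ZeRO-3 dict like the one sketched earlier in the thread.

```python
# Untested sketch: sequence classification with BioMedLM via the HF Trainer.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "stanford-crfm/BioMedLM"
tok = AutoTokenizer.from_pretrained(model_id)
tok.pad_token = tok.eos_token  # GPT-2-style tokenizers ship without a pad token

model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
model.config.pad_token_id = tok.pad_token_id

# Toy dataset: the default data collator accepts a list of feature dicts.
texts, labels = ["aspirin reduced the fever", "no improvement was observed"], [1, 0]
enc = tok(texts, truncation=True, padding=True)
train_ds = [
    {"input_ids": i, "attention_mask": m, "labels": l}
    for i, m, l in zip(enc["input_ids"], enc["attention_mask"], labels)
]

args = TrainingArguments(
    output_dir="seqcls_out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    fp16=True,
    # deepspeed=ds_config,  # pass a ZeRO-3 dict like the one sketched above
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```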