westlake-repl / SaProt

[ICLR'24 spotlight] SaProt: Protein Language Model with Structural Alphabet
MIT License

Fine-tuned model weights #4

Closed: prihoda closed this issue 8 months ago

prihoda commented 8 months ago

Hi all, thanks for the open-source release! Are you also planning to release the fine-tuned model weights, namely the Thermostability model?

LTEnjoy commented 8 months ago

Hi!

That's a good question. For now we are not going to release the fine-tuned models for the various downstream tasks, because each fine-tuned model is the same size as the original model, and releasing all of them would increase the maintenance cost. We are planning to add PEFT (Parameter-Efficient Fine-Tuning) support to our model so users can download much smaller weight files later.
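For reference, a minimal sketch of what such a PEFT workflow could look like with the `peft` library, assuming an ESM-style checkpoint in Hugging Face format. The hub id, LoRA hyperparameters, and single-output regression head are illustrative assumptions, not SaProt's actual training code:

```python
from transformers import EsmForSequenceClassification
from peft import LoraConfig, get_peft_model

# Assumed checkpoint id; substitute your own SaProt checkpoint path.
model = EsmForSequenceClassification.from_pretrained(
    "westlake-repl/SaProt_650M_AF2",
    num_labels=1,  # e.g. a single regression output such as thermostability
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],  # attention projections in the ESM layers
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # roughly 1% of the parameters are trainable

# After fine-tuning, only the adapter weights are saved here: a few tens of
# megabytes instead of a multi-gigabyte full checkpoint.
model.save_pretrained("saprot_thermostability_lora")
```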

igortru commented 8 months ago

Interesting: freeze_backbone = True really does reduce the GPU memory requirement, but use_lora = True brings the memory requirement back up and training exits with OutOfMemoryError: CUDA out of memory on my GPU.
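As an illustration of why the two settings can behave so differently, here is a small conceptual sketch. It uses a tiny public ESM-2 checkpoint (facebook/esm2_t6_8M_UR50D) as a stand-in for the SaProt backbone and is not SaProt's own code:

```python
import torch
from transformers import AutoTokenizer, EsmModel

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
backbone = EsmModel.from_pretrained("facebook/esm2_t6_8M_UR50D")
inputs = tokenizer("MKTAYIAKQR", return_tensors="pt")

# freeze_backbone path: the forward pass runs without an autograd graph, so each
# layer's activations are freed immediately and peak memory stays low.
with torch.no_grad():
    repr_frozen = backbone(**inputs).last_hidden_state.mean(dim=1)

# LoRA path: only a small fraction of the weights is trainable, but gradients
# still have to flow back through every backbone layer, so all attention/FFN
# activations are kept until backward() and peak memory is close to full fine-tuning.
repr_lora = backbone(**inputs).last_hidden_state.mean(dim=1)
repr_lora.sum().backward()
```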

LTEnjoy commented 8 months ago

When you start training, you can check the training information printed to the screen, such as how many parameters are trainable. Fine-tuning with LoRA may still require a GPU with a certain amount of memory.
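If it helps, the same statistics can be printed for any torch.nn.Module, e.g. right after building the model with use_lora or freeze_backbone set (a generic helper, not part of SaProt):

```python
import torch

def print_trainable_parameters(model: torch.nn.Module) -> None:
    # Count parameters that will receive gradients vs. the total parameter count.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable} || all params: {total} || "
          f"trainable%: {100 * trainable / total:.4f}")
```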

igortru commented 8 months ago

LoRA model is initialized for training.

```
trainable params: 7723521 || all params: 658966422 || trainable%: 1.17206594177571
```

It goes through the freeze_backbone branch (without LoRA I can train it):

```
   32 │     if self.freeze_backbone:
❱  33 │         repr = torch.stack(self.get_hidden_states(inputs, reduction="mean"))
   34 │         x = self.model.classifier.dropout(repr)
   35 │         x = self.model.classifier.dense(x)
   36 │         x = torch.tanh(x)
....
/opt/conda/lib/python3.7/site-packages/transformers/models/esm/modeling_esm.py:378 in forward
  375 │     if head_mask is not None:
  376 │         attention_probs = attention_probs * head_mask
  377 │
❱ 378 │     context_layer = torch.matmul(attention_probs, value_layer)
  379 │
  380 │     context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
  381 │     new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)

OutOfMemoryError: CUDA out of memory. Tried to allocate 82.00 MiB (GPU 0; 15.78 GiB total capacity;
14.38 GiB already allocated; 12.44 MiB free; 14.73 GiB reserved in total by PyTorch)
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Epoch 0:   0%|          | 0/3166 [00:06<?, ?it/s]
```
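Note that the error message's own suggestion about max_split_size_mb only helps when reserved memory is much larger than allocated memory (fragmentation); here the two are nearly equal, so it is unlikely to be enough on its own. For completeness, this is how the allocator setting would be applied (the 128 MiB value is an arbitrary example):

```python
import os

# Must be set before the first CUDA allocation, or exported in the shell as
# PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128. It cannot create free memory,
# it only reduces fragmentation of what is already there.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```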

LTEnjoy commented 8 months ago

It seems that even 7,723,521 trainable parameters exceed your GPU memory limit. We recommend setting the batch size to the minimum or just freezing the whole backbone.
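A likely reason the memory comes back with LoRA is that, although only about 1% of the parameters are trainable, gradients still have to flow through the whole backbone, so the forward activations of every layer are kept for the backward pass. Beyond a smaller batch size, the usual levers are gradient checkpointing and mixed precision; a hedged sketch using a tiny public ESM-2 checkpoint as a stand-in for the 650M SaProt backbone:

```python
from transformers import EsmForSequenceClassification

# Stand-in checkpoint for illustration; the real model would be the SaProt backbone.
model = EsmForSequenceClassification.from_pretrained(
    "facebook/esm2_t6_8M_UR50D", num_labels=1
)

# Recompute activations during backward instead of storing them: slower, far less memory.
model.gradient_checkpointing_enable()

# In addition: lower the per-device batch size (down to 1 if necessary) and use
# gradient accumulation to keep the effective batch size, and/or run the forward
# pass under torch.cuda.amp.autocast() so activations are stored in half precision.
```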