stanford-crfm / BioMedLM


Max Input and Output length #22

Open shashank140195 opened 11 months ago

shashank140195 commented 11 months ago

In finetune_for_summarization.py, why are max_source_length, train_max_target_length, and eval_max_target_length set to a default of 510? Is 510 the maximum input length BioMedLM can take, and the maximum number of tokens it can generate? As soon as I increase any of these values above the default, I get the error below.

```python
max_source_length: Optional[int] = field(
    default=510, metadata={"help": "the max source length of summarization data. "}
)
train_max_target_length: Optional[int] = field(
    default=510, metadata={"help": "the max target length for training data. "}
)
eval_max_target_length: Optional[int] = field(
    default=510, metadata={"help": "the max target length for dev data. "}
)
```

Error:

```
Traceback (most recent call last):
  File "finetune_for_summarization.py", line 168, in <module>
    finetune()
  File "finetune_for_summarization.py", line 162, in finetune
    trainer.train()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1534, in train
    return inner_training_loop(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1807, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2649, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2674, in compute_loss
    outputs = model(**inputs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1769, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1075, in forward
    transformer_outputs = self.transformer(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 843, in forward
    position_embeds = self.wpe(position_ids)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
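For context, the assert originates in the position-embedding lookup (`self.wpe`) visible in the traceback. Below is a minimal sketch of the same failure on CPU, where PyTorch reports a plain IndexError instead of the opaque CUDA assert; the embedding width is illustrative, and only the 1024-row position table matters:

```python
import torch

# Sketch of the failing lookup: the GPT-2 backbone has a position-embedding
# table with 1024 rows (the 2560 width here is illustrative).
wpe = torch.nn.Embedding(1024, 2560)

# Position ids for a sequence longer than the trained context window.
position_ids = torch.arange(1030)

# On CPU this raises "IndexError: index out of range in self"; on CUDA the
# same out-of-range lookup surfaces as "device-side assert triggered".
position_embeds = wpe(position_ids)
```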

J38 commented 11 months ago

The model was trained with a fixed context length of 1024, so the source, target, and any extra special tokens have to fit within that size.
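To make the budget concrete, here is a small check (a sketch; it assumes the stanford-crfm/BioMedLM checkpoint on the Hugging Face Hub and that its GPT-2-style config exposes `n_positions`):

```python
from transformers import AutoConfig

# Read the context length from the checkpoint's config (no weights downloaded).
config = AutoConfig.from_pretrained("stanford-crfm/BioMedLM")
print(config.n_positions)  # 1024: size of the learned position-embedding table

# Budget check for the finetuning defaults: 510 (source) + 510 (target)
# leaves 1024 - 1020 = 4 positions for whatever separator/EOS tokens the
# script inserts between source and target.
assert 510 + 510 <= config.n_positions
```

Raising any of the three defaults past this budget pushes position ids beyond 1023, which is exactly the out-of-range lookup in the traceback above.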

shashank140195 commented 11 months ago

Makes sense, thank you.