jordanparker6 opened this issue 2 years ago
Hi, for the memory issue, please refer to https://github.com/hustvl/YOLOS/issues/5#issuecomment-867533669
Ahh, that's great! Thank you.
For those interested, I found that the HF implementation supports gradient checkpointing.
Enable it with:
```python
from transformers import YolosForObjectDetection

# Inside the LightningModule's __init__:
self.model = YolosForObjectDetection.from_pretrained(
    self.hparams.pretrained_model_name_or_path,
    config=config,
    ignore_mismatched_sizes=True,
)
self.model.gradient_checkpointing_enable()
```
I was able to increase the batch size from 1 to 8 using this on a T4 with `ddp_sharded` in pytorch-lightning. It shaved about 35 minutes off each epoch, reducing the per-epoch time from 165 min to 130 min. My config:
```yaml
model:
  pretrained_model_name_or_path: "hustvl/yolos-base"
  learning_rate: 2e-5
data:
  data_dir: "/datastores/doclaynet/images"
  train_batch_size: 8
  val_batch_size: 8
  num_workers: 4
trainer:
  resume_from_checkpoint: null
  accelerator: "gpu"
  num_nodes: 1
  strategy: "ddp_sharded"
  max_epochs: 10
  min_epochs: 3
  max_steps: -1
  val_check_interval: 1.0
  check_val_every_n_epoch: 1
  gradient_clip_val: 1.0
```
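For anyone wiring this up themselves, here is a minimal sketch of how the `trainer` section of a config like the one above can be passed to a PyTorch Lightning `Trainer`. The `config.yaml` path and the commented-out `model`/`datamodule` names are assumptions for illustration, not part of the original setup:

```python
import yaml
import pytorch_lightning as pl

# Load the YAML config shown above (path is hypothetical).
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

trainer = pl.Trainer(
    accelerator=cfg["trainer"]["accelerator"],                    # "gpu"
    num_nodes=cfg["trainer"]["num_nodes"],
    strategy=cfg["trainer"]["strategy"],                          # "ddp_sharded" (needs fairscale)
    max_epochs=cfg["trainer"]["max_epochs"],
    min_epochs=cfg["trainer"]["min_epochs"],
    max_steps=cfg["trainer"]["max_steps"],
    val_check_interval=cfg["trainer"]["val_check_interval"],
    check_val_every_n_epoch=cfg["trainer"]["check_val_every_n_epoch"],
    gradient_clip_val=cfg["trainer"]["gradient_clip_val"],
)
# trainer.fit(model, datamodule=datamodule)  # built from the model/data sections of the config
```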
Awesome! :smiling_face_with_three_hearts::smiling_face_with_three_hearts::smiling_face_with_three_hearts:
Using the default FeatureExtractor settings for the HuggingFace port of YOLOS, I am consistently running into CUDA OOM errors on a 16GB V100 (even with a training batch size of 1).
I would like to train YOLOS on PubLayNet, ideally using 4-8 V100s.
Is there a way to lower CUDA memory usage while training YOLOS besides reducing the batch size (while preserving accuracy and leveraging the pretrained models)?
I see that other models (e.g. DiT) use image sizes of 224x224. However, is it fair to assume that such a small image size would not be appropriate for object detection, as too much information is lost? In the DiT case, the objective was document image classification.
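Besides batch size, the other knob that directly affects memory is the input resolution used by the feature extractor, since YOLOS's token sequence length grows with image size. A hedged sketch of lowering it is below; the `size`/`max_size` values are illustrative, not recommendations, and newer transformers versions expose the same settings through `YolosImageProcessor` with a `size` dict instead:

```python
from transformers import YolosFeatureExtractor

# The default YOLOS preprocessing follows the DETR-style resize (shorter side
# around 800 px; check the checkpoint's preprocessor config). Lowering these
# values shrinks the patch sequence and activation memory, at some cost in
# accuracy on small objects.
feature_extractor = YolosFeatureExtractor.from_pretrained(
    "hustvl/yolos-base",
    size=512,      # target for the shorter edge (illustrative value)
    max_size=864,  # cap on the longer edge (illustrative value)
)
```

Combined with gradient checkpointing and sharded DDP as discussed above, resolution is usually the biggest remaining lever, with the trade-off that small layout elements become harder to detect as the images shrink.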