Closed tjyothirmai closed 2 weeks ago
Could you please provide your training command for debugging?
This is the command I am using,
python train.py --data doclaynet --model m-doclayout --epoch 500 --image-size 1120 --batch-size 64 --project public_dataset/doclaynet
I took it from this link: https://github.com/opendatalab/DocLayout-YOLO/blob/main/assets/script.sh
Could you please try single GPU training? Can single GPU training work?
I am getting the same error with a single GPU as well. It would help if you could provide details on how to train on a custom dataset.
Thanks.
You can try uncommenting assert amp_allclose(YOLO("yolov8n.pt"), im) in File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/utils/checks.py", line 653, in check_amp
That line is already uncommented in the script.
Sorry, I said it wrong: please comment out this line. Your issue appears to be the same as https://github.com/opendatalab/DocLayout-YOLO/issues/34
@JulioZhao97 after commenting out the line, I got this error:
10 epochs completed in 0.014 hours.
/home/ubuntu/DocLayout-YOLO/doclayout_yolo/utils/torch_utils.py:486: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
x = torch.load(f, map_location=torch.device("cpu"))
Optimizer stripped from finetuning/yolov10m-doclayout_data/config_epoch10_imgsz1024_bs4_pretrain_unknown6/weights/last.pt, 40.6MB
Optimizer stripped from finetuning/yolov10m-doclayout_data/config_epoch10_imgsz1024_bs4_pretrain_unknown6/weights/best.pt, 40.6MB
Validating finetuning/yolov10m-doclayout_data/config_epoch10_imgsz1024_bs4_pretrain_unknown6/weights/best.pt...
Ultralytics YOLOv0.0.2 🚀 Python-3.10.15 torch-2.5.1+cu124 CUDA:0 (NVIDIA A100-SXM4-40GB, 40326MiB)
/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/tasks.py:733: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ckpt = torch.load(file, map_location="cpu")
YOLOv10m-doclayout summary (fused): 465 layers, 19920802 parameters, 0 gradients
Traceback (most recent call last):
File "/home/ubuntu/DocLayout-YOLO/train.py", line 63, in <module>
results = model.train(
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/model.py", line 660, in train
self.trainer.train()
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 214, in train
self._do_train(world_size)
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 473, in _do_train
self.final_eval()
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 630, in final_eval
self.metrics = self.validator(model=f)
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/validator.py", line 170, in __call__
model.warmup(imgsz=(1 if pt else self.args.batch, 3, imgsz, imgsz)) # warmup
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/autobackend.py", line 586, in warmup
self.forward(im) # warmup
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/autobackend.py", line 420, in forward
y = self.model(im, augment=augment, visualize=visualize, embed=embed)
File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 96, in forward
return self.predict(x, *args, **kwargs)
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 114, in predict
return self._predict_once(x, profile, visualize, embed)
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 136, in _predict_once
x = m(x) # run
File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 114, in forward
y.append(m(y[-1]))
File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 77, in forward
return x + self.cv2(self.dilated_block(self.cv1(x))) if self.add else self.cv2(self.dilated_block(self.cv1(x)))
File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 43, in forward
dx = [self.dilated_conv(x, d) for d in self.dilation]
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 43, in <listcomp>
dx = [self.dilated_conv(x, d) for d in self.dilation]
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 36, in dilated_conv
bn = self.dcv.bn
File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1931, in __getattr__
raise AttributeError(
AttributeError: 'Conv' object has no attribute 'bn'
I worked around this in doclayout_yolo/nn/modules/g2l_crm.py with:
def dilated_conv(self, x, dilation):
    weight = self.dcv.conv.weight
    padding = dilation * (self.k//2)
    x = F.conv2d(x, weight, stride=1, padding=padding, dilation=dilation)
    if hasattr(self.dcv, 'bn'):
        x = self.dcv.bn(x)
    if hasattr(self.dcv, 'act'):
        x = self.dcv.act(x)
    return x
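One thing that may be worth double-checking in the patched dilated_conv above: assuming fuse() behaves as in upstream Ultralytics (it folds BatchNorm into the conv weight plus a new bias and then deletes bn), the F.conv2d call passes only the weight, so the folded bias would be dropped for fused models. This is an assumption about this fork, not something confirmed in the thread. A scalar sketch of the folding arithmetic, with made-up numbers, shows why the bias matters:

```python
import math

def conv_then_bn(x, w, gamma, beta, mean, var, eps=1e-5):
    """Unfused path: a (scalar) convolution followed by BatchNorm in eval mode."""
    return gamma * (w * x - mean) / math.sqrt(var + eps) + beta

def fold_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into an equivalent weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, beta - mean * scale

# Illustrative values only, not taken from any real checkpoint.
x, w = 2.0, 0.5
gamma, beta, mean, var = 1.5, 0.1, 0.3, 4.0

w_fused, b_fused = fold_bn(w, gamma, beta, mean, var)

unfused = conv_then_bn(x, w, gamma, beta, mean, var)
fused_with_bias = w_fused * x + b_fused   # matches the unfused result
fused_without_bias = w_fused * x          # what a weight-only conv would compute

print(unfused, fused_with_bias, fused_without_bias)
```

If the fused layer in this repo does carry a bias, passing it through to F.conv2d as well would keep the fused and unfused outputs consistent.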
I've just opened PR #35 to fix these issues; hope it helps.
@agombert @tjyothirmai Thanks for the contribution! This is a bug I know about but have not fixed, because it only affects the final evaluation, as the traceback shows. The evaluation during each epoch runs fine.
File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 630, in final_eval
self.metrics = self.validator(model=f)
You can then get the results for best.pt through val.py. As for the PR https://github.com/opendatalab/DocLayout-YOLO/pull/35 you contributed, I prefer not to change the module code. I suggest resolving this by skipping the model.fuse() operation in final_eval, which is almost the same as in https://github.com/opendatalab/DocLayout-YOLO/issues/23. If you plan to fix this, it would be greatly appreciated! Thanks again for your contribution!
I'll take a look at it tomorrow 👍
Hey, I tried to fix it through fuse by adding an argument.fuse=False, but I still get the same error :/. It does work with val.py, so maybe we just need to call val.py instead?
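The "call val.py instead" idea can be sketched as a small wrapper: let training run, and if the built-in final evaluation raises the AttributeError, validate the saved best weights in a separate step. The function names and stand-ins below are hypothetical; in practice train_fn would wrap model.train(...) and validate_fn would run whatever val.py does.

```python
def train_with_fallback_eval(train_fn, validate_fn, best_weights):
    """Run training; fall back to a separate validation if final eval fails.

    train_fn:     zero-argument callable that runs training (hypothetical hook).
    validate_fn:  callable taking a weights path and returning metrics.
    best_weights: path where the trainer saved its best checkpoint.
    """
    try:
        return train_fn()
    except AttributeError as err:
        # The log above shows the weights are already saved before final_eval
        # runs, so they can still be validated afterwards.
        print(f"final evaluation failed ({err}); validating {best_weights} separately")
        return validate_fn(best_weights)


# Stand-ins that mimic the failure reported in this thread.
def fake_train():
    raise AttributeError("'Conv' object has no attribute 'bn'")

def fake_validate(weights):
    return {"weights": weights, "validated": True}

result = train_with_fallback_eval(fake_train, fake_validate, "weights/best.pt")
print(result)
```

Catching AttributeError this broadly could also hide unrelated bugs inside training, so this is a stopgap rather than a fix for final_eval itself.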
I have downloaded the DocLayNet dataset and tried to run train.py, but I am getting this error. Please help me with this.
ckpt = torch.load(file, map_location="cpu")
Traceback (most recent call last):
File "train.py", line 65, in <module>
results = model.train(
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/model.py", line 660, in train
self.trainer.train()
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 214, in train
self._do_train(world_size)
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 328, in _do_train
self._setup_train(world_size)
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 272, in _setup_train
self.amp = torch.tensor(check_amp(self.model), device=self.device)
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/utils/checks.py", line 653, in check_amp
assert amp_allclose(YOLO("yolov8n.pt"), im)
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/models/yolo/model.py", line 28, in __init__
super().__init__(model=model, task=task, verbose=verbose)
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/model.py", line 144, in __init__
self._load(model, task=task)
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/model.py", line 233, in _load
self.model, self.ckpt = attempt_load_one_weight(weights)
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 807, in attempt_load_one_weight
ckpt, weight = torch_safe_load(weight) # load ckpt
File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 733, in torch_safe_load
ckpt = torch.load(file, map_location="cpu")
File "/home/ubuntu/Workspace/efs/jyothi/layout1/lib/python3.8/site-packages/torch/serialization.py", line 1114, in load
return _legacy_load(
File "/home/ubuntu/Workspace/efs/jyothi/layout1/lib/python3.8/site-packages/torch/serialization.py", line 1338, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
Regards, Jyothirmai.
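A note on the error above: "invalid load key, '<'" means the first byte of the file handed to torch.load was '<'. That often points to an HTML or XML page (for example a failed download) saved under the .pt name, though nothing in the thread confirms that here. A minimal sketch for checking a weights file by its first bytes, without unpickling it (the function name and labels are made up for illustration):

```python
from pathlib import Path

def sniff_checkpoint(path):
    """Classify a weights file by its leading bytes, without loading it."""
    with Path(path).open("rb") as fh:
        head = fh.read(64)
    if not head:
        return "empty"
    if head.lstrip().startswith(b"<"):
        return "markup"   # looks like HTML/XML, e.g. an error page saved as .pt
    if head.startswith(b"PK\x03\x04"):
        return "zip"      # zip archive, the container current torch.save writes
    if head.startswith(b"\x80"):
        return "pickle"   # raw pickle stream, as in the legacy torch.save format
    return "unknown"
```

Running sniff_checkpoint("yolov8n.pt") on the file that check_amp tries to load would tell you whether it needs to be deleted and downloaded again.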