DocLayNet Train from scratch error _pickle.UnpicklingError: invalid load key, '<'. #32

tjyothirmai commented 2 weeks ago

I downloaded the DocLayNet dataset and tried to run train.py, but I am getting the error below. Please help me with this.

  ckpt = torch.load(file, map_location="cpu")
Traceback (most recent call last):
  File "train.py", line 65, in <module>
    results = model.train(
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/model.py", line 660, in train
    self.trainer.train()
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 214, in train
    self._do_train(world_size)
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 328, in _do_train
    self._setup_train(world_size)
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 272, in _setup_train
    self.amp = torch.tensor(check_amp(self.model), device=self.device)
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/utils/checks.py", line 653, in check_amp
    assert amp_allclose(YOLO("yolov8n.pt"), im)
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/models/yolo/model.py", line 28, in __init__
    super().__init__(model=model, task=task, verbose=verbose)
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/model.py", line 144, in __init__
    self._load(model, task=task)
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/engine/model.py", line 233, in _load
    self.model, self.ckpt = attempt_load_one_weight(weights)
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 807, in attempt_load_one_weight
    ckpt, weight = torch_safe_load(weight)  # load ckpt
  File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 733, in torch_safe_load
    ckpt = torch.load(file, map_location="cpu")
  File "/home/ubuntu/Workspace/efs/jyothi/layout1/lib/python3.8/site-packages/torch/serialization.py", line 1114, in load
    return _legacy_load(
  File "/home/ubuntu/Workspace/efs/jyothi/layout1/lib/python3.8/site-packages/torch/serialization.py", line 1338, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

Regards, Jyothirmai.

JulioZhao97 commented 2 weeks ago

Could you please provide your training command for debugging?

tjyothirmai commented 2 weeks ago

This is the command I am using:

python train.py --data doclaynet --model m-doclayout --epoch 500 --image-size 1120 --batch-size 64 --project public_dataset/doclaynet

I took it from this link: https://github.com/opendatalab/DocLayout-YOLO/blob/main/assets/script.sh

JulioZhao97 commented 2 weeks ago

Could you please try single-GPU training? Does that work?

tjyothirmai commented 2 weeks ago

I get the same error with a single GPU as well. It would be helpful if you could provide details on how to train on a custom dataset.

Thanks.

JulioZhao97 commented 2 weeks ago

You can try uncommenting assert amp_allclose(YOLO("yolov8n.pt"), im) in File "/home/ubuntu/Workspace/efs/jyothi/DocLayout-YOLO/doclayout_yolo/utils/checks.py", line 653, in check_amp.

tjyothirmai commented 2 weeks ago

That line is already uncommented in the script.

JulioZhao97 commented 2 weeks ago

> That line is already uncommented in the script.

Sorry, I misspoke: please comment out this line. Your issue appears to be the same as https://github.com/opendatalab/DocLayout-YOLO/issues/34
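
A note for anyone hitting the same message: the '<' in the error is the first byte the unpickler read, which often means the file on disk is not a checkpoint at all (for example, an HTML error page saved under the name yolov8n.pt after a failed download). That is a guess about this case and is not confirmed in the thread. Below is a minimal sketch for checking a file's leading bytes without unpickling it; sniff_checkpoint and its return labels are hypothetical names, not part of DocLayout-YOLO or PyTorch.

```python
def sniff_checkpoint(path):
    """Classify a file by its leading bytes, without unpickling it."""
    with open(path, "rb") as f:
        head = f.read(64)
    if not head:
        return "empty"
    if head.startswith(b"PK\x03\x04"):
        return "zip"      # zip container, as written by recent torch.save
    if head[:1] == b"\x80":
        return "pickle"   # binary pickle protocol header
    if head.lstrip().startswith(b"<"):
        return "markup"   # looks like HTML/XML, e.g. a saved error page
    return "unknown"
```

If this reports "markup" for yolov8n.pt, deleting the file and fetching the weights again may be worth trying before editing checks.py.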

agombert commented 2 weeks ago

@JulioZhao97 once I commented out that line, I got this error:

10 epochs completed in 0.014 hours.
/home/ubuntu/DocLayout-YOLO/doclayout_yolo/utils/torch_utils.py:486: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  x = torch.load(f, map_location=torch.device("cpu"))
Optimizer stripped from finetuning/yolov10m-doclayout_data/config_epoch10_imgsz1024_bs4_pretrain_unknown6/weights/last.pt, 40.6MB
Optimizer stripped from finetuning/yolov10m-doclayout_data/config_epoch10_imgsz1024_bs4_pretrain_unknown6/weights/best.pt, 40.6MB

Validating finetuning/yolov10m-doclayout_data/config_epoch10_imgsz1024_bs4_pretrain_unknown6/weights/best.pt...
Ultralytics YOLOv0.0.2 🚀 Python-3.10.15 torch-2.5.1+cu124 CUDA:0 (NVIDIA A100-SXM4-40GB, 40326MiB)
/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/tasks.py:733: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(file, map_location="cpu")
YOLOv10m-doclayout summary (fused): 465 layers, 19920802 parameters, 0 gradients
Traceback (most recent call last):
  File "/home/ubuntu/DocLayout-YOLO/train.py", line 63, in <module>
    results = model.train(
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/model.py", line 660, in train
    self.trainer.train()
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 214, in train
    self._do_train(world_size)
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 473, in _do_train
    self.final_eval()
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 630, in final_eval
    self.metrics = self.validator(model=f)
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/validator.py", line 170, in __call__
    model.warmup(imgsz=(1 if pt else self.args.batch, 3, imgsz, imgsz))  # warmup
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/autobackend.py", line 586, in warmup
    self.forward(im)  # warmup
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/autobackend.py", line 420, in forward
    y = self.model(im, augment=augment, visualize=visualize, embed=embed)
  File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 96, in forward
    return self.predict(x, *args, **kwargs)
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 114, in predict
    return self._predict_once(x, profile, visualize, embed)
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/tasks.py", line 136, in _predict_once
    x = m(x)  # run
  File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 114, in forward
    y.append(m(y[-1]))
  File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 77, in forward
    return x + self.cv2(self.dilated_block(self.cv1(x))) if self.add else self.cv2(self.dilated_block(self.cv1(x)))
  File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 43, in forward
    dx = [self.dilated_conv(x, d) for d in self.dilation]
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 43, in <listcomp>
    dx = [self.dilated_conv(x, d) for d in self.dilation]
  File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/nn/modules/g2l_crm.py", line 36, in dilated_conv
    bn = self.dcv.bn
  File "/home/ubuntu/miniconda3/envs/doclayout_yolo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1931, in __getattr__
    raise AttributeError(
AttributeError: 'Conv' object has no attribute 'bn'

I worked around this in doclayout_yolo/nn/modules/g2l_crm.py with:

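def dilated_conv(self, x, dilation):
    # Reuse the kernel of self.dcv at the requested dilation rate.
    weight = self.dcv.conv.weight
    # With stride 1 and an odd kernel size k, this padding keeps the spatial size unchanged.
    padding = dilation * (self.k // 2)
    x = F.conv2d(x, weight, stride=1, padding=padding, dilation=dilation)
    # bn can be absent (the traceback above shows a Conv without it), so apply it only if present.
    if hasattr(self.dcv, 'bn'):
        x = self.dcv.bn(x)
    if hasattr(self.dcv, 'act'):
        x = self.dcv.act(x)
    return x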

I've just opened PR #35 to fix these issues. Hope it helps.
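
For readers wondering why bn can go missing at all: a common cause is a fuse step that folds the normalization into the convolution and then deletes the bn attribute. The toy below illustrates that idea with scalars only. ToyConv, fuse, and forward are hypothetical stand-ins written for this note, and they assume the repository's fuse behaves in a similar way, which is not confirmed here.

```python
class ToyConv:
    """Scalar stand-in for a conv + batch-norm pair (not the repo's Conv)."""

    def __init__(self, weight, bn_scale, bn_shift):
        self.weight = weight
        self.bias = 0.0
        self.bn = (bn_scale, bn_shift)

    def fuse(self):
        # Fold the normalization into weight and bias, then drop `bn`.
        scale, shift = self.bn
        self.weight = self.weight * scale
        self.bias = self.bias * scale + shift
        del self.bn


def forward(layer, x):
    # "Convolution": multiply by the weight and add the bias.
    y = layer.weight * x + layer.bias
    # Apply the normalization only while it still exists as an attribute.
    if hasattr(layer, "bn"):
        scale, shift = layer.bn
        y = y * scale + shift
    return y


layer = ToyConv(weight=2.0, bn_scale=0.5, bn_shift=1.0)
before = forward(layer, 3.0)
layer.fuse()
after = forward(layer, 3.0)
```

In this toy, before and after agree only because forward adds layer.bias. If the real fuse also moves the shift into the conv bias, a guarded forward that reads only the weight would skip it, so that is worth checking against the actual code.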

JulioZhao97 commented 2 weeks ago

@agombert @tjyothirmai Thanks for the contribution! This is a known bug that I have not fixed yet, because it only affects the final evaluation, as the traceback shows. Evaluation during each epoch works fine.

File "/home/ubuntu/DocLayout-YOLO/doclayout_yolo/engine/trainer.py", line 630, in final_eval
    self.metrics = self.validator(model=f)

You can then get the results for best.pt through val.py. As for the PR https://github.com/opendatalab/DocLayout-YOLO/pull/35 that you contributed, I would prefer not to change the module code. I suggest resolving this by skipping the model.fuse() operation in final_eval, which is almost the same as in https://github.com/opendatalab/DocLayout-YOLO/issues/23. If you plan to fix this, it would be greatly appreciated! Thanks again for your contribution!

agombert commented 2 weeks ago

I'll take a look at it tomorrow 👍

agombert commented 1 week ago

Hey, I tried to fix the fuse issue by adding an argument fuse=False, but I still get the same error :/. It does work with val.py, so maybe we just need to call val.py instead?
