Closed GCQi closed 1 year ago
Besides, it also shows me this warning:
/data/123/gcq/LaneDetection/pytorch-auto-drive/utils/datasets/utils.py:30: UserWarning: An output with one or more elements was resized since it had shape [88473600], which does not match the required output shape [128, 3, 360, 640]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /opt/conda/conda-bld/pytorch_1670525552411/work/aten/src/ATen/native/Resize.cpp:17.)
Also, I changed the batch size to 128; maybe that caused the error?
Yes, that is probably the reason. Scale it down and see if the issue persists. Usually, this loading error occurs when parallel data loading is too heavy for your system.
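A minimal sketch of the suggested scaling-down: the batch size and worker count below are hypothetical conservative values, and the dataset is a tiny stand-in for the real one.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Tiny dummy dataset standing in for the real lane-detection data.
dataset = TensorDataset(torch.randn(8, 3, 36, 64))

# Smaller batches and fewer worker processes reduce the amount of shared
# memory and the number of file descriptors passed between processes.
loader = DataLoader(dataset, batch_size=4, num_workers=2, pin_memory=True)

for (batch,) in loader:
    print(batch.shape)  # torch.Size([4, 3, 36, 64])
```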
Now I have changed it to 64, and the error has not occurred so far.
Unfortunately, even with the batch size kept at 64 and the workers set to 32, the error RuntimeError: received 0 items of ancdata appeared again.
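For context, this particular error usually means the worker processes exhausted the per-process open-file-descriptor limit while passing tensors over unix sockets. A quick sketch of checking and raising that limit from Python (the try/except is there because some platforms cap the soft limit below the reported hard limit):

```python
import resource

# Current soft/hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)

# Raise the soft limit up to the hard limit (no root privileges needed).
try:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
except (ValueError, OSError):
    pass  # platform refused; keep the existing soft limit
```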
Besides, the train_augmentation is configured as:
train_augmentation = dict(
    name='Compose',
    transforms=[
        dict(
            name='Resize',
            size_image=(360, 640),
            size_label=(360, 640)
        ),
        dict(
            name='RandomHorizontalFlip',
            flip_prob=0.5
        ),
        dict(
            name='RandomRotation',
            degrees=10
        ),
        dict(
            name='ColorJitter',
            brightness=0.4,
            contrast=0.4,
            saturation=0.4,
            hue=0.2
        ),
        dict(
            name='ToTensor'
        ),
        dict(
            name='RandomLighting',
            mean=0.0,
            std=0.1,
            eigen_value=[0.00341571, 0.01817699, 0.2141788],
            eigen_vector=[
                [0.41340352, -0.69563484, -0.58752847],
                [-0.81221408, 0.00994535, -0.5832747],
                [0.41158938, 0.71832671, -0.56089297]
            ]
        ),
        dict(
            name='Normalize',
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
            normalize_target=True
        )
    ]
)
Have you ever encountered this problem before? I cannot extract anything useful from the error message.
@GCQi In my experience, this problem comes with heavy data loading (relative to your hardware). A large batch size, more workers, and a long training schedule all increase the probability of encountering this error, which can happen halfway through training. You may find that my default batch size is kept at 20 for this very reason.
Sometimes the file_system strategy can help, but it has a memory-leak issue of its own.
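The strategy switch mentioned above is a one-liner; this sketch shows it, with the caveat that file_system avoids passing descriptors over sockets at the cost of possibly leaking shared-memory files if processes die abnormally:

```python
import torch.multiprocessing as mp

# Switch from the default 'file_descriptor' sharing strategy, which can
# trigger "received 0 items of ancdata" under heavy loading, to
# 'file_system'. Call this before creating any DataLoader workers.
mp.set_sharing_strategy('file_system')
print(mp.get_sharing_strategy())  # 'file_system'
```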
OK, thanks for your help!! This open framework is pretty good; thanks for your contribution and great work.
When I train LSTR on TuSimple with the command
python main_landet.py --train --config ./configs/lane_detection/lstr/resnet18s_tusimple.py --mixed-precision
, it runs several epochs and then randomly raises the error RuntimeError: received 0 items of ancdata.
The error message is: