stanny880913 closed this issue 1 year ago.
I did not see this in my experiment. You can try using a single GPU, or running it several times to check whether this always happens. I also provide pretrained weights for DWN.
I fixed it by using a bigger batch_size; maybe there were too few steps and that affected gradient descent. But when I run do_eval on the val and test data, both show a similar error:
AssertionError: The length of results is not equal to the dataset len: 18024 != 36048
Why would the dataset count be wrong?
The code gets stuck at prog_bar = mmcv.ProgressBar(len(dataset)).
Make sure that test_samples_per_gpu=1 and num_gpus=1 when running evaluation.
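One reason num_gpus matters: if evaluation goes through a distributed sampler, each process only sees its own shard of the dataset. The sketch below re-implements in plain Python the index split that PyTorch's `torch.utils.data.DistributedSampler` performs without shuffling and with drop_last=False (pad to a multiple of the number of replicas, then take every `num_replicas`-th index). It is an illustration under the assumption that a sampler of that kind is involved, not this repository's data-loading code; `shard_indices` is a hypothetical name. With two replicas and 36048 samples, rank 0 gets 18024 indices, the same count as in the error above.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Return the indices one rank would see, mimicking an unshuffled
    DistributedSampler with drop_last=False (illustrative only)."""
    indices = list(range(dataset_len))
    # Every rank gets the same number of samples, rounded up.
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    # Pad by repeating leading indices so the total divides evenly.
    padding = total_size - len(indices)
    if padding > 0:
        repeats = math.ceil(padding / len(indices))
        indices += (indices * repeats)[:padding]
    # Each rank takes every num_replicas-th index, starting at its rank.
    return indices[rank:total_size:num_replicas]


if __name__ == "__main__":
    for gpus in (1, 2):
        shard = shard_indices(36048, num_replicas=gpus, rank=0)
        print(f"num_gpus={gpus}: rank 0 gets {len(shard)} of 36048 samples")
```

If a single process collects results while two shards exist, it would end up with exactly half of the dataset.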
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--num_gpus', type=int, default=1)
parser.add_argument('--samples_per_gpu', type=int, default=1)
parser.add_argument('--test_samples_per_gpu', type=int, default=1)
parser.add_argument('--workers_per_gpu', type=int, default=2)
```
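To rule out a launch that silently overrides these defaults, the parsed values can be checked before evaluation starts. This is a hypothetical guard: the options mirror the ones above plus `--do_eval`, which is mentioned elsewhere in this thread and is assumed here to be a boolean switch; `build_parser` and `check_eval_args` are not part of the repository.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--num_gpus', type=int, default=1)
    parser.add_argument('--samples_per_gpu', type=int, default=1)
    parser.add_argument('--test_samples_per_gpu', type=int, default=1)
    parser.add_argument('--workers_per_gpu', type=int, default=2)
    # Assumption: --do_eval is a store_true flag in the real script.
    parser.add_argument('--do_eval', action='store_true')
    return parser


def check_eval_args(args):
    """Hypothetical guard: fail fast unless evaluation runs on one GPU
    with one sample per batch."""
    if args.do_eval and (args.num_gpus != 1 or args.test_samples_per_gpu != 1):
        raise ValueError(
            f"evaluation expects num_gpus=1 and test_samples_per_gpu=1, "
            f"got {args.num_gpus} and {args.test_samples_per_gpu}")


if __name__ == "__main__":
    args = build_parser().parse_args(['--do_eval'])
    check_eval_args(args)  # passes with the defaults above
    print(vars(args))
```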
I set these args as above, but it still raises the error:
AssertionError: The length of results is not equal to the dataset len: 18024 != 36048
Sorry, to be more precise: the error is raised in format_results_all_cams in fusion_dataset.py. How can I solve it?
This shows that detections were obtained for only 18024 images, while there are 36048 images in total. You can check the number of images in the dataloader with len(dataloader.dataset) inside the function single_gpu_test.
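A minimal sketch of that check, extended to also report the batch size and the number of iterations, since a mismatch between them is easy to miss. `FakeLoader` and `describe_loader` are hypothetical stand-ins so the snippet runs on its own; with a real PyTorch `DataLoader` over a map-style dataset, `len(loader)` is the number of batches, `len(loader.dataset)` the number of samples, and `loader.batch_size` the configured batch size.

```python
import math


class FakeLoader:
    """Hypothetical stand-in for a DataLoader: only the attributes used below."""

    def __init__(self, dataset, batch_size, drop_last=False):
        self.dataset = dataset
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __len__(self):
        # Same rule a DataLoader with a plain BatchSampler uses.
        n = len(self.dataset)
        if self.drop_last:
            return n // self.batch_size
        return math.ceil(n / self.batch_size)


def describe_loader(loader):
    """Summarize sample count, batch size and number of loop iterations."""
    return {
        "samples": len(loader.dataset),
        "batch_size": loader.batch_size,
        "iterations": len(loader),
    }


if __name__ == "__main__":
    loader = FakeLoader(list(range(36048)), batch_size=2)
    print(describe_loader(loader))
```

If "iterations" is half of "samples", the loop will only run half as many times as there are images.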
I checked it! It shows len(dataloader.dataset) = 36048, which is the same, so why does it only run 18024, exactly half? The val and test sets give the same result.
In the single_gpu_test function, you can further examine whether the length of the list results increases by one after each loop iteration that runs the model.
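A minimal sketch of that check: record how much the running list grows on every iteration, measuring len(results) (the accumulated list) rather than len(result) (one batch). `track_growth` and `fake_model` are hypothetical; `fake_model` stands in for the detector and returns one detection dict per image in the batch.

```python
def track_growth(batches, run_model):
    """Run run_model on every batch, extend results, and record how much
    len(results) grew on each iteration (hypothetical helper)."""
    results = []
    growth = []
    for batch in batches:
        before = len(results)
        result = run_model(batch)  # list with one entry per image
        results.extend(result)
        growth.append(len(results) - before)
    return results, growth


def fake_model(batch):
    # Stand-in model: one detection dict per image in the batch.
    return [{"image": img} for img in batch]


if __name__ == "__main__":
    results, growth = track_growth([[0], [1], [2]], fake_model)
    print(len(results), growth)
```

With one image per batch, every entry of growth should be 1 and len(results) should end at the dataset size.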
I printed len(result) to test whether the length of the list results increases, but it always shows 1. How can I fix it?
```python
import mmcv
import torch


def single_gpu_test(model, model_mlp, data_loader):
    """Test model with single gpu.

    Args:
        model (nn.Module): Model to be tested.
        data_loader (nn.Dataloader): Pytorch data loader.

    Returns:
        list[dict]: The prediction results.
    """
    print("Into single gpu test!!!!")
    model.eval()
    model_mlp.eval()
    results = []
    dataset = data_loader.dataset
    # dataset_len = 36114
    # BUG!!
    prog_bar = mmcv.ProgressBar(len(dataset))
    # prog_bar = mmcv.ProgressBar(18057)
    print("progbar = ", prog_bar)
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(model_mlp=model_mlp,
                           return_loss=False, rescale=True, **data)
        results.extend(result)
        # len(result) is the per-batch output, not the running total.
        print("test result len :", len(result))
        batch_size = len(result)
        # batch_size = 8
        # Advance the progress bar once per result in this batch.
        for _ in range(batch_size):
            prog_bar.update()
    print("results_len = ", len(results))
    return results
```
I added the print statements here.
The length of results will increase. result is the detection for a single image, so its length is 1.
Sorry, I printed the wrong thing! results is always increasing. What should I do next? I really don't know why it stops.
Try adding assert i == len(results).
Sorry, where should I put it to fix the problem? Thanks.
```python
import mmcv
import torch


def single_gpu_test(model, model_mlp, data_loader):
    """Test model with single gpu.

    Args:
        model (nn.Module): Model to be tested.
        data_loader (nn.Dataloader): Pytorch data loader.

    Returns:
        list[dict]: The prediction results.
    """
    print("Into single gpu test!!!!")
    model.eval()
    model_mlp.eval()
    results = []
    print("init = ", len(results))
    dataset = data_loader.dataset
    # dataset_len = 36114
    # BUG!!
    prog_bar = mmcv.ProgressBar(len(dataset))
    # prog_bar = mmcv.ProgressBar(18057)
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(model_mlp=model_mlp,
                           return_loss=False, rescale=True, **data)
        # Every previous iteration should have added exactly one result.
        assert i == len(results), (
            'The length of results is not equal to i: {} != {}'.
            format(len(results), i))
        results.extend(result)
        print("\n")
        print("=======")
        print("test result len :", len(results))
        print("=======")
        batch_size = len(result)
        # batch_size = 8
        for _ in range(batch_size):
            prog_bar.update()
    print("results_len = ", len(results))
    return results
```
I put it here. Is that right? Thanks.
Put it on the line before results.extend(result).
I added it, and it doesn't raise that assertion. When i = 18057, the loop ends and it jumps into the evaluate function, which then raises AssertionError: The length of results is not equal to the dataset len: 18057 != 36114
This happens when I run --do_eval on the val dataset. By the way, len(data_loader.dataset) is 36114.
You can check why the loop terminated at i=18057, which is unexpected if len(data_loader.dataset) is 36114.
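Since the assertion held at every step, each iteration added exactly one result, so the loop itself ran only 18057 times, exactly half of 36114. Two explanations fit those numbers: the loader yields two samples per iteration while the model returns a single result, or a sampler splits the dataset in two. Neither is confirmed in this thread. The sketch below checks the first hypothesis by comparing, per iteration, the number of samples in the batch with the number of results returned. `collect_with_check`, `run_model` and `batch_len` are hypothetical names; how to count the samples in a real mmcv batch depends on the data format and is left to the caller.

```python
def collect_with_check(batches, run_model, batch_len):
    """Run run_model on each batch and verify it returns one result per
    input sample. batch_len(batch) must return the number of samples in
    the batch (hypothetical helper; depends on the data format)."""
    results = []
    for i, batch in enumerate(batches):
        result = run_model(batch)
        expected = batch_len(batch)
        if len(result) != expected:
            raise RuntimeError(
                f"iteration {i}: batch has {expected} samples "
                f"but model returned {len(result)} results")
        results.extend(result)
    return results


def model_one_result_per_batch(batch):
    # Stand-in for a model that silently drops all but the first sample.
    return [{"sample": batch[0]}]


if __name__ == "__main__":
    try:
        collect_with_check([[0, 1], [2, 3]], model_one_result_per_batch, len)
    except RuntimeError as err:
        print(err)
```

If this check fires on real data, the loader is batching more than one image per iteration despite test_samples_per_gpu=1; if it never fires, the sampler side is the next place to look.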
Thanks for your help!
I'm running:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python scripts/train_dwn.py --workers_per_gpu 2 --samples_per_gpu 256 --num_gpus 4 --epochs 200 --dir_data data/nuscenes/fusion_data/dwn_radiant_pgd
```

When training starts, the first loss is 1.xxx, but from the second one on it becomes nan. Why is this happening? Thank you.
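To narrow down a nan loss, it helps to stop at the first non-finite value and log the step it occurred at. A plain-Python sketch: `first_nonfinite_step` and `train_step` are hypothetical, and in a PyTorch loop `train_step` would return `loss.item()`. Common things to try once the step is known (not verified for this repository) are a lower learning rate for the larger per-step batch (256 samples per GPU across 4 GPUs here) or gradient clipping with `torch.nn.utils.clip_grad_norm_`.

```python
import math


def first_nonfinite_step(train_step, num_steps):
    """Call train_step(step) for each step and return the index of the
    first step whose loss is nan or inf, or None if all are finite.

    train_step is a hypothetical callable returning the scalar loss as a
    Python float (e.g. loss.item() in PyTorch)."""
    for step in range(num_steps):
        loss = train_step(step)
        if not math.isfinite(loss):
            print(f"non-finite loss {loss} at step {step}")
            return step
    return None


if __name__ == "__main__":
    fake_losses = [1.23, float("nan"), float("nan")]
    print(first_nonfinite_step(lambda s: fake_losses[s], len(fake_losses)))
```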