liuruijin17 / LSTR

This is an official repository of End-to-end Lane Shape Prediction with Transformers.
BSD 3-Clause "New" or "Revised" License

ZeroDivisionError #66

Closed: kaustabpal closed this issue 2 years ago

kaustabpal commented 2 years ago

Upon running the command python train.py LSTR --iter 500000, I get the following error:

loading all datasets TUSIMPLE...
using 4 threads
loading from cache file: ./cache/tusimple_['label_data_0313', 'label_data_0601', 'label_data_0531'].pkl
loading from cache file: ./cache/tusimple_['label_data_0313', 'label_data_0601', 'label_data_0531'].pkl
loading from cache file: ./cache/tusimple_['label_data_0313', 'label_data_0601', 'label_data_0531'].pkl
loading from cache file: ./cache/tusimple_['label_data_0313', 'label_data_0601', 'label_data_0531'].pkl
loading from cache file: ./cache/tusimple_['label_data_0531'].pkl
len of training db: 3626
len of testing db: 358
freeze the pretrained network: False
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
start prefetching data...
shuffling indices...
Total parameters: 765754
MACs: 574.280M
setting learning rate to: 1e-05
training starts from iteration 500001 with learning_rate 1e-05
training start...
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "train.py", line 233, in <module>
    train(training_dbs, validation_db, args.start_iter, args.freeze) # 0
  File "train.py", line 152, in train
    print_freq=10, header=header):
  File "/home/kaustab/Work/lane_detection/TuSimple/LSTR/models/py_utils/misc.py", line 245, in log_every
    header, total_time_str, total_time / len(iterable)))
ZeroDivisionError: float division by zero

Kindly help me out.

liuruijin17 commented 2 years ago

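Hi, the configured number of training iterations is 500000, and --iter 500000 means you would train the model STARTING from the 500000th step. As a result, no iterations remain, so len(iterable) is 0 and the total_time / len(iterable) computation in log_every divides by zero.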
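
To illustrate the failure mode, here is a minimal sketch of an empty training range and a guarded average. The helper average_seconds_per_item is hypothetical and not part of LSTR; the loop bounds are an assumption chosen to match the log line "training starts from iteration 500001".

```python
def average_seconds_per_item(total_time, num_items):
    """Guarded form of total_time / num_items (hypothetical helper, not LSTR code)."""
    if num_items == 0:
        return 0.0  # nothing was processed, so report 0 instead of dividing
    return total_time / num_items


max_iter = 500000    # configured number of training iterations (from the thread)
start_iter = 500000  # value passed via --iter

# Assumed loop bounds, consistent with the log line
# "training starts from iteration 500001".
steps = range(start_iter + 1, max_iter + 1)

print(len(steps))                                 # 0
print(average_seconds_per_item(0.0, len(steps)))  # 0.0
```

With an unguarded division, the same empty range would raise ZeroDivisionError, as in the traceback above.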

If you want to resume training from the 500000th step, change the value at https://github.com/liuruijin17/LSTR/blob/c8fe59a1d8e1c456c982c9e954ee660ae9116b49/config/LSTR.json#L22 to a number larger than 500k, e.g., 600k.
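
A pre-flight check along these lines could surface the problem before training starts. This is only a sketch: the JSON fragment and its key names ("system", "max_iter") are assumptions for illustration and are not copied from config/LSTR.json, and check_start_iter is a hypothetical helper.

```python
import json

# Hypothetical config fragment; the key names are assumptions,
# not taken from the real config/LSTR.json.
CONFIG_TEXT = '{"system": {"max_iter": 600000}}'


def check_start_iter(start_iter, max_iter):
    """Return the number of steps left, or raise if there are none."""
    if start_iter >= max_iter:
        raise ValueError(
            f"start iteration {start_iter} leaves no steps to run; "
            f"raise the configured maximum above {max_iter}"
        )
    return max_iter - start_iter


config = json.loads(CONFIG_TEXT)
steps_left = check_start_iter(500000, config["system"]["max_iter"])
print(steps_left)  # 100000
```

Calling check_start_iter(500000, 500000) raises ValueError with a readable message instead of failing later with a division by zero.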

kaustabpal commented 2 years ago

Got it. Thanks a lot.