Training Failed, Help Wanted!

Thanks for the awesome work!

I did not generate the data myself; I used the data loader to download the original data. However, training failed.

I printed data['metaData'] in the load() function in deeptracking/data/dataset.py. For the skull raw_training data it shows:

{'save_type': 'png', 'PairsQty': 0, 'frameQty': '173'}

Then I get an error raised in deeptracking/data/dataset.py, in:

def load_minibatch(self, task):

and compute_mean then, unsurprisingly, comes out as all 0s. I think the error occurs because get_sample() tries to fetch pairs but gets nothing (PairsQty is 0 in the metadata above, even though frameQty is 173). But I am not sure why this happens.
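To make the symptom easier to spot before training starts, here is a minimal sketch of a sanity check on the metadata dict. It only assumes the three keys printed above; check_metadata is a hypothetical helper, not part of deeptracking, and the messages reflect my guess about the cause rather than confirmed behaviour.

```python
def check_metadata(meta):
    """Return a list of human-readable problems found in a metaData dict.

    Hypothetical helper: it only looks at the 'frameQty' and 'PairsQty'
    keys seen in the printed metadata. Values may be ints or strings.
    """
    problems = []
    frame_qty = int(meta.get("frameQty", 0))
    pairs_qty = int(meta.get("PairsQty", 0))
    if frame_qty == 0:
        problems.append("frameQty is 0: the dataset contains no frames")
    if pairs_qty == 0:
        problems.append(
            "PairsQty is 0: there are no pairs to sample from, "
            "so a minibatch (and its mean) would be empty"
        )
    return problems


# The metadata printed for the skull raw_training set.
meta = {"save_type": "png", "PairsQty": 0, "frameQty": "173"}
for problem in check_metadata(meta):
    print("[WARN]", problem)
```

If this is right, the downloaded raw_training set holds frames only, and the pairs have to come from somewhere else, which is the part I do not understand.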

Finally, although I think it is unrelated, I get another error from the tracker class:

[INFO] Setup Model
config: ,  <class 'PyTorchLua.RGBDTracker'> 0
LuaWrapper.__init__ RGBDTracker fromLua False args ('cuda', 'adam', 0)
Traceback (most recent call last):
  File "train.py", line 227, in <module>
    tracker_model = config_model(data, train_dataset)
  File "train.py", line 132, in config_model
    tracker_model = model_class('cuda', 'adam', gpu_device)
  File "/home/name/.local/lib/python3.5/site-packages/PyTorch-4.1.1_SNAPSHOT-py3.5-linux-x86_64.egg/PyTorchHelpers.py", line 20, in __init__
    PyTorchAug.LuaClass.__init__(self, splitName, *args)
  File "/home/name/.local/lib/python3.5/site-packages/PyTorch-4.1.1_SNAPSHOT-py3.5-linux-x86_64.egg/PyTorchAug.py", line 255, in __init__
    raise Exception(errorMessage)
Exception: attempt to call a nil value

The train.json is as follows (class.lua was copied to the current directory):

{
  "data_augmentation":{
      "rgb_noise": "4",
      "depth_noise": "20",
      "occluder_path": "aug_util/occluder",
      "background_path": "aug_util/background",
      "blur_noise": "7",
      "h_noise": "0.07",
      "s_noise": "0.0",
      "v_noise": "0.2",
      "channel_hide": "True"
    },

  "training_param":{
      "file": "class.lua",
      "learning_rate": "0.005",
      "learning_rate_decay": "1e-5",
      "weight_decay": "0",
      "input_size": "150",
      "linear_size": "50",
      "convo1_size": "24",
      "convo2_size": "48"
    },

  "logging":{
      "path": "log/output",
      "level": "DEBUG"
  },

  "session_name": "test001",
  "train_path": "/data/name/deeptrack/raw_training/skull/train",
  "valid_path": "/data/name/deeptrack/raw_training/skull/valid",
  "output_path": "checkpoint",
  "model_finetune": "",
  "minibatch_size": "128",
  "max_epoch": "30",
  "early_stop_wait_limit" : "5",
  "gpu_device" : "0",
  "image_size": "150"
}
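Since a missing file is one possible reason for a nil value on the Lua side, a quick way to rule it out is to check that the paths in the config resolve from the directory where train.py is run. This is only a sketch: missing_paths is a hypothetical helper, and I am assuming relative paths are resolved against the current working directory, which I have not verified in the deeptracking code.

```python
import json
import os


def missing_paths(config, base_dir="."):
    """Return the config keys whose paths do not exist on disk.

    Hypothetical helper: relative paths are resolved against base_dir,
    which is an assumption about how train.py looks them up.
    """
    candidates = {
        "training_param.file": config["training_param"]["file"],
        "train_path": config["train_path"],
        "valid_path": config["valid_path"],
    }
    return [
        key
        for key, path in candidates.items()
        # os.path.join leaves absolute paths unchanged.
        if not os.path.exists(os.path.join(base_dir, path))
    ]


# Only runs when a train.json is present in the current directory.
if os.path.exists("train.json"):
    with open("train.json") as f:
        print("Missing paths:", missing_paths(json.load(f)))
```

An empty list would only show that the files are present, not that class.lua actually defines the RGBDTracker class that the wrapper tries to instantiate.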

Are there any suggestions? I followed the README.md to install Torch and its Python wrapper. Thanks a lot! @MathGaron
