mitmul / ssai-cnn

Semantic Segmentation for Aerial / Satellite Images with Convolutional Neural Networks including an unofficial implementation of Volodymyr Mnih's methods
http://www.ingentaconnect.com/content/ist/jist/2016/00000060/00000001/art00003
MIT License

IndexError: index 76 is out of bounds for axis 1 with size 3 #21


InfectedPacket commented 6 years ago

Hello,

I am trying to automate parts of this project and am running into difficulties during the training phase in CPU mode: it throws an IndexError and appears to hang the entire training. I am using a very small subset of the mass_buildings dataset, 8 training images and 2 validation images, since the purpose is only to test the pipeline, not to obtain accurate results at this stage. Below is the state of the installation and the steps I am using:

System:

uname -a
Linux user-VirtualBox 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Python (w/o Anaconda):

$ python -V
Python 3.5.2

Python modules:

user@user-VirtualBox:~/Source/ssai-cnn$ pip3 freeze
...
chainer==1.5.0.2
...
Cython==0.23.4
...
h5py==2.7.1
...
lmdb==0.87
...
matplotlib==2.1.1
...
numpy==1.10.1
onboard==1.2.0
opencv-python==3.1.0.3
...
six==1.10.0
tqdm==4.19.5
...

Additionally, Boost 1.59.0 and OpenCV 3.0.0 have been built and installed from source, and both installs appear successful. The utils library also builds successfully.

I have downloaded only a small subset of the mass_buildings dataset:

# ls -R ./data/mass_buildings/train/
./data/mass_buildings/train/:
map  sat

./data/mass_buildings/train/map:
22678915_15.tif  22678930_15.tif  22678945_15.tif  22678960_15.tif

./data/mass_buildings/train/sat:
22678915_15.tiff  22678930_15.tiff  22678945_15.tiff  22678960_15.tiff
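Given the IndexError further down, it may be worth checking that the map images contain only valid class indices before building the dataset. A minimal sanity check (not part of the repo; `check_label_values` is a hypothetical helper):

```python
import numpy as np

def check_label_values(label, n_classes):
    """Return any values in a label map that are not valid class indices."""
    values = np.unique(label)
    return values[(values < 0) | (values >= n_classes)]

# A map stored as 0/255 (or arbitrary grayscale) instead of 0..n_classes-1
# is exactly the kind of input that makes softmax_cross_entropy index out
# of bounds. The arrays here are synthetic stand-ins for a loaded .tif.
raw = np.array([[0, 255], [255, 0]], dtype=np.int32)
print(check_label_values(raw, n_classes=3))         # -> [255]
print(check_label_values(raw // 255, n_classes=3))  # -> []
```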

Below is the output obtained by running the shells/create_datasets.sh script, modified only to build the mass_buildings data:

patch size: 92 24 16
n_all_files: 1
divide:0.6727173328399658
0 / 1 n_patches: 7744
patches:     7744
patch size: 92 24 16
n_all_files: 1
divide:0.6314394474029541
0 / 1 n_patches: 7744
patches:     7744
patch size: 92 24 16
n_all_files: 4
divide:0.6260504722595215
0 / 4 n_patches: 7744
divide:0.667414665222168
1 / 4 n_patches: 15488
divide:0.628319263458252
2 / 4 n_patches: 23232
divide:0.6634025573730469
3 / 4 n_patches: 30976
patches:     30976
0.03437542915344238 sec (128, 3, 64, 64) (128, 16, 16)

Then the training script is initiated using the following command:

user@user-VirtualBox:~/Source/ssai-cnn$ CHAINER_TYPE_CHECK=0 CHAINER_SEED=$1 \
> nohup python ./scripts/train.py \
> --seed 0 \
> --gpu -1 \
> --model ./models/MnihCNN_multi.py \
> --train_ortho_db data/mass_buildings/lmdb/train_sat \
> --train_label_db data/mass_buildings/lmdb/train_map \
> --valid_ortho_db data/mass_buildings/lmdb/valid_sat \
> --valid_label_db data/mass_buildings/lmdb/valid_map \
> --dataset_size 1.0 \
> --epoch 1

As you can see above, I am using only 8 images and a single epoch. I left the process running for an entire night and it never completed, which is why I believe it simply hangs. Running under nohup also does not complete. When the process is forcefully stopped with Ctrl-C, I get the following message:

# cat nohup.out 
Traceback (most recent call last):
  File "./scripts/train.py", line 313, in <module>
    model, optimizer = one_epoch(args, model, optimizer, epoch, True)
  File "./scripts/train.py", line 265, in one_epoch
    optimizer.update(model, x, t)
  File "/usr/local/lib/python3.5/dist-packages/chainer/optimizer.py", line 377, in update
    loss = lossfun(*args, **kwds)
  File "./models/MnihCNN_multi.py", line 31, in __call__
    self.loss = F.softmax_cross_entropy(h, t, normalize=False)
  File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 152, in softmax_cross_entropy
    return SoftmaxCrossEntropy(use_cudnn, normalize)(x, t)
  File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 105, in __call__
    outputs = self.forward(in_data)
  File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 183, in forward
    return self.forward_cpu(inputs)
  File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 39, in forward_cpu
    p = yd[six.moves.range(t.size), numpy.maximum(t.flat, 0)]
IndexError: index 76 is out of bounds for axis 1 with size 3

This is the only component that fails at the moment. I have tested the prediction and evaluation phases using the pre-trained data, and both seem to complete successfully. Any assistance on how to use the training script with custom datasets would be appreciated.
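For what it's worth, the failing line indexes the softmax output, shaped (batch, n_classes) with n_classes = 3 here, by the raw label values, so any label >= 3 (such as 76) reproduces the error with plain NumPy (synthetic stand-in arrays, not the actual training data):

```python
import numpy as np

# yd stands in for the softmax output: 4 samples, 3 classes.
yd = np.full((4, 3), 1.0 / 3.0)
# t stands in for the flattened labels; 76 is not a valid class index.
t = np.array([0, 1, 76, 2])

try:
    # Same fancy indexing as chainer's forward_cpu.
    p = yd[np.arange(t.size), np.maximum(t, 0)]
except IndexError as e:
    print(e)  # index 76 is out of bounds for axis 1 with size 3
```

This points at the label database containing values outside the range [0, 3), rather than at the training loop itself.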

Thank you

mitmul commented 6 years ago

@InfectedPacket Thank you for trying my code. If you don't change anything in the code, does the training run successfully?