loicland / superpoint_graph

Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs
MIT License

Using SPG with custom dataset matching semantics3d. #200

Closed: Tojens closed this issue 4 years ago

Tojens commented 4 years ago

Hi

I'm attempting to use superpoint graph with my own dataset. I've formatted the data to match the structure of Semantic3D, so I hope to be able to run the code as outlined in the Semantic3D README. The only difference I can think of is the number of classes: my custom dataset has 7, as opposed to the 8 in sema3d, and I have adjusted sema3d_dataset.py accordingly.

Both partition.py and sema3d_dataset.py run fine with my custom data.

I have run into the issue below when running main.py, and I have a hard time making sense of it:

CUDA_VISIBLE_DEVICES=0 python learning/main.py --dataset sema3d --SEMA3D_PATH /home/tohj/SuperpointGraph/superpoint_graph/Sema3d --db_test_name testred --db_train_name trainval --epochs 200 --lr_steps '[350, 400, 450]' --test_nth_epoch 50 --model_config 'gru_10_0,f_7' --ptn_nfeat_stn 11 --nworkers 0 --pc_attrib xyzrgbelpsv --odir "results/sema3d/trainval_best"


Will save to results/sema3d/trainval_best
Total number of parameters: 276907
Module(
  (ecc): GraphNetwork(
    (0): RNNGraphConvModule(
      (_cell): GRUCellEx(
        32, 32
        (ini): InstanceNorm1d(1, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (inh): InstanceNorm1d(1, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (ig): Linear(in_features=32, out_features=32, bias=True)
      )(ingate layernorm)
      (_fnet): Sequential(
        (0): Linear(in_features=13, out_features=32, bias=True)
        (1): ReLU(inplace=True)
        (2): Linear(in_features=32, out_features=128, bias=True)
        (3): ReLU(inplace=True)
        (4): Linear(in_features=128, out_features=64, bias=True)
        (5): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (6): ReLU(inplace=True)
        (7): Linear(in_features=64, out_features=1024, bias=False)
      )
    )
    (1): Linear(in_features=352, out_features=7, bias=True)
  )
  (ptn): PointNet(
    (stn): STNkD(
      (convs): Sequential(
        (0): Conv1d(11, 64, kernel_size=(1,), stride=(1,))
        (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv1d(64, 64, kernel_size=(1,), stride=(1,))
        (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
        (7): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (8): ReLU(inplace=True)
      )
      (fcs): Sequential(
        (0): Linear(in_features=128, out_features=128, bias=True)
        (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Linear(in_features=128, out_features=64, bias=True)
        (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
      )
      (proj): Linear(in_features=64, out_features=4, bias=True)
    )
    (convs): Sequential(
      (0): Conv1d(11, 64, kernel_size=(1,), stride=(1,))
      (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv1d(64, 64, kernel_size=(1,), stride=(1,))
      (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
      (7): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
      (9): Conv1d(128, 128, kernel_size=(1,), stride=(1,))
      (10): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (11): ReLU(inplace=True)
      (12): Conv1d(128, 256, kernel_size=(1,), stride=(1,))
      (13): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (14): ReLU(inplace=True)
    )
    (fcs): Sequential(
      (0): Linear(in_features=257, out_features=256, bias=True)
      (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Linear(in_features=256, out_features=64, bias=True)
      (4): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Linear(in_features=64, out_features=32, bias=True)
    )
  )
)
Train dataset: 116 elements - Test dataset: 33 elements - Validation dataset: 6 elements
Epoch 0/200 (results/sema3d/trainval_best):
/home/tohj/anaconda3/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
  0%|                                    | 0/116 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "learning/main.py", line 459, in <module>
    main()
  File "learning/main.py", line 329, in main
    acc, loss, oacc, avg_iou = train()
  File "learning/main.py", line 202, in train
    embeddings = ptnCloudEmbedder.run(model, *clouds_data)
  File "/home/tohj/SuperpointGraph/superpoint_graph/learning/../learning/pointnet.py", line 167, in run_full_monger
    out = model.ptn(Variable(clouds), (clouds_global))
  File "/home/tohj/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tohj/SuperpointGraph/superpoint_graph/learning/../learning/pointnet.py", line 122, in forward
    T = self.stn(input[:,:self.nfeat_stn,:])
IndexError: too many indices for tensor of dimension 1
  0%|                                    | 0/116 [00:00<?, ?it

If anyone has any ideas, I would be very interested. Alternatively, I will follow the guidelines for custom datasets, but I would like to avoid that, since my data and label files should work the same way as the Semantic3D dataset. If you need any more information, please let me know.

Best regards, Tobias

loicland commented 4 years ago

Hi,

In /home/tohj/SuperpointGraph/superpoint_graph/learning/../learning/pointnet.py, can you print the shape of input just before the faulty line:

print(input.shape)
T = self.stn(input[:,:self.nfeat_stn,:])
Tojens commented 4 years ago

Thanks for the quick response :)

print(input.shape) simply prints: torch.Size([0])
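An empty tensor explains the IndexError: pointnet.py slices its input as input[:, :nfeat_stn, :], which assumes a 3-D (batch, features, points) batch, and a 1-D tensor of size 0 cannot take three indices. The sketch below reproduces the same failure with NumPy instead of PyTorch (a substitution made only to keep it dependency-light; NumPy also raises IndexError for this slice). The helper name stn_slice and the array sizes are hypothetical.

```python
import numpy as np

def stn_slice(batch, nfeat_stn):
    """Apply the same slice pointnet.py uses before the STN.

    Returns the sliced array, or the IndexError raised when `batch`
    does not have three dimensions.
    """
    try:
        return batch[:, :nfeat_stn, :]
    except IndexError as err:
        return err

nfeat_stn = 11  # matches --ptn_nfeat_stn 11 in the command above

# What the loader produced here: an empty 1-D array (torch.Size([0]) in PyTorch).
print(repr(stn_slice(np.zeros(0, dtype=np.float32), nfeat_stn)))

# A well-formed batch: 2 superpoints, 11 features, 128 points each (illustrative sizes).
print(stn_slice(np.zeros((2, 11, 128), dtype=np.float32), nfeat_stn).shape)  # (2, 11, 128)
```

The point is that the crash in the STN is only a symptom; the real question is why the batch arrived empty.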

loicland commented 4 years ago

Ah, that's no good! It seems like something went wrong with the loader. Could you add the following lines to /learning/spg.py at line 161:

print(len(clouds))
if len(clouds) != 0:
    clouds = np.stack(clouds)
    print(clouds.shape)
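The len(clouds) check in the snippet above matters because np.stack raises a ValueError when given an empty list, so a run in which every superpoint is skipped ends up with no batch at all. A minimal sketch of the same guard, assuming each cloud is an (n_features, n_points) array; the helper name stack_clouds and the sizes are hypothetical, not taken from spg.py:

```python
import numpy as np

def stack_clouds(clouds):
    """Stack per-superpoint clouds into a (batch, n_features, n_points) array.

    Returns None for an empty list, since np.stack([]) raises
    ValueError ("need at least one array to stack").
    """
    if len(clouds) == 0:
        return None
    return np.stack(clouds)

clouds = [np.zeros((11, 128), dtype=np.float32) for _ in range(3)]
print(stack_clouds(clouds).shape)  # (3, 11, 128)
print(stack_clouds([]))            # None
```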
Tojens commented 4 years ago

Yes, it's starting to make sense that it doesn't run.

print(len(clouds))

prints 0. Any ideas why that might be?

loicland commented 4 years ago

Ok so the parsed files are not read correctly.

Add the following lines to /learning/spg.py at line 206:

P = P[:].astype(np.float32)
print(P.shape)
Tojens commented 4 years ago

print(P.shape) doesn't print anything to the console. I did this instead:

import h5py
import numpy as np

def load_superpoint(args, fname, id, train, test_seed_offset):
    hf = h5py.File(fname,'r')
    P = hf['{:d}'.format(id)]
    print(P)  # added: this line does print
    N = P.shape[0]
    if N < args.ptn_minpts: # skip if too few pts (this must be consistent at train and test time)
        return None, N

    P = P[:].astype(np.float32)
    print("P shape here:", P.shape)  # added: nothing is printed here

Printing P this way produces the following:

Train dataset: 116 elements - Test dataset: 33 elements - Validation dataset: 6 elements
Epoch 0/200 (results/sema3d/trainval_best):
/home/tohj/anaconda3/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
  0%|                                    | 0/116 [00:00<?, ?it/s]

<HDF5 dataset "0": shape (15, 11), type "<f8">
<HDF5 dataset "1": shape (1, 11), type "<f8">

... (all the HDF5 datasets are printed)

<HDF5 dataset "191": shape (2, 11), type "<f8">
<HDF5 dataset "192": shape (3, 11), type "<f8">

Traceback (most recent call last):
  File "learning/main.py", line 459, in <module>
    main()
  File "learning/main.py", line 329, in main
    acc, loss, oacc, avg_iou = train()
  File "learning/main.py", line 202, in train
    embeddings = ptnCloudEmbedder.run(model, *clouds_data)
  File "/home/tohj/SuperpointGraph/superpoint_graph/learning/../learning/pointnet.py", line 168, in run_full_monger
    out = model.ptn(Variable(clouds), (clouds_global))
  File "/home/tohj/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tohj/SuperpointGraph/superpoint_graph/learning/../learning/pointnet.py", line 123, in forward
    T = self.stn(input[:,:self.nfeat_stn,:])
IndexError: too many indices for tensor of dimension 1
  0%|                                    | 0/116 [00:00<?, ?it/s]
loicland commented 4 years ago

Ok. It seems like your superpoints are too small.
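This matches the HDF5 listing: the superpoints shown have between 1 and 15 points, so each one takes the early return in load_superpoint (N < args.ptn_minpts), the list of clouds stays empty, and PointNet receives an empty tensor. The sketch below mimics that check on in-memory arrays; the dictionary stand-in for the HDF5 file, the helper name kept_clouds and the threshold values are illustrative assumptions, not SPG's own code.

```python
import numpy as np

def kept_clouds(superpoints, ptn_minpts):
    """Collect the clouds that pass the size check in load_superpoint.

    `superpoints` maps a superpoint id to an (n_points, n_features) array,
    a hypothetical in-memory stand-in for the per-id HDF5 datasets.
    """
    clouds = []
    for sp_id in sorted(superpoints):
        P = superpoints[sp_id]
        if P.shape[0] < ptn_minpts:  # same test as `if N < args.ptn_minpts`
            continue                 # load_superpoint returns (None, N) here
        clouds.append(P.astype(np.float32))
    return clouds

# Point counts from the listing above (15, 1, 2 and 3 points, 11 attributes each).
toy = {sp_id: np.zeros((n, 11)) for sp_id, n in zip([0, 1, 191, 192], [15, 1, 2, 3])}
print(len(kept_clouds(toy, ptn_minpts=20)))  # 0 -> empty batch, as in the traceback
print(len(kept_clouds(toy, ptn_minpts=2)))   # 3
```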

Several things can be done:

Tojens commented 4 years ago

Reducing ptn_minpts did the trick, but I think I'll take some time to consider what the best option is.

Thank you very much for your help. I'll close my issue for now, since it has been solved :)
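When weighing a smaller ptn_minpts against other options, it can help to see how many superpoints, and what share of the points, each candidate threshold keeps. A minimal sketch, assuming the per-superpoint point counts have already been collected (for instance with h5py as [hf[key].shape[0] for key in hf], assuming every key in the parsed file is a superpoint dataset as in the listing above); the function name survival_by_threshold is hypothetical.

```python
import numpy as np

def survival_by_threshold(sizes, thresholds):
    """For each candidate ptn_minpts, count the superpoints that would be kept
    (N >= threshold) and the fraction of all points they contain."""
    sizes = np.asarray(sizes)
    total_points = sizes.sum()
    report = {}
    for t in thresholds:
        kept = sizes >= t  # superpoints with N < t are skipped by load_superpoint
        report[t] = (int(kept.sum()), float(sizes[kept].sum() / total_points))
    return report

# Toy sizes from the listing above; a real check would use every superpoint.
for t, (n_kept, frac) in survival_by_threshold([15, 1, 2, 3], [1, 2, 10, 20]).items():
    print(f"ptn_minpts={t}: {n_kept} superpoints kept, {frac:.0%} of points")
```

Reporting the point share alongside the count shows whether the skipped superpoints are a handful of stray points or a large part of the cloud.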