Closed jackie174 closed 1 year ago
Hi, have you modified the code? This error comes from an unexpected k value in the kNN operation.
Hello, thanks so much for your reply.
I have not modified the code.
One thing I found is weird.
When I run
bash ./scripts/train.sh 0 \
--config ./cfgs/KITTI_models/PoinTr.yaml \
--exp_name example
I first got an error that ./data/PCN/train was not found, so I downloaded the data.
After that, I get the error
assert idx.shape[1] == k
AssertionError
Then I printed idx.shape[1]; the result is 3.
During this whole process, I never downloaded the KITTI dataset. Why does the code require PCN?
Alright, I found the problem.
This is probably due to the version of knn_cuda. It returns idx with shape (B, 3, k), but (B, k, 3) is expected,
so please transpose idx before this line, and the problem will go away.
Hi, I tried transposing the data, but I get more errors. First, I get a view error:
File "/content/pointr/models/dgcnn_group.py", line 73, in get_graph_feature
idx = idx.view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Then I changed it to reshape, and got the error below.
I am so confused; I did not change any code other than transposing and reshaping.
_, idx = knn(coor_k, coor_q) # bs k np
print("------------Before transpose\n", type(idx), idx)
idx= torch.transpose(idx, 0, 1)
print("-------------After transpose\n", type(idx), idx)
assert idx.shape[1] == k
idx_base = torch.arange(0, batch_size, device=x_q.device).view(-1, 1, 1) * num_points_k
idx = idx + idx_base
idx = idx.reshape(-1)
The error I got:
+ GPUS=0
+ PY_ARGS='--ckpts ./pretrained/PoinTr_PCN.pth --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example'
+ CUDA_VISIBLE_DEVICES=0
+ python main.py --test --ckpts ./pretrained/PoinTr_PCN.pth --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example
Create experiment path successfully at ./experiments/PoinTr/PCN_models/test_example
Create TFBoard path successfully at ./experiments/PoinTr/PCN_models/TFBoard/test_example
2022-11-13 00:47:49,404 - PoinTr - INFO - Copy the Config file from ./cfgs/PCN_models/PoinTr.yaml to ./experiments/PoinTr/PCN_models/test_example/config.yaml
2022-11-13 00:47:49,404 - PoinTr - INFO - args.config : ./cfgs/PCN_models/PoinTr.yaml
2022-11-13 00:47:49,404 - PoinTr - INFO - args.launcher : none
2022-11-13 00:47:49,404 - PoinTr - INFO - args.local_rank : 0
2022-11-13 00:47:49,404 - PoinTr - INFO - args.num_workers : 4
2022-11-13 00:47:49,404 - PoinTr - INFO - args.seed : 0
2022-11-13 00:47:49,404 - PoinTr - INFO - args.deterministic : False
2022-11-13 00:47:49,404 - PoinTr - INFO - args.sync_bn : False
2022-11-13 00:47:49,404 - PoinTr - INFO - args.exp_name : test_example
2022-11-13 00:47:49,404 - PoinTr - INFO - args.start_ckpts : None
2022-11-13 00:47:49,405 - PoinTr - INFO - args.ckpts : ./pretrained/PoinTr_PCN.pth
2022-11-13 00:47:49,405 - PoinTr - INFO - args.val_freq : 1
2022-11-13 00:47:49,405 - PoinTr - INFO - args.resume : False
2022-11-13 00:47:49,405 - PoinTr - INFO - args.test : True
2022-11-13 00:47:49,405 - PoinTr - INFO - args.mode : None
2022-11-13 00:47:49,405 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/PCN_models/test_example
2022-11-13 00:47:49,405 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/PCN_models/TFBoard/test_example
2022-11-13 00:47:49,405 - PoinTr - INFO - args.log_name : PoinTr
2022-11-13 00:47:49,405 - PoinTr - INFO - args.use_gpu : True
2022-11-13 00:47:49,405 - PoinTr - INFO - args.distributed : False
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer = edict()
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0005
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-13 00:47:49,405 - PoinTr - INFO - config.scheduler = edict()
2022-11-13 00:47:49,405 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_ = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.NAME : PCN
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.N_POINTS : 16384
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.N_RENDERINGS : 8
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train._base_.CARS : False
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train.others.bs : 48
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_ = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.NAME : PCN
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.N_POINTS : 16384
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.N_RENDERINGS : 8
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.CARS : False
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.test = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.test._base_ = edict()
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.NAME : PCN
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.N_POINTS : 16384
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.N_RENDERINGS : 8
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.CARS : False
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model = edict()
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.num_query : 224
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-13 00:47:49,408 - PoinTr - INFO - config.total_bs : 48
2022-11-13 00:47:49,408 - PoinTr - INFO - config.step_per_update : 1
2022-11-13 00:47:49,408 - PoinTr - INFO - config.max_epoch : 300
2022-11-13 00:47:49,408 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-13 00:47:49,409 - PoinTr - INFO - Distributed training: False
2022-11-13 00:47:49,409 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-13 00:47:49,409 - PoinTr - INFO - Tester start ...
2022-11-13 00:47:49,416 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02691156, Name=airplane]
2022-11-13 00:47:49,417 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02933112, Name=cabinet]
2022-11-13 00:47:49,417 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-13 00:47:49,418 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03001627, Name=chair]
2022-11-13 00:47:49,418 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03636649, Name=lamp]
2022-11-13 00:47:49,420 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04256520, Name=sofa]
2022-11-13 00:47:49,420 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04379243, Name=table]
2022-11-13 00:47:49,421 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04530566, Name=watercraft]
2022-11-13 00:47:49,421 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 1200
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:566: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
2022-11-13 00:47:49,423 - MODEL - INFO - Transformer with knn_layer 1
2022-11-13 00:47:51,820 - PoinTr - INFO - Loading weights from ./pretrained/PoinTr_PCN.pth...
2022-11-13 00:47:54,342 - PoinTr - INFO - ckpts @ 289 epoch( performance = No Metrics)
------------Before transpose
<class 'torch.Tensor'> tensor([[ 0, 1, 2],
[ 3, 3, 3],
[ 4, 4, 4],
[ 5, 5, 5],
[ 6, 6, 6],
[ 7, 7, 7],
[ 8, 8, 8],
[ 9, 9, 9],
[10, 10, 10],
[11, 11, 11],
[12, 12, 12],
[13, 13, 13],
[14, 14, 14],
[15, 15, 15],
[ 2, 2, 1],
[ 1, 0, 0]], device='cuda:0')
-------------After transpose
<class 'torch.Tensor'> tensor([[ 0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 2, 1],
[ 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 2, 0],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 0]],
device='cuda:0')
Traceback (most recent call last):
File "main.py", line 68, in <module>
main()
File "main.py", line 62, in main
test_net(args, config)
File "/content/pointr/tools/runner.py", line 304, in test_net
test(base_model, test_dataloader, ChamferDisL1, ChamferDisL2, args, config, logger=logger)
File "/content/pointr/tools/runner.py", line 326, in test
ret = base_model(partial)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/content/pointr/models/PoinTr.py", line 92, in forward
q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/content/pointr/models/Transformer.py", line 353, in forward
coor, f = self.grouper(inpc.transpose(1,2).contiguous())
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/content/pointr/models/dgcnn_group.py", line 90, in forward
f = self.get_graph_feature(coor, f, coor, f)
File "/content/pointr/models/dgcnn_group.py", line 77, in get_graph_feature
feature = feature.view(batch_size, k, num_points_q, num_dims).permute(0, 3, 2, 1).contiguous()
RuntimeError: shape '[1, 16, 2048, 8]' is invalid for input of size 384
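The size in this error message can be accounted for with simple arithmetic. The sketch below assumes, as in DGCNN-style grouping code, that one feature row of num_dims values is gathered per entry of idx; the gather step itself is not shown in this thread, so treat that as an assumption. The helper name num_elements is illustrative only.

```python
from functools import reduce
from operator import mul


def num_elements(shape):
    """Number of elements in a tensor of the given shape."""
    return reduce(mul, shape, 1)


# Values taken from the printed output and the traceback above.
idx_shape = (3, 16)               # idx after torch.transpose(idx, 0, 1)
target_shape = (1, 16, 2048, 8)   # batch_size, k, num_points_q, num_dims

num_dims = target_shape[-1]

# Assumption: one row of num_dims features is gathered per index.
gathered_elements = num_elements(idx_shape) * num_dims
required_elements = num_elements(target_shape)

print(gathered_elements)   # 384, the "input of size 384" in the error
print(required_elements)   # 262144, what view(1, 16, 2048, 8) needs
```

Under that assumption, the 2-D idx supplies only 48 indices where 16 x 2048 are needed, so transposing or reshaping it cannot help; the kNN call itself would have to return a different result.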
_, idx = knn(coor_k, coor_q) # bs k np
Can you show me the shapes of coor_k and coor_q? It seems that idx has shape (k, 3) in your case, but it should be a 3-dimensional tensor.
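A quick way to tell "needs a transpose" apart from "wrong result altogether" is to compare the shape of idx against the layout the code expects. The sketch below assumes the expected layout is (batch, k, num_query), based on the `# bs k np` comment and the `assert idx.shape[1] == k` check; diagnose_knn_idx is a hypothetical helper, not part of PoinTr or knn_cuda.

```python
def diagnose_knn_idx(idx_shape, batch_size, k, num_query):
    """Classify a kNN index shape against the expected (batch, k, num_query) layout.

    Hypothetical helper for illustration only.
    """
    idx_shape = tuple(idx_shape)
    if idx_shape == (batch_size, k, num_query):
        return "ok"
    if idx_shape == (batch_size, num_query, k):
        # Same data with the last two axes swapped: a transpose is enough.
        return "transpose_last_two"
    # Wrong rank or wrong sizes: transposing cannot repair this.
    return "unexpected"


# Shapes reported in this thread (k = 16, 2048 points per cloud).
print(diagnose_knn_idx((16, 3), batch_size=1, k=16, num_query=2048))
print(diagnose_knn_idx((48, 2048, 16), batch_size=48, k=16, num_query=2048))
print(diagnose_knn_idx((48, 16, 2048), batch_size=48, k=16, num_query=2048))
```

With a real tensor, the first argument would be tuple(idx.shape).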
Thanks so much for your reply! Below is what I get when I run evaluation:
!bash ./scripts/test.sh 0 \
--ckpts ./pretrained/PoinTr_PCN.pth \
--config ./cfgs/PCN_models/PoinTr.yaml \
--exp_name example
Then I get the following:
shpae of coor_k: torch.Size([1, 3, 2048])
shpae of coor_q: torch.Size([1, 3, 2048])
idx before transpose: tensor([[ 0, 1, 2],
[ 3, 3, 3],
[ 4, 4, 4],
[ 5, 5, 5],
[ 6, 6, 6],
[ 7, 7, 7],
[ 8, 8, 8],
[ 9, 9, 9],
[10, 10, 10],
[11, 11, 11],
[12, 12, 12],
[13, 13, 13],
[14, 14, 14],
[15, 15, 15],
[ 2, 2, 1],
[ 1, 0, 0]], device='cuda:0')
idx after transpose: tensor([[ 0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 2, 1],
[ 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 2, 0],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 0]],
device='cuda:0')
Below is what I run for training:
bash ./scripts/train.sh 0 \
--config ./cfgs/KITTI_models/PoinTr.yaml \
--exp_name example
Below is what I get:
shpae of coor_k: torch.Size([64, 3, 2048])
shpae of coor_q: torch.Size([64, 3, 2048])
idx before transpose: tensor([[ 0, 1, 2],
[15, 10, 4],
[11, 14, 8],
[14, 5, 13],
[10, 15, 10],
[13, 6, 15],
[ 4, 11, 11],
[ 6, 13, 6],
[ 5, 3, 12],
[ 7, 4, 5],
[ 3, 7, 14],
[12, 12, 3],
[ 9, 9, 7],
[ 8, 8, 9],
[ 1, 2, 1],
[ 2, 0, 0]], device='cuda:0')
idx after transpose: tensor([[ 0, 15, 11, 14, 10, 13, 4, 6, 5, 7, 3, 12, 9, 8, 1, 2],
[ 1, 10, 14, 5, 15, 6, 11, 13, 3, 4, 7, 12, 9, 8, 2, 0],
[ 2, 4, 8, 13, 10, 15, 11, 6, 12, 5, 14, 3, 7, 9, 1, 0]],
device='cuda:0')
One thing is weird:
PCN_models, ShapeNet34_models, and ShapeNet55_models all work with PCN.yaml.
With GRNet.yaml, I get RuntimeError: CUDA out of memory.
However, none of them work with PoinTr.yaml.
I always get the assertion failure at assert idx.shape[1] == k
I have not modified the code.
These work:
!bash ./scripts/train.sh 0 \
--config ./cfgs/PCN_models/PCN.yaml \
--exp_name example
!bash ./scripts/train.sh 0 \
--config ./cfgs/ShapeNet55_models/PCN.yaml \
--exp_name example
This one does not work and gives the error: RuntimeError: CUDA out of memory.
!bash ./scripts/train.sh 0 \
--config ./cfgs/PCN_models/GRNet.yaml \
--exp_name example
These do not work and give the error: AssertionError: assert idx.shape[1] == k
I tried PCN_models, ShapeNet34_models, and ShapeNet55_models; none of them work.
!bash ./scripts/train.sh 0 \
--config ./cfgs/ShapeNet55_models/PoinTr.yaml \
--exp_name example
Hi, the problem comes from the knn_cuda used in your environment.
Can you provide your environment by running conda env list?
And can you share your models/dgcnn_group.py with me?
Thank you very much for your reply.
My environment is:
cuda: 11.2
pytorch: 1.13.0+cu117
python: 3.7
gcc: 7.5
conda env list
> # conda environments:
> #
> base /usr/local
import torch
print(torch.__version__)
nvcc --version
gcc -v
> 1.13.0+cu117
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2021 NVIDIA Corporation
> Built on Sun_Feb_14_21:12:58_PST_2021
> Cuda compilation tools, release 11.2, V11.2.152
> Build cuda_11.2.r11.2/compiler.29618528_0
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
> OFFLOAD_TARGET_NAMES=nvptx-none
> OFFLOAD_TARGET_DEFAULT=1
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
I just forked the code from your repo.
https://github.com/Cmput-414/PoinTr/blob/master/models/dgcnn_group.py
You can also take a look at what I wrote in the Colab notebook.
https://github.com/Cmput-414/pointr-colab/blob/main/PoinTr.ipynb
Hi, I updated the code for the kNN calculation in dgcnn_group.py. Could you try again and post the results here?
If the error still exists, please debug and let me know the shapes of the kNN inputs and output (coor_k, coor_q, idx).
Best!
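Besides the shapes, it can help to report whether every returned index actually points into the point cloud, since a kNN call can return the right shape but invalid values. The sketch below is a minimal pure-Python check, assuming the indices have been converted to nested lists (for example with idx.tolist()); find_invalid_indices and the toy data are hypothetical.

```python
def find_invalid_indices(idx, num_points):
    """Return (position, value) pairs for entries outside [0, num_points).

    `idx` is a nested list of ints standing in for idx.tolist().
    Hypothetical helper, not part of PoinTr.
    """
    invalid = []

    def walk(node, position):
        if isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, position + (i,))
        elif not 0 <= node < num_points:
            invalid.append((position, node))

    walk(idx, ())
    return invalid


# Toy example: batch of 1, 2 query points, k = 3, cloud of 2048 points.
toy_idx = [[[0, 413, 429], [1, -1, 5000]]]
for position, value in find_invalid_indices(toy_idx, num_points=2048):
    print(position, value)
```

An empty result means all indices are in range; anything else is worth pasting into the report together with the shapes.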
Hi, I get this after running:
shape of coor_q: torch.Size([48, 3, 2048])
shape of coor_k: torch.Size([48, 3, 2048])
shape of idx:
Before transpose: torch.Size([48, 2048, 16])
After transpose: torch.Size([48, 16, 2048])
After view: torch.Size([1572864])
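These shapes are now internally consistent, which a few lines of arithmetic can confirm. The sketch below also rebuilds the per-batch offsets that the earlier snippet computes with torch.arange(0, batch_size) * num_points_k, and uses them to test two values that appear in the log that follows. The names flat_len and in_batch_range are illustrative only.

```python
batch_size, k, num_points = 48, 16, 2048

# Flattening a (batch_size, k, num_points) index tensor.
flat_len = batch_size * k * num_points
print(flat_len)  # 1572864, matching the size after view

# Per-batch offsets, mirroring torch.arange(0, batch_size) * num_points_k.
idx_base = [b * num_points for b in range(batch_size)]
print(idx_base[:3], idx_base[-1])  # [0, 2048, 4096] 96256


def in_batch_range(flat_index, b):
    """True if an offset index points into the rows of batch element b."""
    return idx_base[b] <= flat_index < idx_base[b] + num_points


print(in_batch_range(2576, 1))                  # True
print(in_batch_range(-5584463534953070592, 0))  # False
```

So the shapes are fine at this point, while values such as the large negative numbers in the first batch element cannot be valid row indices.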
/content/pointr
+ GPUS=0
+ PY_ARGS='--config ./cfgs/PCN_models/PoinTr.yaml --exp_name example'
+ CUDA_VISIBLE_DEVICES=0
+ python main.py --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example
2022-11-21 07:18:16,859 - PoinTr - INFO - Copy the Config file from ./cfgs/PCN_models/PoinTr.yaml to ./experiments/PoinTr/PCN_models/example/config.yaml
2022-11-21 07:18:16,860 - PoinTr - INFO - args.config : ./cfgs/PCN_models/PoinTr.yaml
2022-11-21 07:18:16,860 - PoinTr - INFO - args.launcher : none
2022-11-21 07:18:16,860 - PoinTr - INFO - args.local_rank : 0
2022-11-21 07:18:16,860 - PoinTr - INFO - args.num_workers : 4
2022-11-21 07:18:16,860 - PoinTr - INFO - args.seed : 0
2022-11-21 07:18:16,860 - PoinTr - INFO - args.deterministic : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.sync_bn : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.exp_name : example
2022-11-21 07:18:16,860 - PoinTr - INFO - args.start_ckpts : None
2022-11-21 07:18:16,860 - PoinTr - INFO - args.ckpts : None
2022-11-21 07:18:16,860 - PoinTr - INFO - args.val_freq : 1
2022-11-21 07:18:16,860 - PoinTr - INFO - args.resume : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.test : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.mode : None
2022-11-21 07:18:16,860 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/PCN_models/example
2022-11-21 07:18:16,860 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/PCN_models/TFBoard/example
2022-11-21 07:18:16,860 - PoinTr - INFO - args.log_name : PoinTr
2022-11-21 07:18:16,860 - PoinTr - INFO - args.use_gpu : True
2022-11-21 07:18:16,860 - PoinTr - INFO - args.distributed : False
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer = edict()
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0005
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-21 07:18:16,860 - PoinTr - INFO - config.scheduler = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_ = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.NAME : PCN
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.N_POINTS : 16384
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.N_RENDERINGS : 8
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.CARS : False
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train.others.bs : 48
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val._base_ = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val._base_.NAME : PCN
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.N_POINTS : 16384
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.N_RENDERINGS : 8
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.CARS : False
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_ = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.NAME : PCN
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.N_POINTS : 16384
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.N_RENDERINGS : 8
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.CARS : False
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.num_query : 224
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-21 07:18:16,862 - PoinTr - INFO - config.total_bs : 48
2022-11-21 07:18:16,862 - PoinTr - INFO - config.step_per_update : 1
2022-11-21 07:18:16,862 - PoinTr - INFO - config.max_epoch : 300
2022-11-21 07:18:16,862 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-21 07:18:16,863 - PoinTr - INFO - Distributed training: False
2022-11-21 07:18:16,863 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-21 07:18:16,871 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02691156, Name=airplane]
2022-11-21 07:18:16,978 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02933112, Name=cabinet]
2022-11-21 07:18:16,985 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 07:18:17,013 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03001627, Name=chair]
2022-11-21 07:18:17,041 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03636649, Name=lamp]
2022-11-21 07:18:17,051 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04256520, Name=sofa]
2022-11-21 07:18:17,066 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04379243, Name=table]
2022-11-21 07:18:17,096 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04530566, Name=watercraft]
2022-11-21 07:18:17,105 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 28974
2022-11-21 07:18:17,112 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02691156, Name=airplane]
2022-11-21 07:18:17,113 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02933112, Name=cabinet]
2022-11-21 07:18:17,113 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 07:18:17,114 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03001627, Name=chair]
2022-11-21 07:18:17,114 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03636649, Name=lamp]
2022-11-21 07:18:17,114 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04256520, Name=sofa]
2022-11-21 07:18:17,116 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04379243, Name=table]
2022-11-21 07:18:17,116 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04530566, Name=watercraft]
2022-11-21 07:18:17,116 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 1200
2022-11-21 07:18:17,117 - MODEL - INFO - Transformer with knn_layer 1
2022-11-21 07:18:19,483 - PoinTr - INFO - Using Data parallel ...
**************************************************
coor_k shape: torch.Size([48, 3, 2048])
coor_q shape: torch.Size([48, 3, 2048])
idx = knn_point(k, coor_k.transpose(-1, -2).contiguous(), coor_q.transpose(-1, -2).contiguous()) # B G M
tensor([[[-5584463534953070592, -5620492331972034560, -5584463534944681984,
..., -5584463534936293376, -5584463534986625024,
-5620492331955257344],
[ 3003121664, -5584463534969847808, -5620492331955257344,
..., -5584463534944681984, -5584463534953070592,
-5620492331955257344],
[-5764607520039501824, -5584463534936293376, -5584463534944681984,
..., -5584463534944681984, -5584463534944681984,
0],
...,
[ 1183, 1184, 1185,
..., 1196, 1197,
1198],
[ 1183, 1184, 1185,
..., 1196, 1197,
1198],
[ 1183, 1184, 1185,
..., 1196, 1197,
1198]],
[[ 0, 0, 0,
..., 0, 0,
0],
[ 1, 0, 0,
..., 0, 0,
0],
[ 2, 0, 0,
..., 0, 0,
0],
...,
[ 528, 529, 530,
..., 541, 542,
543],
[ 528, 529, 530,
..., 541, 542,
543],
[ 528, 529, 530,
..., 541, 542,
543]],
[[ 0, 0, 0,
..., 0, 0,
0],
[ 1, 0, 0,
..., 0, 0,
0],
[ 2, 0, 0,
..., 0, 0,
0],
...,
[ 713, 714, 715,
..., 726, 727,
728],
[ 713, 714, 715,
..., 726, 727,
728],
[ 713, 714, 715,
..., 726, 727,
728]],
...,
[[ 0, 413, 429,
..., 1358, 1446,
807],
[ 1, 5, 80,
..., 1398, 1445,
918],
[ 2, 201, 404,
..., 1346, 1396,
515],
...,
[ 1461, 1462, 1463,
..., 1474, 1475,
1476],
[ 1461, 1462, 1463,
..., 1474, 1475,
1476],
[ 1461, 1462, 1463,
..., 1474, 1475,
1476]],
[[ 0, 30, 37,
..., 738, 743,
534],
[ 1, 15, 129,
..., 504, 532,
660],
[ 2, 76, 83,
..., 724, 761,
135],
...,
[ 798, 799, 800,
..., 811, 812,
813],
[ 798, 799, 800,
..., 811, 812,
813],
[ 798, 799, 800,
..., 811, 812,
813]],
[[ 0, 26, 198,
..., 491, 494,
286],
[ 1, 115, 186,
..., 541, 555,
452],
[ 2, 24, 124,
..., 555, 569,
409],
...,
[ 581, 582, 583,
..., 594, 595,
596],
[ 581, 582, 583,
..., 594, 595,
596],
[ 581, 582, 583,
..., 594, 595,
596]]], device='cuda:0')
idx: tensor([[[-5584463534953070592, -5620492331972034560, -5584463534944681984,
..., -5584463534936293376, -5584463534986625024,
-5620492331955257344],
[ 3003121664, -5584463534969847808, -5620492331955257344,
..., -5584463534944681984, -5584463534953070592,
-5620492331955257344],
[-5764607520039501824, -5584463534936293376, -5584463534944681984,
..., -5584463534944681984, -5584463534944681984,
0],
...,
[ 1183, 1184, 1185,
..., 1196, 1197,
1198],
[ 1183, 1184, 1185,
..., 1196, 1197,
1198],
[ 1183, 1184, 1185,
..., 1196, 1197,
1198]],
[[ 0, 0, 0,
..., 0, 0,
0],
[ 1, 0, 0,
..., 0, 0,
0],
[ 2, 0, 0,
..., 0, 0,
0],
...,
[ 528, 529, 530,
..., 541, 542,
543],
[ 528, 529, 530,
..., 541, 542,
543],
[ 528, 529, 530,
..., 541, 542,
543]],
[[ 0, 0, 0,
..., 0, 0,
0],
[ 1, 0, 0,
..., 0, 0,
0],
[ 2, 0, 0,
..., 0, 0,
0],
...,
[ 713, 714, 715,
..., 726, 727,
728],
[ 713, 714, 715,
..., 726, 727,
728],
[ 713, 714, 715,
..., 726, 727,
728]],
...,
[[ 0, 413, 429,
..., 1358, 1446,
807],
[ 1, 5, 80,
..., 1398, 1445,
918],
[ 2, 201, 404,
..., 1346, 1396,
515],
...,
[ 1461, 1462, 1463,
..., 1474, 1475,
1476],
[ 1461, 1462, 1463,
..., 1474, 1475,
1476],
[ 1461, 1462, 1463,
..., 1474, 1475,
1476]],
[[ 0, 30, 37,
..., 738, 743,
534],
[ 1, 15, 129,
..., 504, 532,
660],
[ 2, 76, 83,
..., 724, 761,
135],
...,
[ 798, 799, 800,
..., 811, 812,
813],
[ 798, 799, 800,
..., 811, 812,
813],
[ 798, 799, 800,
..., 811, 812,
813]],
[[ 0, 26, 198,
..., 491, 494,
286],
[ 1, 115, 186,
..., 541, 555,
452],
[ 2, 24, 124,
..., 555, 569,
409],
...,
[ 581, 582, 583,
..., 594, 595,
596],
[ 581, 582, 583,
..., 594, 595,
596],
[ 581, 582, 583,
..., 594, 595,
596]]], device='cuda:0')
idx = idx.transpose(-1, -2).contiguous()
idx: tensor([[[-5584463534953070592, 3003121664, -5764607520039501824,
..., 1183, 1183,
1183],
[-5620492331972034560, -5584463534969847808, -5584463534936293376,
..., 1184, 1184,
1184],
[-5584463534944681984, -5620492331955257344, -5584463534944681984,
..., 1185, 1185,
1185],
...,
[-5584463534936293376, -5584463534944681984, -5584463534944681984,
..., 1196, 1196,
1196],
[-5584463534986625024, -5584463534953070592, -5584463534944681984,
..., 1197, 1197,
1197],
[-5620492331955257344, -5620492331955257344, 0,
..., 1198, 1198,
1198]],
[[ 0, 1, 2,
..., 528, 528,
528],
[ 0, 0, 0,
..., 529, 529,
529],
[ 0, 0, 0,
..., 530, 530,
530],
...,
[ 0, 0, 0,
..., 541, 541,
541],
[ 0, 0, 0,
..., 542, 542,
542],
[ 0, 0, 0,
..., 543, 543,
543]],
[[ 0, 1, 2,
..., 713, 713,
713],
[ 0, 0, 0,
..., 714, 714,
714],
[ 0, 0, 0,
..., 715, 715,
715],
...,
[ 0, 0, 0,
..., 726, 726,
726],
[ 0, 0, 0,
..., 727, 727,
727],
[ 0, 0, 0,
..., 728, 728,
728]],
...,
[[ 0, 1, 2,
..., 1461, 1461,
1461],
[ 413, 5, 201,
..., 1462, 1462,
1462],
[ 429, 80, 404,
..., 1463, 1463,
1463],
...,
[ 1358, 1398, 1346,
..., 1474, 1474,
1474],
[ 1446, 1445, 1396,
..., 1475, 1475,
1475],
[ 807, 918, 515,
..., 1476, 1476,
1476]],
[[ 0, 1, 2,
..., 798, 798,
798],
[ 30, 15, 76,
..., 799, 799,
799],
[ 37, 129, 83,
..., 800, 800,
800],
...,
[ 738, 504, 724,
..., 811, 811,
811],
[ 743, 532, 761,
..., 812, 812,
812],
[ 534, 660, 135,
..., 813, 813,
813]],
[[ 0, 1, 2,
..., 581, 581,
581],
[ 26, 115, 24,
..., 582, 582,
582],
[ 198, 186, 124,
..., 583, 583,
583],
...,
[ 491, 541, 555,
..., 594, 594,
594],
[ 494, 555, 569,
..., 595, 595,
595],
[ 286, 452, 409,
..., 596, 596,
596]]], device='cuda:0')
idx_base: tensor([[[ 0]],
[[ 2048]],
[[ 4096]],
[[ 6144]],
[[ 8192]],
[[10240]],
[[12288]],
[[14336]],
[[16384]],
[[18432]],
[[20480]],
[[22528]],
[[24576]],
[[26624]],
[[28672]],
[[30720]],
[[32768]],
[[34816]],
[[36864]],
[[38912]],
[[40960]],
[[43008]],
[[45056]],
[[47104]],
[[49152]],
[[51200]],
[[53248]],
[[55296]],
[[57344]],
[[59392]],
[[61440]],
[[63488]],
[[65536]],
[[67584]],
[[69632]],
[[71680]],
[[73728]],
[[75776]],
[[77824]],
[[79872]],
[[81920]],
[[83968]],
[[86016]],
[[88064]],
[[90112]],
[[92160]],
[[94208]],
[[96256]]], device='cuda:0')
idx = idx + idx_base
idx: tensor([[[-5584463534953070592, 3003121664, -5764607520039501824,
..., 1183, 1183,
1183],
[-5620492331972034560, -5584463534969847808, -5584463534936293376,
..., 1184, 1184,
1184],
[-5584463534944681984, -5620492331955257344, -5584463534944681984,
..., 1185, 1185,
1185],
...,
[-5584463534936293376, -5584463534944681984, -5584463534944681984,
..., 1196, 1196,
1196],
[-5584463534986625024, -5584463534953070592, -5584463534944681984,
..., 1197, 1197,
1197],
[-5620492331955257344, -5620492331955257344, 0,
..., 1198, 1198,
1198]],
[[ 2048, 2049, 2050,
..., 2576, 2576,
2576],
[ 2048, 2048, 2048,
..., 2577, 2577,
2577],
[ 2048, 2048, 2048,
..., 2578, 2578,
2578],
...,
[ 2048, 2048, 2048,
..., 2589, 2589,
2589],
[ 2048, 2048, 2048,
..., 2590, 2590,
2590],
[ 2048, 2048, 2048,
..., 2591, 2591,
2591]],
[[ 4096, 4097, 4098,
..., 4809, 4809,
4809],
[ 4096, 4096, 4096,
..., 4810, 4810,
4810],
[ 4096, 4096, 4096,
..., 4811, 4811,
4811],
...,
[ 4096, 4096, 4096,
..., 4822, 4822,
4822],
[ 4096, 4096, 4096,
..., 4823, 4823,
4823],
[ 4096, 4096, 4096,
..., 4824, 4824,
4824]],
...,
[[ 92160, 92161, 92162,
..., 93621, 93621,
93621],
[ 92573, 92165, 92361,
..., 93622, 93622,
93622],
[ 92589, 92240, 92564,
..., 93623, 93623,
93623],
...,
[ 93518, 93558, 93506,
..., 93634, 93634,
93634],
[ 93606, 93605, 93556,
..., 93635, 93635,
93635],
[ 92967, 93078, 92675,
..., 93636, 93636,
93636]],
[[ 94208, 94209, 94210,
..., 95006, 95006,
95006],
[ 94238, 94223, 94284,
..., 95007, 95007,
95007],
[ 94245, 94337, 94291,
..., 95008, 95008,
95008],
...,
[ 94946, 94712, 94932,
..., 95019, 95019,
95019],
[ 94951, 94740, 94969,
..., 95020, 95020,
95020],
[ 94742, 94868, 94343,
..., 95021, 95021,
95021]],
[[ 96256, 96257, 96258,
..., 96837, 96837,
96837],
[ 96282, 96371, 96280,
..., 96838, 96838,
96838],
[ 96454, 96442, 96380,
..., 96839, 96839,
96839],
...,
[ 96747, 96797, 96811,
..., 96850, 96850,
96850],
[ 96750, 96811, 96825,
..., 96851, 96851,
96851],
[ 96542, 96708, 96665,
..., 96852, 96852,
96852]]], device='cuda:0')
idx = idx.view(-1)
idx: tensor([-5584463534953070592, 3003121664, -5764607520039501824,
..., 96852, 96852,
96852], device='cuda:0')
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [3,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [3,0,0], thread: [97,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed
***** some repeated lines *****
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [97,0,0], thread: [111,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
File "main.py", line 68, in <module>
main()
File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/content/pointr/tools/runner.py", line 98, in run_net
ret = base_model(partial)
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/content/pointr/models/PoinTr.py", line 92, in forward
q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/content/pointr/models/Transformer.py", line 353, in forward
coor, f = self.grouper(inpc.transpose(1,2).contiguous())
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/content/pointr/models/dgcnn_group.py", line 137, in forward
f = self.layer1(f)
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 460, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([48, 16, 2048, 16], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(16, 32, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
memory_format = Contiguous
data_type = CUDNN_DATA_FLOAT
padding = [0, 0, 0]
stride = [1, 1, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0x5593baac81c0
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 48, 16, 2048, 16,
strideA = 524288, 32768, 16, 1,
output: TensorDescriptor 0x559435ebfd40
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 48, 32, 2048, 16,
strideA = 1048576, 32768, 16, 1,
weight: FilterDescriptor 0x559433965ce0
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 4
dimA = 32, 16, 1, 1,
Pointer addresses:
input: 0x7f1e38c00000
output: 0x7f1e3ec00000
weight: 0x7f1ecf000600
Thank you so much!
Hi, it seems the previous issue (unexpected shape of idx) is gone.
The new error comes from the negative values in idx
(idx: tensor([[[-5584463534953070592, 3003121664, -5764607520039501824] ...).
However, I am not sure why this happens. idx is produced by the torch.topk function (https://github.com/yuxumin/PoinTr/blob/master/models/dgcnn_group.py#L17).
Could you provide more information?
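One way to catch this kind of problem earlier is to validate idx before the batch offset is added and before it is flattened. The helper below is only a sketch: `check_knn_idx` is a hypothetical name that is not part of PoinTr, it assumes idx has the (bs, k, np) layout from the code comment, and the values k = 16 and 2048 points are assumptions read off the logs above (the idx_base step and the convolution input shape).

```python
import torch


def check_knn_idx(idx: torch.Tensor, k: int, num_points_k: int) -> torch.Tensor:
    """Hypothetical helper (not part of PoinTr): fail fast on bad kNN indices.

    Assumes idx has shape (bs, k, np) and holds per-sample point indices,
    i.e. values in [0, num_points_k) before idx_base is added.
    """
    if idx.dim() != 3 or idx.shape[1] != k:
        raise ValueError(f"expected shape (bs, {k}, np), got {tuple(idx.shape)}")
    lo, hi = int(idx.min()), int(idx.max())
    if lo < 0 or hi >= num_points_k:
        raise ValueError(
            f"index range [{lo}, {hi}] is outside [0, {num_points_k - 1}]"
        )
    return idx


# A well-formed index tensor passes through unchanged.
good = torch.randint(0, 2048, (2, 16, 128))
check_knn_idx(good, k=16, num_points_k=2048)

# One corrupted entry (value copied from the log above) is rejected.
bad = good.clone()
bad[0, 0, 0] = -5584463534953070592
try:
    check_knn_idx(bad, k=16, num_points_k=2048)
except ValueError as err:
    print("rejected:", err)
```

A check like this raises an ordinary Python exception at the line that produced the bad indices, instead of the device-side "index out of bounds" assertion that only surfaces later, here inside an unrelated convolution.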
Sure, this is what I have in Colab: https://colab.research.google.com/drive/1Utvtn0euJwu350eOctS8IQt1yTegU5PF#scrollTo=gTQpfhF7Lm43
Hi, permission is required to access this Colab.
Hi, I tried running the code on a local laptop. It looks like the assertion error is gone, but there is a new problem!
bash ./scripts/train.sh 0 \
--config ./cfgs/PCN_models/PoinTr.yaml \
--exp_name example
s.deterministic : False
2022-11-21 03:37:27,785 - PoinTr - INFO - args.sync_bn : False
2022-11-21 03:37:27,786 - PoinTr - INFO - args.exp_name : example
2022-11-21 03:37:27,787 - PoinTr - INFO - args.start_ckpts : None
2022-11-21 03:37:27,788 - PoinTr - INFO - args.ckpts : None
2022-11-21 03:37:27,789 - PoinTr - INFO - args.val_freq : 1
2022-11-21 03:37:27,790 - PoinTr - INFO - args.resume : False
2022-11-21 03:37:27,790 - PoinTr - INFO - args.test : False
2022-11-21 03:37:27,792 - PoinTr - INFO - args.mode : None
2022-11-21 03:37:27,794 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/KITTI_models/example
2022-11-21 03:37:27,795 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/KITTI_models/TFBoard/example
2022-11-21 03:37:27,796 - PoinTr - INFO - args.log_name : PoinTr
2022-11-21 03:37:27,796 - PoinTr - INFO - args.use_gpu : True
2022-11-21 03:37:27,797 - PoinTr - INFO - args.distributed : False
2022-11-21 03:37:27,798 - PoinTr - INFO - config.optimizer = edict()
2022-11-21 03:37:27,800 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-21 03:37:27,801 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-21 03:37:27,802 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0001
2022-11-21 03:37:27,803 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-21 03:37:27,804 - PoinTr - INFO - config.scheduler = edict()
2022-11-21 03:37:27,805 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-21 03:37:27,806 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-21 03:37:27,810 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-21 03:37:27,812 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-21 03:37:27,814 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-21 03:37:27,817 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-21 03:37:27,818 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-21 03:37:27,819 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-21 03:37:27,819 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-21 03:37:27,820 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-21 03:37:27,820 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-21 03:37:27,821 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-21 03:37:27,821 - PoinTr - INFO - config.dataset = edict()
2022-11-21 03:37:27,822 - PoinTr - INFO - config.dataset.train = edict()
2022-11-21 03:37:27,822 - PoinTr - INFO - config.dataset.train.base = edict()
2022-11-21 03:37:27,822 - PoinTr - INFO - config.dataset.train.base.NAME : PCN
2022-11-21 03:37:27,823 - PoinTr - INFO - config.dataset.train.base.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 03:37:27,823 - PoinTr - INFO - config.dataset.train.base.N_POINTS : 16384
2022-11-21 03:37:27,824 - PoinTr - INFO - config.dataset.train.base.N_RENDERINGS : 8
2022-11-21 03:37:27,824 - PoinTr - INFO - config.dataset.train.base.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 03:37:27,825 - PoinTr - INFO - config.dataset.train.base.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 03:37:27,827 - PoinTr - INFO - config.dataset.train.base.CARS : True
2022-11-21 03:37:27,827 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-21 03:37:27,828 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-21 03:37:27,829 - PoinTr - INFO - config.dataset.train.others.bs : 64
2022-11-21 03:37:27,830 - PoinTr - INFO - config.dataset.val = edict()
2022-11-21 03:37:27,831 - PoinTr - INFO - config.dataset.val.base = edict()
2022-11-21 03:37:27,832 - PoinTr - INFO - config.dataset.val.base.NAME : PCN
2022-11-21 03:37:27,833 - PoinTr - INFO - config.dataset.val.base.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 03:37:27,834 - PoinTr - INFO - config.dataset.val.base.N_POINTS : 16384
2022-11-21 03:37:27,836 - PoinTr - INFO - config.dataset.val.base.N_RENDERINGS : 8
2022-11-21 03:37:27,836 - PoinTr - INFO - config.dataset.val.base.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 03:37:27,837 - PoinTr - INFO - config.dataset.val.base.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 03:37:27,838 - PoinTr - INFO - config.dataset.val.base.CARS : True
2022-11-21 03:37:27,839 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-21 03:37:27,839 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-21 03:37:27,840 - PoinTr - INFO - config.dataset.test = edict()
2022-11-21 03:37:27,841 - PoinTr - INFO - config.dataset.test.base = edict()
2022-11-21 03:37:27,842 - PoinTr - INFO - config.dataset.test.base.NAME : KITTI
2022-11-21 03:37:27,842 - PoinTr - INFO - config.dataset.test.base.CATEGORY_FILE_PATH : data/KITTI/KITTI.json
2022-11-21 03:37:27,844 - PoinTr - INFO - config.dataset.test.base.N_POINTS : 16384
2022-11-21 03:37:27,845 - PoinTr - INFO - config.dataset.test.base.CLOUD_PATH : data/KITTI/cars/%s.pcd
2022-11-21 03:37:27,848 - PoinTr - INFO - config.dataset.test.base.BBOX_PATH : data/KITTI/bboxes/%s.txt
2022-11-21 03:37:27,854 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-21 03:37:27,855 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-21 03:37:27,858 - PoinTr - INFO - config.model = edict()
2022-11-21 03:37:27,863 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-21 03:37:27,865 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-21 03:37:27,866 - PoinTr - INFO - config.model.num_query : 224
2022-11-21 03:37:27,867 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-21 03:37:27,868 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-21 03:37:27,869 - PoinTr - INFO - config.total_bs : 64
2022-11-21 03:37:27,870 - PoinTr - INFO - config.step_per_update : 1
2022-11-21 03:37:27,870 - PoinTr - INFO - config.max_epoch : 600
2022-11-21 03:37:27,871 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-21 03:37:27,872 - PoinTr - INFO - Distributed training: False
2022-11-21 03:37:27,872 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-21 03:37:27,958 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 03:37:28,078 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 5677
2022-11-21 03:37:28,176 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 03:37:28,177 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 150
2022-11-21 03:37:28,178 - MODEL - INFO - Transformer with knn_layer 1
2022-11-21 03:38:30,817 - PoinTr - INFO - Using Data parallel ...
Format = auto
Extension = pcd
Format = auto
Extension = pcd
Format = auto
***** many repeated lines *****
Format = auto
Extension = pcd
Format = auto
Extension = pcd
Format = auto
Traceback (most recent call last):
File "main.py", line 68, in <module>
File "/mnt/f/PoinTr/models/Transformer.py", line 19, in get_knn_index
_, idx = knn(coor_k, coor_q) # bs k np
_, idx = knn(coor_k.contiguous(), coor_q.contiguous()) # bs k np
Sorry to bother you again!
Could you tell me which environment you use (CUDA, tensor, TensorFlow, GCC, Python, ...)?
I suspect the problem is mainly caused by the environment.
After I modified it, I got the following:
main()
File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
ret = base_model(partial)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/models/PoinTr.py", line 92, in forward
q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/models/Transformer.py", line 366, in forward
x = blk(x + pos, knn_index) # B N C
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/models/Transformer.py", line 217, in forward
knn_f = get_graph_feature(norm_x, knn_index)
File "/mnt/f/PoinTr/models/Transformer.py", line 33, in get_graph_feature
feature = feature.view(batch_size, k, num_query, num_dims)
RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368
So, what is the shape of 'knn_index' at 'https://github.com/yuxumin/PoinTr/blob/master/models/Transformer.py#L32' in your code? Can you make sure you are running inference the right way (the right model on the corresponding dataset)?
This is what I use in the code: https://github.com/Cmput-414/PoinTr/tree/change
My environment:
Cuda 10.1,
Torch 1.9.0+cu102,
torchaudio 0.9.0,
torchvision 0.10.0+cu102,
GCC 9.4
python 3.8.10
bash ./scripts/train.sh 0 --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example
knn_index_shape: torch.Size([1152]) knn_index: tensor([ 0, 1, 2, ..., 6018, 6018, 6017], device='cuda:0')
RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368
bash ./scripts/train.sh 0 --config ./cfgs/PCN_models/GRNet.yaml --exp_name example
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 3.47 GiB already allocated; 1020.84 MiB free; 3.48 GiB reserved in total by PyTorch)
bash ./scripts/train.sh 0 --config ./cfgs/PCN_models/PCN.yaml --exp_name example
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 2.74 GiB already allocated; 1.55 GiB free; 2.93 GiB reserved in total by PyTorch)
bash ./scripts/train.sh 0 --config ./cfgs/KITTI_models/PoinTr.yaml --exp_name example
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 1.18 GiB already allocated; 3.28 GiB free; 1.20 GiB reserved in total by PyTorch)
bash ./scripts/train.sh 0 --config ./cfgs/KITTI_models/PCN.yaml --exp_name example
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 6.00 GiB total capacity; 481.80 MiB already allocated; 3.82 GiB free; 682.00 MiB reserved in total by PyTorch)
bash ./scripts/train.sh 0 --config ./cfgs/ShapeNet55_models/PCN.yaml --exp_name example
RuntimeError: CUDA out of memory. Tried to allocate 1.01 GiB (GPU 0; 6.00 GiB total capacity; 2.22 GiB already allocated; 1.95 GiB free; 2.53 GiB reserved in total by PyTorch)
bash ./scripts/train.sh 0 --config ./cfgs/ShapeNet55_models/PoinTr.yaml --exp_name example
RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 6.00 GiB total capacity; 2.38 GiB already allocated; 1.29 GiB free; 3.19 GiB reserved in total by PyTorch)
1. main()
File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
ret = base_model(partial)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/models/PoinTr.py", line 92, in forward
q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/models/Transformer.py", line 367, in forward
x = blk(x + pos, knn_index) # B N C
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/models/Transformer.py", line 218, in forward
knn_f = get_graph_feature(norm_x, knn_index)
File "/mnt/f/PoinTr/models/Transformer.py", line 34, in get_graph_feature
feature = feature.view(batch_size, k, num_query, num_dims)
RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368
2. Traceback (most recent call last):
File "main.py", line 68, in <module>
main()
File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
ret = base_model(partial)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/models/GRNet.py", line 141, in forward
pt_features_32_l = self.conv1(pt_features_64_l)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 240, in forward
return F.max_pool3d(input, self.kernel_size, self.stride,
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/_jit_internal.py", line 405, in fn
return if_false(*args, **kwargs)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/functional.py", line 784, in _max_pool3d
return torch.max_pool3d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 3.47 GiB already allocated; 1020.84 MiB free; 3.48 GiB reserved in total by PyTorch)
3. File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
ret = base_model(partial)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/models/PCN.py", line 76, in forward
fine = self.final_conv(feat) + point_feat # B 3 N
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 298, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 294, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 2.74 GiB already allocated; 1.55 GiB free; 2.93 GiB reserved in total by PyTorch)
Sorry, I am not familiar with Google Colab and cannot run the code in your Colab.
knn_index_shape: torch.Size([1152]) knn_index: tensor([ 0, 1, 2, ..., 6018, 6018, 6017], device='cuda:0') RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368
knn_index should have shape (bs k np); in the original setting for the PCN dataset, k = 8 and np = 224.
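The numbers in the quoted messages can be checked by hand. The sketch below redoes that arithmetic with the values from the error message and the printed knn_index shape; the variable names are illustrative. The reading that the index was computed over the 3 coordinate axes instead of over the points is an assumption that happens to fit the printed values, not something confirmed in this thread.

```python
# Values taken from the error message and the printed knn_index shape.
batch_size, k, num_query, num_dims = 48, 8, 128, 384
flat_index_len = 1152      # knn_index_shape: torch.Size([1152])
gathered_elems = 442368    # "input of size 442368"
largest_index = 6018       # largest value in the printed knn_index

# Gathering with the flat index yields one num_dims-sized row per entry,
# which accounts for the reported input size.
assert flat_index_len * num_dims == gathered_elems

# What feature.view(batch_size, k, num_query, num_dims) needs instead.
expected_index_len = batch_size * k * num_query
expected_elems = expected_index_len * num_dims

# Assumption: the index has 3 entries per (sample, neighbour) pair, as if
# neighbours were searched along the xyz axis rather than along the points.
index_len_over_xyz = batch_size * k * 3
assert index_len_over_xyz == flat_index_len

# Also consistent with that assumption: the last sample's offset plus 2.
assert largest_index == (batch_size - 1) * num_query + 2

print(expected_index_len, expected_elems, index_len_over_xyz)
```

In other words, the flattened index is 128 / 3 times shorter than the view expects, which points at the layout returned by the knn call rather than at the model or dataset configuration.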
I think the error may be due to the knn_cuda in your environment.
I have pushed a PyTorch-based knn implementation; could you try the new code?
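For readers who want to see what a kNN without the knn_cuda extension can look like, here is a minimal sketch built on torch.cdist and torch.topk. It is not necessarily the code that was pushed to the repository: `knn_point_indices` is a hypothetical name, and the (B, 3, N) input layout and (B, k, Nq) output layout are assumptions based on the `# bs k np` comment.

```python
import torch


def knn_point_indices(coor_q: torch.Tensor, coor_k: torch.Tensor, k: int) -> torch.Tensor:
    """Hypothetical pure-PyTorch kNN (a sketch, not the repository's code).

    coor_q: (B, 3, Nq) query points, coor_k: (B, 3, Nk) reference points.
    Returns int64 indices into coor_k with shape (B, k, Nq).
    """
    q = coor_q.transpose(1, 2).contiguous()   # (B, Nq, 3)
    r = coor_k.transpose(1, 2).contiguous()   # (B, Nk, 3)
    dist = torch.cdist(q, r)                  # (B, Nq, Nk) Euclidean distances
    # The k smallest distances per query point give its nearest neighbours.
    _, idx = torch.topk(dist, k=k, dim=-1, largest=False)  # (B, Nq, k)
    return idx.transpose(1, 2).contiguous()   # (B, k, Nq)


torch.manual_seed(0)
points = torch.randn(2, 3, 16)
idx = knn_point_indices(points, points, k=4)
print(idx.shape)  # torch.Size([2, 4, 16])
```

The trade-off of this approach is memory: it materialises the full (B, Nq, Nk) distance matrix, so on a small GPU it may need a reduced batch size.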
RuntimeError: CUDA out of memory.
For the OOM problem, I think you can reduce the batch size (just modify the yaml file).
Thank you so much! When I changed the batch size to 2, it ran! For now: I followed this. Then I also applied the code that you just modified. Yes, the main thing is setting up the environment, which confused me a lot. But it is solved, and I can start learning the code. Thanks again! You are really nice!
@jackie174, Congrats!
Hello Xumin, I got this problem, any suggestions? bash ./scripts/train.sh 0 \ --config ./cfgs/KITTI_models/PoinTr.yaml \ --exp_name example /content/pointr /content/pointr