mikacuy / pointnetvlad

PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition, CVPR 2018
MIT License
351 stars 72 forks

What size of RAM did you use? #10

Closed roycyh closed 4 years ago

roycyh commented 4 years ago

Hi, I was running the code on a computer with 16GB of RAM and an NVIDIA GeForce RTX 2070. However, the memory fills up when I run generate_training_tuples_baseline.py and evaluate.py, and the process is killed by the system. It seems the amount of memory is not enough, so I would like to know how much RAM you used.

mikacuy commented 4 years ago

Hi,

I was using a 1080Ti and had 64GB of RAM if I remember correctly.

roycyh commented 4 years ago

> Hi,
>
> I was using a 1080Ti and had 64GB of RAM if I remember correctly.

Hi, thank you for your reply. Following your suggestion, I upgraded my computer to 70GB of RAM (keeping the RTX 2070). However, the memory still fills up when I run evaluate.py, and the process is killed with a message like "[1] 3552 killed python evaluate.py". I would like to know whether you have faced a similar situation before. Thanks.

roycyh commented 4 years ago

For your reference, this is the output showing where the process was interrupted:

```
(pnv) ➜ pointnetvlad git:(master) python evaluate.py
/home/cyh/anaconda3/envs/pnv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:469: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/cyh/anaconda3/envs/pnv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:470: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/cyh/anaconda3/envs/pnv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:471: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/cyh/anaconda3/envs/pnv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:472: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/cyh/anaconda3/envs/pnv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:473: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/cyh/anaconda3/envs/pnv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:476: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/cyh/Documents/PNV/pointnetvlad/../benchmark_datasets/
Trajectories Loaded.
Trajectories Loaded.
In Graph
Tensor("Placeholder_4:0", shape=(), dtype=bool, device=/device:GPU:0)
Tensor("query_triplets/concat:0", shape=(3, 17, 4096, 3), dtype=float32, device=/device:GPU:0)
Tensor("query_triplets/Mul_1:0", shape=(51, 256), dtype=float32, device=/device:GPU:0)
Tensor("query_triplets/Reshape_5:0", shape=(3, 17, 256), dtype=float32, device=/device:GPU:0)
Tensor("query_triplets/split:0", shape=(3, 1, 256), dtype=float32, device=/device:GPU:0)
Tensor("query_triplets/split:1", shape=(3, 4, 256), dtype=float32, device=/device:GPU:0)
Tensor("query_triplets/split:2", shape=(3, 12, 256), dtype=float32, device=/device:GPU:0)
2020-01-09 19:44:52.781646: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-01-09 19:44:52.934509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-09 19:44:52.934922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
totalMemory: 7.79GiB freeMemory: 7.41GiB
2020-01-09 19:44:52.934945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
Model restored.
[1] 3552 killed python evaluate.py
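
One way to tell whether the kill came from the kernel's OOM killer (rather than from some other failure) is to check the kernel log right after the process dies. Below is a minimal sketch, assuming a Linux host where `dmesg` is readable without root; the exact kernel message wording varies between kernel versions:

```python
# Sketch: scan the kernel log for OOM-killer entries after `evaluate.py` dies.
# Standard library only; written for Python 3.6, so no capture_output= here.
import subprocess

result = subprocess.run(["dmesg"], stdout=subprocess.PIPE, universal_newlines=True)
for line in result.stdout.splitlines():
    # Typical OOM-killer messages mention "Out of memory" or "oom".
    if "out of memory" in line.lower() or "oom" in line.lower():
        print(line)
```

If no OOM entry shows up, the kill likely came from something other than running out of RAM.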

Following is the environment I am using:

```
(pnv) ➜ pointnetvlad git:(master) conda list
# packages in environment at /home/cyh/anaconda3/envs/pnv:
#
# Name                    Version                    Build    Channel
_libgcc_mutex             0.1                         main
blas                      1.0                          mkl    anaconda
bleach                    1.5.0                     py36_0    anaconda
ca-certificates           2019.11.27                     0
certifi                   2019.11.28                py36_0
cffi                      1.13.2            py36h2e261b9_0
cudatoolkit               8.0                            3    anaconda
cudnn                     7.0.5                  cuda8.0_0
freetype                  2.9.1                 h8a8886c_1
html5lib                  0.9999999                 py36_0    anaconda
intel-openmp              2019.5                       281    anaconda
joblib                    0.14.0                    pypi_0    pypi
jpeg                      9b                    h024ee3a_2
libedit                   3.1.20181209          hc058e9b_0
libffi                    3.2.1                 hd88cf55_4
libgcc-ng                 9.1.0                 hdf63c60_0
libgfortran-ng            7.3.0                 hdf63c60_0    anaconda
libpng                    1.6.37                hbc83047_0
libprotobuf               3.9.2                 hd408876_0    anaconda
libstdcxx-ng              9.1.0                 hdf63c60_0
libtiff                   4.1.0                 h2733197_0
markdown                  3.1.1                     py36_0    anaconda
mkl                       2019.5                       281    anaconda
mkl-service               2.3.0             py36he904b0f_0    anaconda
mkl_fft                   1.0.14            py36ha843d7b_0    anaconda
mkl_random                1.1.0             py36hd6b4f25_0    anaconda
ncurses                   6.1                   he6710b0_1
ninja                     1.9.0             py36hfd86e86_0
numpy                     1.17.2            py36haad9e8e_0    anaconda
numpy-base                1.17.2            py36hde5b4d6_0    anaconda
olefile                   0.46                      py36_0
openssl                   1.1.1d                h7b6447c_3
pandas                    0.25.1                    pypi_0    pypi
pillow                    6.2.1             py36h34e0f95_0
pip                       19.2.3                    py36_0    anaconda
protobuf                  3.9.2             py36he6710b0_0    anaconda
pycparser                 2.19                      py36_0
python                    3.6.9                 h265db76_0    anaconda
python-dateutil           2.8.0                     pypi_0    pypi
pytorch                   1.0.1    py3.6_cuda8.0.61_cudnn7.1.2_2    pytorch
pytz                      2019.3                    pypi_0    pypi
readline                  7.0                   h7b6447c_5
scikit-learn              0.21.3                    pypi_0    pypi
scipy                     1.3.1                     pypi_0    pypi
setuptools                41.4.0                    py36_0    anaconda
six                       1.12.0                    py36_0    anaconda
sklearn                   0.0                       pypi_0    pypi
sqlite                    3.30.0                h7b6447c_0
tensorflow-gpu            1.4.1                          0    anaconda
tensorflow-gpu-base       1.4.1             py36h01caf0a_0    anaconda
tensorflow-tensorboard    1.5.1             py36hf484d3e_1    anaconda
tk                        8.6.8                 hbc83047_0
torchvision               0.2.2                       py_3    pytorch
werkzeug                  0.16.0                      py_0    anaconda
wheel                     0.33.6                    py36_0    anaconda
xz                        5.2.4                 h14c3975_4
zlib                      1.2.11                h7b6447c_3
zstd                      1.3.7                 h0b5b093_0
```

mikacuy commented 4 years ago

Hi,

Sorry but I have not encountered this problem before. Can you confirm that it is a memory issue?

Mika
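
One way to check this is to watch the resident memory of evaluate.py from a separate terminal while it runs. Below is a minimal sketch, assuming the `psutil` package is installed (it is not in the conda list above) and that the PID of the running evaluate.py process is passed on the command line:

```python
# Sketch: poll the resident set size (RSS) of a running process by PID and
# report the peak, to check whether it approaches the machine's total RAM.
import sys
import time

import psutil  # assumed extra dependency, not in the environment listed above

pid = int(sys.argv[1])          # PID of the running `python evaluate.py`
proc = psutil.Process(pid)
peak = 0

while True:
    try:
        rss = proc.memory_info().rss        # bytes currently resident in RAM
    except psutil.NoSuchProcess:
        break                               # evaluate.py has exited
    peak = max(peak, rss)
    print("RSS: {:.2f} GiB (peak {:.2f} GiB)".format(rss / 2**30, peak / 2**30))
    time.sleep(5)

print("peak RSS was {:.2f} GiB".format(peak / 2**30))
```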

roycyh commented 4 years ago

> Hi,
>
> Sorry but I have not encountered this problem before. Can you confirm that it is a memory issue?
>
> Mika

Thanks. I have solved the problem by recreating the environment, although I am not sure which module was at fault. For reference, for anyone who runs into a similar situation in the future: it may not be a memory problem at all. Peak memory usage on my machine was only around 5 to 6 GB. So before you decide to upgrade your hardware, please try recreating the environment first.
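
Since it is unclear which package was at fault, one low-effort check before and after recreating the environment is to print the versions of the packages evaluate.py actually touches and diff them between the two environments. A minimal sketch; the package list below is only a guess at the relevant ones:

```python
# Sketch: dump the versions of a few likely-relevant packages so the old and
# the recreated conda environments can be compared side by side.
import importlib

for name in ("tensorflow", "numpy", "sklearn", "scipy", "pandas"):
    try:
        module = importlib.import_module(name)
        print("{}: {}".format(name, getattr(module, "__version__", "unknown")))
    except ImportError:
        print("{}: not installed".format(name))
```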

Roy