[Bug] Waymo dataset conversion

eezhang123 commented 9 months ago

Prerequisite

[X] I have searched Issues and Discussions but cannot get the expected help.
[X] I have read the FAQ documentation but cannot get the expected help.
[X] The bug has not been fixed in the latest version (dev-1.x) or latest version (dev-1.0).

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

1.x branch https://github.com/open-mmlab/mmdetection3d/tree/dev-1.x

Environment

sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.3, V11.3.109
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.0
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.6
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.3.2 (built against CUDA 11.5)
Magma 2.6.1
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.13.0
OpenCV: 4.9.0
MMEngine: 0.10.2
MMDetection: 3.2.0
MMDetection3D: 1.3.0+8deeb6e
spconv2.0: False

Reproduces the problem - code sample

pip install -r requirements/optional.txt

Reproduces the problem - command or script

pip install -r requirements/optional.txt

Reproduces the problem - error message

Installing collected packages: wrapt, typing-extensions, termcolor, flatbuffers, clang, appdirs, typed-ast, toml, tensorflow-estimator, six, regex, numpy, keras, gast, opt-einsum, keras-preprocessing, h5py, google-pasta, black, astunparse, absl-py, scikit-image, tensorflow, waymo-open-dataset-tf-2-6-0
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.9.0
    Uninstalling typing_extensions-4.9.0:
      Successfully uninstalled typing_extensions-4.9.0
  Attempting uninstall: termcolor
    Found existing installation: termcolor 2.4.0
    Uninstalling termcolor-2.4.0:
      Successfully uninstalled termcolor-2.4.0
  Attempting uninstall: six
    Found existing installation: six 1.16.0
    Uninstalling six-1.16.0:
      Successfully uninstalled six-1.16.0
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.3
    Uninstalling numpy-1.24.3:
      Successfully uninstalled numpy-1.24.3
  Attempting uninstall: black
    Found existing installation: black 23.12.1
    Uninstalling black-23.12.1:
      Successfully uninstalled black-23.12.1
  Attempting uninstall: absl-py
    Found existing installation: absl-py 2.0.0
    Uninstalling absl-py-2.0.0:
      Successfully uninstalled absl-py-2.0.0
  Attempting uninstall: scikit-image
    Found existing installation: scikit-image 0.21.0
    Uninstalling scikit-image-0.21.0:
      Successfully uninstalled scikit-image-0.21.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dash 2.14.2 requires typing-extensions>=4.1.1, but you have typing-extensions 3.7.4.3 which is incompatible.
numba 0.58.1 requires numpy<1.27,>=1.22, but you have numpy 1.19.5 which is incompatible.
nuscenes-devkit 1.1.11 requires numpy>=1.22.0, but you have numpy 1.19.5 which is incompatible.
pandas 2.0.3 requires numpy>=1.20.3; python_version < "3.10", but you have numpy 1.19.5 which is incompatible.
rich 13.4.2 requires typing-extensions<5.0,>=4.0.0; python_version < "3.9", but you have typing-extensions 3.7.4.3 which is incompatible.
Successfully installed absl-py-0.15.0 appdirs-1.4.4 astunparse-1.6.3 black-20.8b1 clang-5.0 flatbuffers-1.12 gast-0.4.0 google-pasta-0.2.0 h5py-3.1.0 keras-2.15.0 keras-preprocessing-1.1.2 numpy-1.19.5 opt-einsum-3.3.0 regex-2023.12.25 scikit-image-0.19.3 six-1.15.0 tensorflow-2.6.0 tensorflow-estimator-2.15.0 termcolor-1.1.0 toml-0.10.2 typed-ast-1.5.5 typing-extensions-3.7.4.3 waymo-open-dataset-tf-2-6-0-1.4.9 wrapt-1.12.1

When I run "python mmdet3d/utils/collect_env.py", I get:

ImportError: Numba needs NumPy 1.22 or greater. Got NumPy 1.19.

When I then install numpy>=1.22, pip reports:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.24.4 which is incompatible.
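The two pip messages above pull NumPy in opposite directions. A minimal sketch of why no single NumPy version can satisfy both, using only the two constraints quoted in the logs (the helper names are hypothetical, and the tuple comparison is a simplification of pip's real version handling):

```python
def parse(version):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def tensorflow_ok(numpy_version):
    # tensorflow 2.6.0 requires numpy~=1.19.2, i.e. >=1.19.2 within the 1.19 series
    v = parse(numpy_version)
    return v >= (1, 19, 2) and v[:2] == (1, 19)


def numba_ok(numpy_version):
    # numba 0.58.1 requires numpy>=1.22,<1.27
    v = parse(numpy_version)
    return (1, 22) <= v < (1, 27)


# Versions that appear in this thread
for candidate in ["1.19.5", "1.22.0", "1.23.0", "1.24.4"]:
    print(candidate, "tensorflow:", tensorflow_ok(candidate), "numba:", numba_ok(candidate))
```

Every candidate satisfies at most one of the two checks, so pip has to break one package's declared requirement whichever NumPy version ends up installed.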

Additional information

I followed https://mmdetection3d.readthedocs.io/en/latest/get_started.html step by step, but when I ran "pip install -r requirements/optional.txt", waymo-open-dataset-tf-2-6-0 (1.4.9) uninstalled numpy 1.24 and installed numpy 1.19. Other packages were downgraded in the same way, which broke the environment.

I want to convert the Waymo dataset to the KITTI format. Thanks for your reply.

bigsheep2018 commented 9 months ago

@eezhang123 I manually upgraded numpy to 1.22 and then ran into issue #1233. MMLab team, please update the Waymo dataset preparation docs or the Docker image for setting up the mmdetection3d development environment. @Xiangxu-0103

eezhang123 commented 9 months ago

> @eezhang123 I manually upgraded numpy to 1.22 and then ran into issue #1233. MMLab team, please update the Waymo dataset preparation docs or the Docker image for setting up the mmdetection3d development environment. @Xiangxu-0103

It seems waymo-open-dataset-tf-2-6-0 doesn't match the mmdet3d environment. I agree with you: the MMLab team needs to update the docs or the Docker image for setting up the mmdet3d environment.

sunjiahao1999 commented 9 months ago

@eezhang123 waymo-open-dataset-tf-2-6-0 has poor compatibility, so we put it in optional.txt. You can reinstall numpy==1.23.0 after installing waymo-open-dataset-tf-2-6-0.

sunjiahao1999 commented 9 months ago

@bigsheep2018 Are the training and validation infos generated properly? Also make sure you have the test dataset under your data/waymo/waymo_format.

eezhang123 commented 9 months ago

> @eezhang123 waymo-open-dataset-tf-2-6-0 has poor compatibility, so we put it in optional.txt. You can reinstall numpy==1.23.0 after installing waymo-open-dataset-tf-2-6-0.

I tried your method again and updated numpy to 1.23.0, but tensorflow 2.6.0, which waymo-open-dataset-tf-2-6-0 depends on, only accepts numpy~=1.19.2:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.23.0 which is incompatible.

sunjiahao1999 commented 9 months ago

Yes, requirements/optional.txt. In my case, I reinstalled numpy==1.23.0 and mmdet3d works properly, but this doesn't necessarily apply to everyone. You can ignore the warning "tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.23.0 which is incompatible." It will not block you from generating the info files and the ground truth database.
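Since the workaround depends on which versions actually end up installed, it can help to list them after reinstalling numpy. A minimal sketch using only the standard library; the package list is an assumption taken from the names mentioned in this thread, and installed_versions is a hypothetical helper, not part of mmdet3d:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if absent}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as they appear in the pip output above (assumed list)
    names = ["numpy", "numba", "tensorflow", "waymo-open-dataset-tf-2-6-0", "mmdet3d"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Posting this output alongside an error report makes it easier to compare environments than a full pip list.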

bigsheep2018 commented 9 months ago

Hello @eezhang123 @sunjiahao1999, no info files (*.pkl) are generated, as follows:

The program gets stuck after 2-3 hours, as follows:

An error is displayed repeatedly at the beginning: failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error. I am not sure whether it matters, as the program keeps running, as follows:

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-04 03:45:09.784422: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-01-04 03:45:09.784602: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 6a95689e6a5f
2024-01-04 03:45:09.784622: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 6a95689e6a5f
2024-01-04 03:45:09.784961: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 470.57.2
2024-01-04 03:45:09.785034: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.57.2
2024-01-04 03:45:09.785048: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 470.57.2
2024-01-04 03:45:09.786640: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-04 03:45:09.793101: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-01-04 03:45:09.793208: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 6a95689e6a5f
2024-01-04 03:45:09.793288: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 6a95689e6a5f
2024-01-04 03:45:09.793562: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 470.57.2
2024-01-04 03:45:09.793618: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.57.2
2024-01-04 03:45:09.793631: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 470.57.2
2024-01-04 03:45:09.795724: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA

The waymo raw data is organized as follows: (each folder contains several extracted .tfrecords and .rar raw files downloaded from waymo official site)

eezhang123 commented 9 months ago

> Yes, requirements/optional.txt. In my case, I reinstalled numpy==1.23.0 and mmdet3d works properly, but this doesn't necessarily apply to everyone. You can ignore the warning "tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.23.0 which is incompatible." It will not block you from generating the info files and the ground truth database.

Yes, your suggestion is right. But I hit a new error when running:

python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 8 --extra-tag waymo

With --workers 8, the error message is as follows:

2024-01-05 03:09:16.194942: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-01-05 03:09:16.243395: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2024-01-05 03:09:16.243476: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at matmul_op_impl.h:451 : Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support
2024-01-05 03:09:16.245937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6 MB memory: -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:0e:00.0, compute capability: 8.0
2024-01-05 03:09:16.249905: E tensorflow/stream_executor/cuda/cuda_driver.cc:692] could not allocate CUDA stream for context 0x98adee0: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-01-05 03:09:16.249931: I tensorflow/stream_executor/stream_executor_pimpl.cc:780] failed to allocate stream; live stream count: 3
2024-01-05 03:09:16.249941: E tensorflow/stream_executor/stream.cc:310] failed to allocate stream during initialization

With --workers 2, the error message is as follows:

2024-01-05 03:16:12.980419: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-01-05 03:16:13.203962: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-01-05 03:16:13.679389: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2024-01-05 03:16:13.679493: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at matmul_op_impl.h:451 : Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support
2024-01-05 03:16:13.685689: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2024-01-05 03:16:13.685784: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at matmul_op_impl.h:451 : Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support

  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/waymo_open_dataset/utils/transform_utils.py", line 126, in get_rotation_matrix
    return tf.matmul(r_yaw, tf.matmul(r_pitch, r_roll))
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 3607, in matmul
    return gen_math_ops.batch_mat_mul_v2(
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1509, in batch_mat_mul_v2
    _ops.raise_from_not_ok_status(e, name)
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6941, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Attempting to perform BLAS operation using StreamExecutor without BLAS support [Op:BatchMatMulV2]
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tools/create_data.py", line 376, in <module>
    waymo_data_prep(
  File "tools/create_data.py", line 238, in waymo_data_prep
    converter.convert()
  File "/workspace/pdd-b2-ai/zane.zhang2/quant_work/quantization1/test/mmdetection3d/tools/dataset_converters/waymo_converter.py", line 143, in convert
    data_infos = mmengine.track_parallel_progress(
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/mmengine/utils/progressbar.py", line 200, in track_parallel_progress
    for result in gen:
  File "/root/mambaforge/envs/waymo/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
tensorflow.python.framework.errors_impl.InternalError: Attempting to perform BLAS operation using StreamExecutor without BLAS support [Op:BatchMatMulV2]

Only --workers 1 works.
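The logs show each worker's TensorFlow runtime trying to claim GPU 0 and running out of memory. One possible workaround, which nobody in this thread has confirmed, is to hide the GPUs from the conversion process so that TensorFlow falls back to the CPU. A minimal sketch, assuming the conversion itself does not need a GPU; cpu_only_env and conversion_command are hypothetical helpers, and the command-line arguments are copied from the report above:

```python
import os
import sys


def cpu_only_env(base_env=None):
    """Copy an environment and hide all CUDA devices from child processes."""
    env = dict(os.environ if base_env is None else base_env)
    # With an empty CUDA_VISIBLE_DEVICES the CUDA runtime reports no devices,
    # so TensorFlow runs its ops on the CPU.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def conversion_command(workers):
    """Build the create_data.py call from the report with a chosen worker count."""
    return [
        sys.executable, "tools/create_data.py", "waymo",
        "--root-path", "./data/waymo/",
        "--out-dir", "./data/waymo/",
        "--workers", str(workers),
        "--extra-tag", "waymo",
    ]
```

From the repository root this could be launched with subprocess.run(conversion_command(8), env=cpu_only_env(), check=True). If a later step of the script does need the GPU, this approach would not apply as written.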

eezhang123 commented 9 months ago

@bigsheep2018 You can download the pkl file. Could you share your pip list? I'd like to compare it with my environment. Thanks.

bigsheep2018 commented 9 months ago

@eezhang123 I did not install the packages one by one in a Python env. I just followed the suggestion here to build a Docker image:

# build an image with PyTorch 1.9, CUDA 11.1
# If you prefer other versions, just modify the Dockerfile
docker build -t mmdetection3d docker/

Run it with: docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection3d/data mmdetection3d

Then, inside the container, manually uninstall numpy 1.19 and install numpy 1.22:

pip uninstall numpy
pip install numpy==1.22

eezhang123 commented 9 months ago

@bigsheep2018 Are you dealing with dataset version 1.4? From its readme, it seems the folders are only these:

├── kitti_format
│   │   │   ├── ImageSets
│   │   │   ├── training
│   │   │   │   ├── image_0
│   │   │   │   ├── image_1
│   │   │   │   ├── image_2
│   │   │   │   ├── image_3
│   │   │   │   ├── image_4
│   │   │   │   ├── velodyne
│   │   │   ├── testing
│   │   │   │   ├── (the same as training)
│   │   │   ├── waymo_gt_database
│   │   │   ├── waymo_infos_trainval.pkl
│   │   │   ├── waymo_infos_train.pkl
│   │   │   ├── waymo_infos_val.pkl
│   │   │   ├── waymo_infos_test.pkl
│   │   │   ├── waymo_dbinfos_train.pkl
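To check whether a conversion run actually produced this layout, the listed entries can be compared against the output directory. A minimal sketch; the directory and file names are taken from the tree above, missing_entries is a hypothetical helper, and the data/waymo/kitti_format location in the usage note is an assumption based on the --out-dir used earlier in this thread:

```python
from pathlib import Path

# Directories and files named in the kitti_format tree above
EXPECTED_DIRS = (
    ["ImageSets", "training/velodyne", "testing", "waymo_gt_database"]
    + [f"training/image_{i}" for i in range(5)]
)
EXPECTED_FILES = [
    "waymo_infos_trainval.pkl",
    "waymo_infos_train.pkl",
    "waymo_infos_val.pkl",
    "waymo_infos_test.pkl",
    "waymo_dbinfos_train.pkl",
]


def missing_entries(kitti_root):
    """Return the expected directories and files that are absent under kitti_root."""
    root = Path(kitti_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

For example, print(missing_entries("data/waymo/kitti_format")) lists whatever is still absent, which makes a missing *.pkl file easy to spot.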

bigsheep2018 commented 9 months ago

Hello @sunjiahao1999 @eezhang123, I built a Docker image:

docker pull bigsheep2012/torch2.1-cuda11.8-cudnn8-devel:mmdet3d_v1.4.0

Environment: Ubuntu 20.04 + Pytorch2.1 + Cuda11.8 + Cudnn8 + numpy 1.26.3 + waymo-open-dataset-tf-2-11-0==1.6.1

Tested:

able to build nus infos and train centerpoint.
able to build waymo(v1.4.2) infos and train pointpillars.

Notice:

As np.long is no longer available from NumPy 1.24 on, "mmdet3d/datasets/transforms/dbsampler.py" is modified accordingly. Specifically, every instance of "np.long" needs to be rewritten as "np.longlong".
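The rename described above can be done by hand or with a small text substitution. A minimal sketch of the latter using only the standard library; replace_np_long is a hypothetical helper, and the sample line is made up for illustration rather than copied from dbsampler.py:

```python
import re

# Match np.long only when no further identifier characters follow,
# so np.longlong and np.longdouble are left untouched.
NP_LONG = re.compile(r"\bnp\.long\b")


def replace_np_long(source):
    """Rewrite every standalone np.long in a source string to np.longlong."""
    return NP_LONG.sub("np.longlong", source)


sample = "indices = np.array(sampled, dtype=np.long)"  # illustrative line only
print(replace_np_long(sample))
```

Because of the trailing word boundary, running the substitution a second time leaves the already converted code unchanged.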

mmdet3d-1.4 may need to be reinstalled by

$ pip uninstall mmdet3d
$ cd [your_mmdetection3d_root]
$ pip install -v -e .

Hope this helps.

eezhang123 commented 9 months ago

@sunjiahao1999 @bigsheep2018 I have another question. Does Waymo need the training and validation splits to be processed together, or can I process only the validation split?

jacoblambert commented 8 months ago

I have also found mmdetection3d 1.4.0 to be completely incompatible with waymo 2.6.0. Even after re-upgrading packages, torch will not import without errors.

Traceback (most recent call last):
  File "tools/create_data.py", line 7, in <module>
    from tools.dataset_converters import indoor_converter as indoor
  File "/home/pe/mmdetection3d/mmdetection3d/tools/dataset_converters/indoor_converter.py", line 10, in <module>
    from tools.dataset_converters.sunrgbd_data_utils import SUNRGBDData
  File "/home/pe/mmdetection3d/mmdetection3d/tools/dataset_converters/sunrgbd_data_utils.py", line 5, in <module>
    import mmcv
  File "/home/pe/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/__init__.py", line 5, in <module>
    from .transforms import *
  File "/home/pe/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/transforms/__init__.py", line 12, in <module>
    import torch  # noqa: F401
  File "/home/pe/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/__init__.py", line 533, in <module>
    for name in dir(_C):
NameError: name '_C' is not defined

I'm simply following the conda build instructions and going from requirements.txt file, so there shouldn't be any major issues... please test and update install instructions.

I have found success installing the latest waymo-tf https://pypi.org/project/waymo-open-dataset-tf-2-11-0/ instead of 2.6.0 (2.6.0 pretty much bricks your environment) and upgrading to numpy==1.22.0.

MarvinKlemp commented 8 months ago

I had big problems using the official Docker image, but this container worked:

FROM nvcr.io/nvidia/pytorch:23.01-py3

# opencv 4.8 is heavily bugged for many applications
RUN pip install opencv-python==4.7.0.72

# MMDETECTION
RUN mkdir /workspace/mmcv
WORKDIR /workspace/mmcv
RUN git clone https://github.com/open-mmlab/mmcv.git .
RUN pip install -r requirements/optional.txt
# Add your GPU's compute capability to this list
ARG TORCH_CUDA_ARCH_LIST="8.6 8.9 9.0+PTX"
ENV FORCE_CUDA="1"
RUN MMCV_WITH_OPS=1 pip install -e . -v
RUN pip install openmim
RUN mim install "mmengine>=0.7.1" "mmdet>=3.0.0"
RUN pip install cumm-cu120 && pip install spconv-cu120 pypcd_imp
RUN apt-get update && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata
RUN apt-get update && apt-get install ffmpeg libx11-6 libsm6 libxext6 -y #TZdata

RUN git clone https://github.com/open-mmlab/mmdetection3d.git /workspace/mmdetection3d \
    && cd /workspace/mmdetection3d \
    && pip install --no-cache-dir -e .

WORKDIR /workspace/mmdetection3d

RUN pip install waymo-open-dataset-tf-2-6-0
RUN pip install opencv-python==3.4.8.29
RUN pip install numpy==1.23.4
RUN pip install numba --upgrade

Btw: this config works for conversion but NOT when evaluating a model on the validation set
