open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Waymo dataset conversion #2858

Open eezhang123 opened 9 months ago

eezhang123 commented 9 months ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

1.x branch https://github.com/open-mmlab/mmdetection3d/tree/dev-1.x

Environment

sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.3, V11.3.109
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.13.0
OpenCV: 4.9.0
MMEngine: 0.10.2
MMDetection: 3.2.0
MMDetection3D: 1.3.0+8deeb6e
spconv2.0: False

Reproduces the problem - code sample

pip install -r requirements/optional.txt

Reproduces the problem - command or script

pip install -r requirements/optional.txt

Reproduces the problem - error message

Installing collected packages: wrapt, typing-extensions, termcolor, flatbuffers, clang, appdirs, typed-ast, toml, tensorflow-estimator, six, regex, numpy, keras, gast, opt-einsum, keras-preprocessing, h5py, google-pasta, black, astunparse, absl-py, scikit-image, tensorflow, waymo-open-dataset-tf-2-6-0
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.9.0
    Uninstalling typing_extensions-4.9.0:
      Successfully uninstalled typing_extensions-4.9.0
  Attempting uninstall: termcolor
    Found existing installation: termcolor 2.4.0
    Uninstalling termcolor-2.4.0:
      Successfully uninstalled termcolor-2.4.0
  Attempting uninstall: six
    Found existing installation: six 1.16.0
    Uninstalling six-1.16.0:
      Successfully uninstalled six-1.16.0
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.3
    Uninstalling numpy-1.24.3:
      Successfully uninstalled numpy-1.24.3
  Attempting uninstall: black
    Found existing installation: black 23.12.1
    Uninstalling black-23.12.1:
      Successfully uninstalled black-23.12.1
  Attempting uninstall: absl-py
    Found existing installation: absl-py 2.0.0
    Uninstalling absl-py-2.0.0:
      Successfully uninstalled absl-py-2.0.0
  Attempting uninstall: scikit-image
    Found existing installation: scikit-image 0.21.0
    Uninstalling scikit-image-0.21.0:
      Successfully uninstalled scikit-image-0.21.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dash 2.14.2 requires typing-extensions>=4.1.1, but you have typing-extensions 3.7.4.3 which is incompatible.
numba 0.58.1 requires numpy<1.27,>=1.22, but you have numpy 1.19.5 which is incompatible.
nuscenes-devkit 1.1.11 requires numpy>=1.22.0, but you have numpy 1.19.5 which is incompatible.
pandas 2.0.3 requires numpy>=1.20.3; python_version < "3.10", but you have numpy 1.19.5 which is incompatible.
rich 13.4.2 requires typing-extensions<5.0,>=4.0.0; python_version < "3.9", but you have typing-extensions 3.7.4.3 which is incompatible.
Successfully installed absl-py-0.15.0 appdirs-1.4.4 astunparse-1.6.3 black-20.8b1 clang-5.0 flatbuffers-1.12 gast-0.4.0 google-pasta-0.2.0 h5py-3.1.0 keras-2.15.0 keras-preprocessing-1.1.2 numpy-1.19.5 opt-einsum-3.3.0 regex-2023.12.25 scikit-image-0.19.3 six-1.15.0 tensorflow-2.6.0 tensorflow-estimator-2.15.0 termcolor-1.1.0 toml-0.10.2 typed-ast-1.5.5 typing-extensions-3.7.4.3 waymo-open-dataset-tf-2-6-0-1.4.9 wrapt-1.12.1

When I run "python mmdet3d/utils/collect_env.py", I get:

ImportError: Numba needs NumPy 1.22 or greater. Got NumPy 1.19.

When I then install numpy>=1.22, pip reports the opposite conflict:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.24.4 which is incompatible.
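The two error messages above describe numpy ranges that cannot both hold. The following is a minimal sketch (pure Python, no third-party packages) that encodes the two ranges quoted in this thread, numba 0.58.1's numpy<1.27,>=1.22 and tensorflow 2.6.0's numpy~=1.19.2, and checks a few candidate versions against both. The simplified version parser and the helper names are assumptions for illustration; this is not how pip's resolver works internally.

```python
def parse(version: str) -> tuple:
    """Turn '1.19.5' into (1, 19, 5); assumes plain numeric release strings."""
    return tuple(int(part) for part in version.split("."))


def numba_ok(version: str) -> bool:
    # numba 0.58.1 requires numpy<1.27,>=1.22 (from the pip error above)
    v = parse(version)
    return (1, 22) <= v < (1, 27)


def tensorflow_ok(version: str) -> bool:
    # tensorflow 2.6.0 requires numpy~=1.19.2, i.e. >=1.19.2 and ==1.19.*
    v = parse(version)
    return v >= (1, 19, 2) and v[:2] == (1, 19)


if __name__ == "__main__":
    for candidate in ["1.19.5", "1.22.0", "1.23.0", "1.24.4"]:
        print(candidate, "numba:", numba_ok(candidate),
              "tensorflow:", tensorflow_ok(candidate))
```

No candidate passes both checks, which is why one of the two packages always complains no matter which numpy is installed; the workarounds later in this thread pick numba's range and accept the tensorflow warning.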

Additional information

I followed https://mmdetection3d.readthedocs.io/en/latest/get_started.html step by step, but when I ran "pip install -r requirements/optional.txt", waymo-open-dataset-tf-2-6-0-1.4.9 uninstalled numpy 1.24 and installed numpy 1.19, and other packages were downgraded in the same way. This breaks the environment.

I want to convert the Waymo dataset to the KITTI format. Thanks for your reply.

bigsheep2018 commented 9 months ago

@eezhang123 I manually upgraded numpy to 1.22 and ran into issue #1233. MMLab team, please update the Waymo dataset preparation docs/Docker setup for the mmdetection3d development environment. @Xiangxu-0103

eezhang123 commented 9 months ago

@bigsheep2018 It seems waymo-open-dataset-tf-2-6-0 doesn't match the mmdet3d environment. I agree with you: the MMLab team needs to update the docs or the Docker setup for the mmdet3d environment.

sunjiahao1999 commented 9 months ago

@eezhang123 waymo-open-dataset-tf-2-6-0 is less compatible, which is why we put it in requirements/optional.txt. You can reinstall numpy==1.23.0 after installing waymo-open-dataset-tf-2-6-0.

sunjiahao1999 commented 9 months ago

@bigsheep2018 Are the training and validation infos generated properly? Also make sure you have the test dataset under your data/waymo/waymo_format.

eezhang123 commented 9 months ago

@sunjiahao1999 I tried your method again and updated numpy to 1.23.0, but tensorflow 2.6.0 (installed by waymo-open-dataset-tf-2-6-0) only supports numpy~=1.19.2:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.23.0 which is incompatible.

sunjiahao1999 commented 9 months ago

Yes, requirements/optional.txt. In my case, I reinstalled numpy==1.23.0 and mmdet3d works properly, but that doesn't necessarily apply to everyone. You can ignore the warning "tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.23.0 which is incompatible."; it will not prevent you from generating the info files and the ground-truth database.
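After reinstalling numpy==1.23.0 as suggested above, a quick way to confirm which versions actually ended up in the environment is a sketch like the following. It only reads package metadata through importlib.metadata (Python 3.8+) and never imports numpy, numba or tensorflow, so it still works when one of them fails to import. The helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names taken from the pip output quoted in this thread.
    for name, ver in installed_versions(
        ["numpy", "numba", "tensorflow", "waymo-open-dataset-tf-2-6-0"]
    ).items():
        print(f"{name:<32} {ver or 'not installed'}")
```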

bigsheep2018 commented 9 months ago

Hello @eezhang123 @sunjiahao1999, no info files (*.pkl) are generated: [screenshot]

The program gets stuck after 2 to 3 hours: [screenshot]

At the beginning, the following error is displayed repeatedly: "failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error". I am not sure whether it matters, since the program keeps running:

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-04 03:45:09.784422: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-01-04 03:45:09.784602: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 6a95689e6a5f
2024-01-04 03:45:09.784622: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 6a95689e6a5f
2024-01-04 03:45:09.784961: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 470.57.2
2024-01-04 03:45:09.785034: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.57.2
2024-01-04 03:45:09.785048: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 470.57.2
2024-01-04 03:45:09.786640: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-04 03:45:09.793101: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-01-04 03:45:09.793208: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 6a95689e6a5f
2024-01-04 03:45:09.793288: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 6a95689e6a5f
2024-01-04 03:45:09.793562: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 470.57.2
2024-01-04 03:45:09.793618: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.57.2
2024-01-04 03:45:09.793631: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 470.57.2
2024-01-04 03:45:09.795724: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA

The Waymo raw data is organized as follows (each folder contains several extracted .tfrecord files and the .rar raw files downloaded from the official Waymo site): [screenshot]

eezhang123 commented 9 months ago

@sunjiahao1999 Yes, your suggestion is right. But I ran into a new error when running:

python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 8 --extra-tag waymo

With --workers 8, the error message is:

2024-01-05 03:09:16.194942: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-01-05 03:09:16.243395: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2024-01-05 03:09:16.243476: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at matmul_op_impl.h:451 : Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support
2024-01-05 03:09:16.245937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6 MB memory: -> device: 0 , name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:0e:00.0, compute capability: 8.0
2024-01-05 03:09:16.249905: E tensorflow/stream_executor/cuda/cuda_driver.cc:692] could not allocate CUDA stream for context 0x98adee0: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-01-05 03:09:16.249931: I tensorflow/stream_executor/stream_executor_pimpl.cc:780] failed to allocate stream; live stream count: 3
2024-01-05 03:09:16.249941: E tensorflow/stream_executor/stream.cc:310] failed to allocate stream during initialization

With --workers 2, the error message is:

2024-01-05 03:16:12.980419: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-01-05 03:16:13.203962: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-01-05 03:16:13.679389: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2024-01-05 03:16:13.679493: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at matmul_op_impl.h:451 : Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support
2024-01-05 03:16:13.685689: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2024-01-05 03:16:13.685784: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at matmul_op_impl.h:451 : Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support

  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/waymo_open_dataset/utils/transform_utils.py", line 126, in get_rotation_matrix
    return tf.matmul(r_yaw, tf.matmul(r_pitch, r_roll))
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 3607, in matmul
    return gen_math_ops.batch_mat_mul_v2(
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1509, in batch_mat_mul_v2
    _ops.raise_from_not_ok_status(e, name)
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6941, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Attempting to perform BLAS operation using StreamExecutor without BLAS support [Op:BatchMatMulV2]
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tools/create_data.py", line 376, in <module>
    waymo_data_prep(
  File "tools/create_data.py", line 238, in waymo_data_prep
    converter.convert()
  File "/workspace/pdd-b2-ai/zane.zhang2/quant_work/quantization1/test/mmdetection3d/tools/dataset_converters/waymo_converter.py", line 143, in convert
    data_infos = mmengine.track_parallel_progress(
  File "/root/mambaforge/envs/waymo/lib/python3.8/site-packages/mmengine/utils/progressbar.py", line 200, in track_parallel_progress
    for result in gen:
  File "/root/mambaforge/envs/waymo/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
tensorflow.python.framework.errors_impl.InternalError: Attempting to perform BLAS operation using StreamExecutor without BLAS support [Op:BatchMatMulV2]

Only --workers 1 works.
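The cuInit/CUBLAS errors, and the "6 MB memory" GPU device in the --workers 8 log, suggest that every worker process creates its own TensorFlow GPU context on GPU 0. One commonly used workaround, which nobody in this thread has confirmed, is to hide the GPUs from TensorFlow during conversion so its ops run on the CPU; the failing op in the traceback is a small tf.matmul inside get_rotation_matrix. The sketch below assumes it is run from the mmdetection3d root and reuses the command from the comment above; the helper name cpu_only_env is hypothetical. If some later step of create_data.py does need a GPU, this would have to be adjusted.

```python
import os
import subprocess
import sys


def cpu_only_env(base=None):
    """Return a copy of the environment in which CUDA reports no devices.

    Setting CUDA_VISIBLE_DEVICES to "-1" is a widely used way to hide all
    GPUs from CUDA-based frameworks such as TensorFlow.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


# Same arguments as the create_data.py command quoted above.
CMD = [sys.executable, "tools/create_data.py", "waymo",
       "--root-path", "./data/waymo/", "--out-dir", "./data/waymo/",
       "--workers", "8", "--extra-tag", "waymo"]

if __name__ == "__main__":
    subprocess.run(CMD, env=cpu_only_env(), check=True)
```

The environment is copied rather than modified in place, so the calling shell keeps its GPU visibility.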

eezhang123 commented 9 months ago

@bigsheep2018 You can download the pkl files. Could you share your pip list? I'll use your environment as a reference. Thanks.

bigsheep2018 commented 9 months ago

@eezhang123 I didn't install the packages one by one in a Python environment. I just followed the suggestion here and built a Docker image:

# build an image with PyTorch 1.9, CUDA 11.1
# If you prefer other versions, just modify the Dockerfile
docker build -t mmdetection3d docker/

Run it with: docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection3d/data mmdetection3d

Then, inside the container, manually uninstall numpy 1.19 and install numpy 1.22:

pip uninstall numpy
pip install numpy==1.22

eezhang123 commented 9 months ago

@bigsheep2018 Are you working with dataset version 1.4? From its README, it seems the folders should only be these:

│   │   ├── kitti_format
│   │   │   ├── ImageSets
│   │   │   ├── training
│   │   │   │   ├── image_0
│   │   │   │   ├── image_1
│   │   │   │   ├── image_2
│   │   │   │   ├── image_3
│   │   │   │   ├── image_4
│   │   │   │   ├── velodyne
│   │   │   ├── testing
│   │   │   │   ├── (the same as training)
│   │   │   ├── waymo_gt_database
│   │   │   ├── waymo_infos_trainval.pkl
│   │   │   ├── waymo_infos_train.pkl
│   │   │   ├── waymo_infos_val.pkl
│   │   │   ├── waymo_infos_test.pkl
│   │   │   ├── waymo_dbinfos_train.pkl

bigsheep2018 commented 9 months ago

Hello @sunjiahao1999 @eezhang123, I built a Docker image:

docker pull bigsheep2012/torch2.1-cuda11.8-cudnn8-devel:mmdet3d_v1.4.0

Environment: Ubuntu 20.04 + PyTorch 2.1 + CUDA 11.8 + cuDNN 8 + numpy 1.26.3 + waymo-open-dataset-tf-2-11-0==1.6.1

Tested:

  1. Able to build nuScenes infos and train CenterPoint.
  2. Able to build Waymo (v1.4.2) infos and train PointPillars.

Notice:

  1. Because np.long was deprecated in NumPy 1.20 and removed in NumPy 1.24, "mmdet3d/datasets/transforms/dbsampler.py" has been modified accordingly: all instances of "np.long" need to be rewritten as "np.longlong".
  2. mmdet3d 1.4 may need to be reinstalled with:
    $ pip uninstall mmdet3d
    $ cd [your_mmdetection3d_root]
    $ pip install -v -e .

    Hope this helps.
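Notice 1 above can be applied mechanically. The helper below is hypothetical (it is not part of mmdet3d): it rewrites whole-word np.long references as np.longlong in one file, defaulting to the dbsampler.py path named in the comment. It is a plain text substitution, so review the resulting diff before committing.

```python
import re
import sys
from pathlib import Path

# "np.long" as a whole name only: the trailing \b stops the pattern from
# matching the first half of np.longlong or np.longdouble.
NP_LONG = re.compile(r"\bnp\.long\b")


def replace_np_long(source: str) -> str:
    """Return source with every standalone np.long rewritten as np.longlong."""
    return NP_LONG.sub("np.longlong", source)


if __name__ == "__main__":
    # Default path taken from the comment above; pass another file as argv[1].
    path = Path(sys.argv[1] if len(sys.argv) > 1
                else "mmdet3d/datasets/transforms/dbsampler.py")
    original = path.read_text()
    path.write_text(replace_np_long(original))
    print(f"{len(NP_LONG.findall(original))} occurrence(s) rewritten in {path}")
```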

eezhang123 commented 9 months ago

@sunjiahao1999 @bigsheep2018 I have another question. Does the Waymo conversion need to process the training and validation sets together, or can I convert only the validation set?

jacoblambert commented 8 months ago

I have also found mmdetection3d 1.4.0 to be completely incompatible with waymo-open-dataset-tf-2-6-0. Even after re-upgrading packages, torch fails to import:

Traceback (most recent call last):
  File "tools/create_data.py", line 7, in <module>
    from tools.dataset_converters import indoor_converter as indoor
  File "/home/pe/mmdetection3d/mmdetection3d/tools/dataset_converters/indoor_converter.py", line 10, in <module>
    from tools.dataset_converters.sunrgbd_data_utils import SUNRGBDData
  File "/home/pe/mmdetection3d/mmdetection3d/tools/dataset_converters/sunrgbd_data_utils.py", line 5, in <module>
    import mmcv
  File "/home/pe/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/__init__.py", line 5, in <module>
    from .transforms import *
  File "/home/pe/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/transforms/__init__.py", line 12, in <module>
    import torch  # noqa: F401
  File "/home/pe/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/__init__.py", line 533, in <module>
    for name in dir(_C):
NameError: name '_C' is not defined

I'm simply following the conda build instructions and installing from the requirements.txt file, so there shouldn't be any major issues... Please test and update the install instructions.

I had success installing the latest waymo-tf package, https://pypi.org/project/waymo-open-dataset-tf-2-11-0/, instead of the 2.6.0 one (2.6.0 pretty much bricks your environment), and upgrading to numpy==1.22.0.

MarvinKlemp commented 8 months ago

I had big problems with the official Docker image, but this container worked:

FROM nvcr.io/nvidia/pytorch:23.01-py3

# opencv 4.8 is heavily bugged for many applications
RUN pip install opencv-python==4.7.0.72

# MMDETECTION
RUN mkdir /workspace/mmcv
WORKDIR /workspace/mmcv
RUN git clone https://github.com/open-mmlab/mmcv.git .
RUN pip install -r requirements/optional.txt
# Add your GPU's compute capability here (Dockerfile comments must start a line)
ARG TORCH_CUDA_ARCH_LIST="8.6 8.9 9.0+PTX"
ENV FORCE_CUDA="1"
RUN MMCV_WITH_OPS=1 pip install -e . -v
RUN pip install openmim
RUN mim install "mmengine>=0.7.1" "mmdet>=3.0.0"
RUN pip install cumm-cu120 && pip install spconv-cu120 pypcd_imp
RUN apt-get update && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata
RUN apt-get update && apt-get install ffmpeg libx11-6 libsm6 libxext6 -y #TZdata

RUN git clone https://github.com/open-mmlab/mmdetection3d.git /workspace/mmdetection3d \
    && cd /workspace/mmdetection3d \
    && pip install --no-cache-dir -e .

WORKDIR /workspace/mmdetection3d

RUN pip install waymo-open-dataset-tf-2-6-0
RUN pip install opencv-python==3.4.8.29
RUN pip install numpy==1.23.4
RUN pip install numba --upgrade

Btw: this config works for conversion, but NOT when evaluating a model on the validation set.
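The ARG line in the Dockerfile above asks you to add your own GPU. For example, the A100s in the original report appear in the --workers 8 log as compute capability 8.0, which is not in the example list "8.6 8.9 9.0+PTX"; PTX built for 9.0 also cannot be JIT-compiled for an older 8.0 GPU. A small hypothetical helper that builds the value from (major, minor) pairs might look like this; torch.cuda.get_device_capability and torch.cuda.device_count are real PyTorch calls, and torch is only used when it is installed.

```python
def arch_list(capabilities):
    """Build a TORCH_CUDA_ARCH_LIST value from (major, minor) capabilities.

    Duplicates are dropped, entries are sorted, and "+PTX" is appended to the
    newest one so the build also carries PTX for newer GPUs.
    """
    unique = sorted(set(capabilities))
    parts = [f"{major}.{minor}" for major, minor in unique]
    if parts:
        parts[-1] += "+PTX"
    return " ".join(parts)


if __name__ == "__main__":
    try:
        import torch  # only used to query the local GPUs
        caps = [torch.cuda.get_device_capability(i)
                for i in range(torch.cuda.device_count())]
    except ImportError:
        caps = []
    print(arch_list(caps) or "no CUDA devices found")
```

Running it on the build machine and pasting the printed value into the ARG line keeps the compiled mmcv ops matched to the GPUs actually present.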