Open FabianSchuetze opened 2 years ago
What is the egl_renderer actually needed for? I think it's only imported in
pysixd/scripts/eval_calc_errors.py
meshrenderer/meshrenderer_texture_color_tensor.py
I guess I could calculate the errors somewhere else, and the meshrenderer calculates the NOCs? Do the NOCs need to be available on disk, as they were in GDR-Net, so that I could pre-calculate them on one machine and then use them on a different server? Could I also use meshrenderer_texture_color.py
for example?
Maybe your cuda driver is not the opengl cuda driver. Reinstalling the driver from https://developer.nvidia.com/opengl-driver might solve your problem.
The EGL renderer is needed for training with XYZ_ONLINE, where the object coordinates are rendered online.
Training with pre-generated coordinates is also possible by disabling the option, but it might not be well supported, since we stopped maintaining this feature a long time ago. Saving the coordinates on disk also needs a huge amount of storage space.
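For reference, a minimal sketch of what disabling the option could look like in a config file. Only the XYZ_ONLINE key under MODEL.POSE_NET is taken from the config posted in this thread; where the pre-generated coordinates are then read from is not shown here and depends on the dataset code.

```python
# Sketch of a config override: turn off online XYZ rendering.
# Only the XYZ_ONLINE key is taken from the config in this thread;
# all other keys of the real config are omitted here.
MODEL = dict(
    POSE_NET=dict(
        XYZ_ONLINE=False,  # use pre-generated coordinates instead of the EGL renderer
    ),
)
```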
Thanks for the recommendations so far.
I have had no luck with the egl renderer so far, though. I will try a dedicated NVIDIA OpenGL Docker environment soon.
However, I had some luck working with the vispy renderer. Would it be possible to use this to generate the NOCs for me? If so, what (roughly) would I have to change to use it?
Hello! I also encountered the runtime error "RuntimeError: Bindless Textures not supported" while running the gdrnpp model code.
raise RuntimeError("Bindless Textures not supported")
The following is the entire log of the program:
----------------------------------------------------------------------------------------------------
20230725_073142|core.utils.default_args_setup@123: Rank of current process: 0. World size: 2
20230725_073143|core.utils.default_args_setup@124: Environment info:
------------------------------- --------------------------------------------------------------------------
sys.platform linux
Python 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0]
numpy 1.24.4
detectron2 0.6 @/home/appuser/detectron2_repo/detectron2
Compiler GCC 9.4
CUDA compiler CUDA 11.7
detectron2 arch flags 3.5, 3.7, 5.0, 5.2, 5.3, 6.0, 6.1, 7.0, 7.5
DETECTRON2_ENV_MODULE <not set>
PyTorch 2.0.1+cu117 @/home/appuser/.local/lib/python3.8/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0,1 NVIDIA GeForce RTX 3090 (arch=8.6)
Driver version 535.86.05
CUDA_HOME /usr/local/cuda
TORCH_CUDA_ARCH_LIST Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing
Pillow 9.0.0.post1
torchvision 0.15.2+cu117 @/home/appuser/.local/lib/python3.8/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.8.0
------------------------------- --------------------------------------------------------------------------
PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: NO AVX
- CUDA Runtime 11.7
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.5
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
20230725_073143|core.utils.default_args_setup@126: Command line arguments: Namespace(config_file='./configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py', dist_url='tcp://127.0.0.1:50152', eval_only=False, fp16_allreduce=False, launcher='none', local_rank=0, machine_rank=0, num_gpus=2, num_machines=1, opts=None, resume=False, strategy='ddp', use_adasum=False)
20230725_073143|core.utils.default_args_setup@128: Contents of args.config_file=./configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py:
_base_ = ["../../_base_/gdrn_base.py"]
OUTPUT_DIR = "output/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo"
INPUT = dict(
DZI_PAD_SCALE=1.5,
TRUNCATE_FG=False,
CHANGE_BG_PROB=0.5,
COLOR_AUG_PROB=0.8,
COLOR_AUG_TYPE="code",
COLOR_AUG_CODE=(
"Sequential(["
# Sometimes(0.5, PerspectiveTransform(0.05)),
# Sometimes(0.5, CropAndPad(percent=(-0.05, 0.1))),
# Sometimes(0.5, Affine(scale=(1.0, 1.2))),
"Sometimes(0.5, CoarseDropout( p=0.2, size_percent=0.05) ),"
"Sometimes(0.4, GaussianBlur((0., 3.))),"
"Sometimes(0.3, pillike.EnhanceSharpness(factor=(0., 50.))),"
"Sometimes(0.3, pillike.EnhanceContrast(factor=(0.2, 50.))),"
"Sometimes(0.5, pillike.EnhanceBrightness(factor=(0.1, 6.))),"
"Sometimes(0.3, pillike.EnhanceColor(factor=(0., 20.))),"
"Sometimes(0.5, Add((-25, 25), per_channel=0.3)),"
"Sometimes(0.3, Invert(0.2, per_channel=True)),"
"Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),"
"Sometimes(0.5, Multiply((0.6, 1.4))),"
"Sometimes(0.1, AdditiveGaussianNoise(scale=10, per_channel=True)),"
"Sometimes(0.5, iaa.contrast.LinearContrast((0.5, 2.2), per_channel=0.3)),"
"Sometimes(0.5, Grayscale(alpha=(0.0, 1.0)))," # maybe remove for det
"], random_order=True)"
# cosy+aae
),
)
SOLVER = dict(
IMS_PER_BATCH=48,
TOTAL_EPOCHS=40, # 30
LR_SCHEDULER_NAME="flat_and_anneal",
ANNEAL_METHOD="cosine", # "cosine"
ANNEAL_POINT=0.72,
OPTIMIZER_CFG=dict(_delete_=True, type="Ranger", lr=8e-4, weight_decay=0.01),
WEIGHT_DECAY=0.0,
WARMUP_FACTOR=0.001,
WARMUP_ITERS=1000,
)
DATASETS = dict(
TRAIN=("lmo_pbr_train",),
TEST=("lmo_bop_test",),
DET_FILES_TEST=("datasets/BOP_DATASETS/lmo/test/test_bboxes/yolox_x_640_lmo_pbr_lmo_bop_test.json",),
)
DATALOADER = dict(
# Number of data loading threads
NUM_WORKERS=2,
FILTER_VISIB_THR=0.3,
)
MODEL = dict(
LOAD_DETS_TEST=True,
PIXEL_MEAN=[0.0, 0.0, 0.0],
PIXEL_STD=[255.0, 255.0, 255.0],
BBOX_TYPE="AMODAL_CLIP", # VISIB or AMODAL
POSE_NET=dict(
NAME="GDRN_double_mask",
XYZ_ONLINE=True,
NUM_CLASSES=8,
BACKBONE=dict(
FREEZE=False,
PRETRAINED="timm",
INIT_CFG=dict(
type="timm/convnext_base",
pretrained=True,
in_chans=3,
features_only=True,
out_indices=(3,),
),
),
## geo head: Mask, XYZ, Region
GEO_HEAD=dict(
FREEZE=False,
INIT_CFG=dict(
type="TopDownDoubleMaskXyzRegionHead",
in_dim=1024, # this is num out channels of backbone conv feature
),
NUM_REGIONS=64,
XYZ_CLASS_AWARE=True,
MASK_CLASS_AWARE=True,
REGION_CLASS_AWARE=True,
),
PNP_NET=dict(
INIT_CFG=dict(norm="GN", act="gelu"),
REGION_ATTENTION=True,
WITH_2D_COORD=True,
ROT_TYPE="allo_rot6d",
TRANS_TYPE="centroid_z",
),
LOSS_CFG=dict(
# xyz loss ----------------------------
XYZ_LOSS_TYPE="L1", # L1 | CE_coor
XYZ_LOSS_MASK_GT="visib", # trunc | visib | obj
XYZ_LW=1.0,
# mask loss ---------------------------
MASK_LOSS_TYPE="L1", # L1 | BCE | CE
MASK_LOSS_GT="trunc", # trunc | visib | gt
MASK_LW=1.0,
# full mask loss ---------------------------
FULL_MASK_LOSS_TYPE="L1", # L1 | BCE | CE
FULL_MASK_LW=1.0,
# region loss -------------------------
REGION_LOSS_TYPE="CE", # CE
REGION_LOSS_MASK_GT="visib", # trunc | visib | obj
REGION_LW=1.0,
# pm loss --------------
PM_LOSS_SYM=True, # NOTE: sym loss
PM_R_ONLY=True, # only do R loss in PM
PM_LW=1.0,
# centroid loss -------
CENTROID_LOSS_TYPE="L1",
CENTROID_LW=1.0,
# z loss -----------
Z_LOSS_TYPE="L1",
Z_LW=1.0,
),
),
)
VAL = dict(
DATASET_NAME="lmo",
SCRIPT_PATH="lib/pysixd/scripts/eval_pose_results_more.py",
TARGETS_FILENAME="test_targets_bop19.json",
ERROR_TYPES="mspd,mssd,vsd,ad,reS,teS",
RENDERER_TYPE="cpp", # cpp, python, egl
SPLIT="test",
SPLIT_TYPE="",
N_TOP=1, # SISO: 1, VIVO: -1 (for LINEMOD, 1/-1 are the same)
EVAL_CACHED=False, # if the predicted poses have been saved
SCORE_ONLY=False, # if the errors have been calculated
EVAL_PRINT_ONLY=False, # if the scores/recalls have been saved
EVAL_PRECISION=False, # use precision or recall
USE_BOP=True, # whether to use bop toolkit
)
TEST = dict(EVAL_PERIOD=0, VIS=False, TEST_BBOX_TYPE="est") # gt | est
And details of the error:
20230725_073144|core.utils.default_args_setup@144: Full config saved to output/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py
Global seed set to 44706856
20230725_073144|d2.utils.env@41: Using a generated random seed 44706856
20230725_073144|core.utils.default_args_setup@162: Used mmcv backend: cv2
20230725_073144|DBG|OpenGL.platform.ctypesloader@65: Loaded libEGL.so => libEGL.so <CDLL 'libEGL.so', handle a2a9b10 at 0x7f66aeeb8b80>
20230725_073145|ERR|__main__@233: An error has been caught in function '<module>', process 'MainProcess' (36), thread 'MainThread' (140082418370368):
Traceback (most recent call last):
> File "./core/gdrn_modeling/main_gdrn.py", line 233, in <module>
main(args)
│ └ Namespace(config_file='./configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py', di...
└ <function main at 0x7f66af0c8040>
File "./core/gdrn_modeling/main_gdrn.py", line 199, in main
Lite(
└ <class '__main__.Lite'>
File "/home/appuser/.local/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 406, in _run_impl
return self._strategy.launcher.launch(run_method, *args, **kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ (Namespace(config_file='./configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py', d...
│ │ │ └ functools.partial(<bound method LightningLite._run_with_strategy_setup of <__main__.Lite object at 0x7f6649eb3c10>>, <bound m...
│ │ └ <property object at 0x7f66e8411360>
│ └ <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f66af029730>
└ <__main__.Lite object at 0x7f6649eb3c10>
File "/home/appuser/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
│ │ └ {}
│ └ (Namespace(config_file='./configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py', d...
└ functools.partial(<bound method LightningLite._run_with_strategy_setup of <__main__.Lite object at 0x7f6649eb3c10>>, <bound m...
File "/home/appuser/.local/lib/python3.8/site-packages/pytorch_lightning/lite/lite.py", line 413, in _run_with_strategy_setup
return run_method(*args, **kwargs)
│ │ └ {}
│ └ (Namespace(config_file='./configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py', d...
└ <bound method Lite.run of <__main__.Lite object at 0x7f6649eb3c10>>
File "./core/gdrn_modeling/main_gdrn.py", line 155, in run
renderer = get_renderer(cfg, data_ref, obj_names=train_obj_names, gpu_id=render_gpu_id)
│ │ │ │ └ 0
│ │ │ └ ['ape', 'can', 'cat', 'driller', 'duck', 'eggbox', 'glue', 'holepuncher']
│ │ └ <module 'ref.lmo_full' from '/home/appuser/gdrnpp-docker/core/gdrn_modeling/../../ref/lmo_full.py'>
│ └ Config (path: ./configs/gdrn/lmo_pbr/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_lmo.py): {'OUTPUT_RO...
└ <function get_renderer at 0x7f664a14c5e0>
File "/home/appuser/gdrnpp-docker/core/gdrn_modeling/../../core/gdrn_modeling/engine/engine_utils.py", line 280, in get_renderer
ren = EGLRenderer(
└ <class 'lib.egl_renderer.egl_renderer_v3.EGLRenderer'>
File "/home/appuser/gdrnpp-docker/core/gdrn_modeling/../../lib/egl_renderer/egl_renderer_v3.py", line 81, in __init__
self._context = OffscreenContext(gpu_id=cuda_device_idx)
│ │ └ 0
│ └ <class 'lib.egl_renderer.glutils.egl_offscreen_context.OffscreenContext'>
└ <lib.egl_renderer.egl_renderer_v3.EGLRenderer object at 0x7f66aeeb8fa0>
File "/home/appuser/gdrnpp-docker/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 157, in __init__
self.init_context()
│ └ <function OffscreenContext.init_context at 0x7f664a137a60>
└ <lib.egl_renderer.glutils.egl_offscreen_context.OffscreenContext object at 0x7f66aeeb8f40>
File "/home/appuser/gdrnpp-docker/core/gdrn_modeling/../../lib/egl_renderer/glutils/egl_offscreen_context.py", line 226, in init_context
raise RuntimeError("Bindless Textures not supported")
RuntimeError: Bindless Textures not supported
20230725_073145|DBG|filelock._api@212: Attempting to acquire lock 140077390192112 on /home/appuser/.triton/autotune/Fp16Matmul_2d_kernel.pickle.lock
20230725_073145|DBG|filelock._api@215: Lock 140077390192112 acquired on /home/appuser/.triton/autotune/Fp16Matmul_2d_kernel.pickle.lock
20230725_073145|DBG|filelock._api@244: Attempting to release lock 140077390192112 on /home/appuser/.triton/autotune/Fp16Matmul_2d_kernel.pickle.lock
20230725_073145|DBG|filelock._api@247: Lock 140077390192112 released on /home/appuser/.triton/autotune/Fp16Matmul_2d_kernel.pickle.lock
20230725_073145|DBG|filelock._api@212: Attempting to acquire lock 140077390192112 on /home/appuser/.triton/autotune/Fp16Matmul_4d_kernel.pickle.lock
20230725_073145|DBG|filelock._api@215: Lock 140077390192112 acquired on /home/appuser/.triton/autotune/Fp16Matmul_4d_kernel.pickle.lock
20230725_073145|DBG|filelock._api@244: Attempting to release lock 140077390192112 on /home/appuser/.triton/autotune/Fp16Matmul_4d_kernel.pickle.lock
20230725_073145|DBG|filelock._api@247: Lock 140077390192112 released on /home/appuser/.triton/autotune/Fp16Matmul_4d_kernel.pickle.lock
I tried the solution you mentioned and updated the NVIDIA driver to version 535.86.05, but the problem was not resolved. I can't find detailed information on the Bindless Texture feature or its system requirements. If you have any experience with it, could you please share the relevant details?
Or, if possible, could you provide the system configuration that allowed you to run gdrnpp successfully (such as the system version, graphics card driver, OpenGL version, etc.)?
Looking forward to your response. Thank you!
Please first make sure your environment can successfully run other egl programs such as https://github.com/vispy/vispy/blob/main/examples/offscreen/simple_egl.py , https://github.com/DLR-RM/AugmentedAutoencoder#headless-rendering , https://github.com/thodan/bop_toolkit#vispy-renderer-default .
Thank you very much for your help! I followed your advice and ran the programs https://github.com/vispy/vispy/blob/main/examples/offscreen/simple_egl.py and https://github.com/thodan/bop_toolkit#vispy-renderer-default, both of which ran successfully. It seems that the EGL renderer in the environment is working fine. However, the issue with Bindless Texture still persists. Do you have any other suggestions to resolve this problem? Any information would be greatly appreciated. Thank you!
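One way to narrow this down might be to look at the OpenGL extension list of the context, since a working EGL context does not necessarily expose bindless textures. The sketch below assumes the check in egl_offscreen_context.py relates to the standard GL_ARB_bindless_texture or GL_NV_bindless_texture extensions (an assumption, not verified against the repository). find_bindless_extensions is a hypothetical helper; it takes extension names gathered elsewhere, for example with glGetStringi(GL_EXTENSIONS, i) from PyOpenGL while a context is current.

```python
# Hypothetical helper, not part of gdrnpp.
# Assumption: the renderer's check corresponds to one of these standard extensions.
BINDLESS_EXTENSIONS = ("GL_ARB_bindless_texture", "GL_NV_bindless_texture")


def find_bindless_extensions(extensions):
    """Return the bindless-texture extensions found in an iterable of names.

    Names may be str or bytes (OpenGL bindings often return bytes).
    """
    names = set()
    for ext in extensions:
        if isinstance(ext, bytes):
            ext = ext.decode("ascii", errors="replace")
        names.add(ext.strip())
    return [e for e in BINDLESS_EXTENSIONS if e in names]
```

An empty result would suggest that the context was created on a driver or device that does not expose the feature, even though EGL itself works.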
Hello! I wanted to inform you that the issue mentioned earlier has been resolved. However, I cannot pinpoint which step was crucial, because I took multiple measures to address the problem. First, I updated the NVIDIA driver to version 535.86.05. Next, I upgraded OpenGL to version 3.1. While reinstalling OpenGL, I also changed the versions of some related libraries. Fortunately, after these steps, I haven't encountered the previous error anymore. I am delighted by this outcome and appreciate your assistance and guidance in resolving the issue.
Thank you once again for your patient support.
I am getting the same error in a conda environment on a remote GPU server. However, the exact same conda environment works fine on my local machine. The driver, CUDA, and other dependency versions are the same on both. I was able to run https://github.com/vispy/vispy/blob/main/examples/offscreen/simple_egl.py as suggested, but had no luck resolving this issue.
Are you using the opengl cuda driver?
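When the same conda environment behaves differently on two machines, the difference may lie in the system-level EGL libraries, which conda does not manage. A minimal sketch for comparing the two machines, assuming they use libglvnd, which discovers EGL vendor libraries through JSON files in directories such as /etc/glvnd/egl_vendor.d and /usr/share/glvnd/egl_vendor.d. list_egl_vendor_icds is a hypothetical helper, not part of gdrnpp.

```python
import json
import os

# Assumption: standard libglvnd search directories; a system may use others.
DEFAULT_DIRS = ("/etc/glvnd/egl_vendor.d", "/usr/share/glvnd/egl_vendor.d")


def list_egl_vendor_icds(dirs=DEFAULT_DIRS):
    """Map each EGL vendor JSON file name to the library path it declares."""
    found = {}
    for d in dirs:
        if not os.path.isdir(d):
            continue  # directory absent on this machine
        for name in sorted(os.listdir(d)):
            if not name.endswith(".json"):
                continue
            try:
                with open(os.path.join(d, name)) as f:
                    data = json.load(f)
            except (OSError, ValueError):
                continue  # unreadable or malformed file
            icd = data.get("ICD", {}) if isinstance(data, dict) else {}
            found[name] = icd.get("library_path")
    return found


if __name__ == "__main__":
    for name, lib in list_egl_vendor_icds().items():
        print(name, "->", lib)
```

Running this on both the local machine and the remote server and comparing the output may show whether the server is missing an NVIDIA entry, in which case EGL could be falling back to a different implementation.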
Thanks for continuing the work on GDR. I enjoyed working with the initial version a year ago and look forward to trying the revised version.
When I start the training, I receive the following error message from
egl_offscreen_context.py:227:
The entire log of the program is:
I could install all the required packages in scripts/install_deps.sh and also compiled the egl_renderer via sh ./lib/egl_renderer/compile_cpp_egl_renderer.sh. After the compilation of the egl_renderer, the device query identifies the GPU:
However, the example program egl_renderer_v3 fails with the same errors as the training program:
The program is run in a Google Colab-type environment.
Do you have any idea how to fix this issue?