den250400 opened this issue 2 years ago
Create this directory and you're ready to go:
mkdir ../data_generation/data/
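Equivalently, as a guard in code: the FileNotFoundError below comes from os.listdir(self.settings.expert_folder), so creating the folder before listing it avoids the crash on a fresh checkout. A minimal sketch (the hard-coded path is simply the one from the traceback, assumed relative to planner_learning):

import os

# Hypothetical guard: create the expert rollout folder that
# dagger_training.perform_testing() lists, if it does not exist yet.
expert_folder = "../data_generation/data/"  # path taken from the FileNotFoundError
os.makedirs(expert_folder, exist_ok=True)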
How did you solve the problem with the mpl_test_node package? When I run roslaunch agile_autonomy simulation.launch, I get the error: Resource not found: mpl_test_node.
I also get this error message, but the launch nevertheless continues and the simulation window opens.
Thanks a lot! After creating the directory, test_trajectories.py launched, the copter spawned in the forest and started avoiding trees.
However, the "RGB" camera window in RViz runs at only 1-2 fps (and so does the "rpg_flightmare" window). Is there a way to speed this up?
There is, unfortunately, nothing you can do about that except get a faster computer. That is just related to the computational budget of your machine.
After a month of trial and error, I was finally able to build agile_autonomy. The only ROS package I was unable to build is mpl_test_node, though I don't think it is a crucial part of the system.
roslaunch agile_autonomy simulation.launch launched fine; I was even able to make the copter hover via the GUI interface. However, when I tried to fly using the network's predictions (python test_trajectories.py --settings_file=config/test_settings.yaml
), I encountered the following error:

2021-11-24 18:58:17.380102: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-11-24 18:58:19.828541: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-11-24 18:58:19.829417: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-11-24 18:58:19.855816: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 18:58:19.856260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5 coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 178.84GiB/s
2021-11-24 18:58:19.856316: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-11-24 18:58:19.858212: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-11-24 18:58:19.858527: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-11-24 18:58:19.860495: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-11-24 18:58:19.861046: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-11-24 18:58:19.863336: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-11-24 18:58:19.864947: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-11-24 18:58:19.869907: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2021-11-24 18:58:19.870182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 18:58:19.870822: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 18:58:19.871087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-11-24 18:58:19.872465: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-24 18:58:19.873089: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 18:58:19.873410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5 coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 178.84GiB/s
2021-11-24 18:58:19.873488: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-11-24 18:58:19.873558: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-11-24 18:58:19.873608: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-11-24 18:58:19.873641: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-11-24 18:58:19.873671: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-11-24 18:58:19.873701: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-11-24 18:58:19.873731: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-11-24 18:58:19.873761: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2021-11-24 18:58:19.873862: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 18:58:19.874198: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 18:58:19.874564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-11-24 18:58:19.874651: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-11-24 18:58:20.494457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-24 18:58:20.494495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-11-24 18:58:20.494503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-11-24 18:58:20.494743: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 18:58:20.495361: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 18:58:20.495849: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-24 18:58:20.496326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2922 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-11-24 18:58:20.496799: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
------------------------------------------
Restored from models/ckpt-50
------------------------------------------
2021-11-24 18:58:23.119284: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-11-24 18:58:23.183628: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2499950000 Hz
2021-11-24 18:58:23.684142: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2021-11-24 18:58:24.938844: W tensorflow/stream_executor/gpu/asm_compiler.cc:63] Running ptxas --version returned 256
2021-11-24 18:58:24.990377: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once.
2021-11-24 18:58:25.396965: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-11-24 18:58:31.038641: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2021-11-24 18:58:31.619634: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2021-11-24 18:58:31.685442: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.73G (2931228672 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Net initialized
Traceback (most recent call last):
  File "test_trajectories.py", line 19, in <module>
    main()
  File "test_trajectories.py", line 15, in main
    trainer.perform_testing()
  File "/home/denis/agile_autonomy_ws/catkin_aa/src/agile_autonomy/planner_learning/dagger_training.py", line 125, in perform_testing
    removable_rollout_folders = os.listdir(self.settings.expert_folder)
FileNotFoundError: [Errno 2] No such file or directory: '../data_generation/data/'
--- Logging error ---
Traceback (most recent call last):
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/logging/handlers.py", line 69, in emit
    if self.shouldRollover(record):
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/logging/handlers.py", line 183, in shouldRollover
    self.stream = self._open()
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/logging/__init__.py", line 1116, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
NameError: name 'open' is not defined
Call stack:
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py", line 161, in __del__
    .format(pretty_printer.node_names[node_id]))
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 178, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'Unresolved object in checkpoint: (root).optimizer.iter'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/logging/handlers.py", line 69, in emit
    if self.shouldRollover(record):
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/logging/handlers.py", line 183, in shouldRollover
    self.stream = self._open()
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/logging/__init__.py", line 1116, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
NameError: name 'open' is not defined
Call stack:
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py", line 161, in __del__
    .format(pretty_printer.node_names[node_id]))
  File "/home/denis/anaconda3/envs/tf_24/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 178, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'Unresolved object in checkpoint: (root).optimizer.beta_1'
Arguments: ()
I don't know exactly what the source of the problem is, but I have two guesses:
1. My GPU really doesn't have enough memory (it is a GeForce GTX 1650 Ti with 4 GB of memory).
2. The checkpoint files were saved with an older TensorFlow version, and the newer one fails to read them.
@antonilo @kelia Could you please tell me which TensorFlow version you used?
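On the out-of-memory guess: by default TensorFlow reserves nearly all free GPU memory at startup, which is tight on a 4 GB card that also shares the GPU with Flightmare. A minimal sketch, assuming a TensorFlow 2.x setup where this runs before the network touches the GPU, of switching to on-demand allocation:

import tensorflow as tf

# Ask TensorFlow to grow GPU memory on demand instead of reserving
# nearly all device memory up front. Must run before any op uses the GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    try:
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as err:
        # Raised if the GPU was already initialized by an earlier TF call.
        print("Could not enable memory growth:", err)

This only changes how memory is allocated; if the model genuinely needs more than 4 GB, it will still fail, just later.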
I saw that the paper says the network can be run on the CPU. Do you know how we can run the network on the CPU?
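For what it's worth, a minimal sketch of forcing CPU-only inference, assuming a TensorFlow 2.x setup and that this runs before the network is constructed (e.g., at the top of test_trajectories.py):

import os

# Hide every CUDA device from TensorFlow before it is imported;
# the network then runs on the CPU (more slowly, but without GPU memory limits).
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

# Equivalent after import: make no GPU visible to TensorFlow.
tf.config.set_visible_devices([], "GPU")
print(tf.config.get_visible_devices())  # should list only CPU devices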
The network initializes very slowly for me. Did you run into this problem?
On my machine, the neural net also took around 40 seconds to initialize, which is quite slow but still good enough for experiments.
Same is the case for me. Still, the simulation ceases to work in my case.