kpertsch / rlds_dataset_builder

An example RLDS dataset builder for X-embodiment dataset conversion.
MIT License

slow speed #3

Open zwbx opened 7 months ago

zwbx commented 7 months ago

Hi, thanks for your work. I have generated my own dataset successfully. However, I notice that the processing speed is a bit slow. Because I am not a TensorFlow expert, I would like to ask whether my settings are correct. I followed the instructions to enable Parallelizing Data Processing, but I am not sure whether it works.

(rlds_env) wenbo@wenbo-4090:~/Documents/data/rlds_dataset_builder/RLBench_dataset$ tfds build --overwrite --beam_pipeline_options="direct_running_mode=multi_processing,direct_num_workers=10"
INFO[build.py]: Loading dataset from path: /media/wenbo/12T/rlds_dataset_builder/RLBench_dataset/RLBench_dataset_dataset_builder.py
2024-03-23 20:01:38.474182: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-03-23 20:01:38.495278: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-23 20:01:38.753751: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-23 20:01:38.853793: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:38.867960: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:38.868039: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:39.080119: W tensorflow/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata.google.internal".
INFO[resolver.py]: Using /tmp/tfhub_modules to cache modules.
2024-03-23 20:01:40.125738: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.125853: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.125898: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.167775: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.167863: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.167917: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-23 20:01:40.167973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21885 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9
INFO[load.py]: Fingerprint not found. Saved model loading will continue.
INFO[build.py]: download_and_prepare for dataset rl_bench_dataset/1.0.0...
INFO[native_type_compatibility.py]: Using Any for unsupported type: typing.Sequence[~T]
INFO[bigquery.py]: No module named google.cloud.bigquery_storage_v1. As a result, the ReadFromBigQuery transform CANNOT be used with method=DIRECT_READ.
INFO[dataset_builder.py]: Generating dataset rl_bench_dataset (/home/wenbo/tensorflow_datasets/rl_bench_dataset/1.0.0)
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /home/wenbo/tensorflow_datasets/rl_bench_dataset/1.0.0...
Generating splits...: 0%|
2024-03-23 20:01:46.127202: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2024-03-23 20:01:46.328955: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:606] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
Generating train examples...: 5 examples [00:32, 6.87s/ examples]

kpertsch commented 7 months ago

Hi, thanks for your interest! I suggest switching to the multi-threaded branch; you can likely copy over most of your code changes. The parallelization on the main branch turned out to not work very well, and in the multithreaded branch I manually parallelize the processing, which seems to work more reliably!
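For readers wondering what "manually parallelizing" the conversion can look like, here is a minimal sketch. It is not the code from the multi-threaded branch: `parse_episode`, `generate_examples`, and the dummy episode contents are hypothetical names and placeholders made up for illustration. It only shows the general idea of fanning per-episode work out to a worker pool instead of relying on Beam pipeline options.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List, Tuple


def parse_episode(path: str) -> Tuple[str, Dict]:
    """Hypothetical per-episode parser.

    In a real builder this would load one raw episode from `path` and
    convert it to the RLDS step format; here it returns dummy steps.
    """
    steps = [{"observation": i, "is_last": i == 2} for i in range(3)]
    return path, {"steps": steps, "episode_metadata": {"file_path": path}}


def generate_examples(
    paths: List[str], max_workers: int = 10
) -> Iterator[Tuple[str, Dict]]:
    """Yield (key, example) pairs, parsing episodes in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as `paths`,
        # even though the episodes are parsed concurrently.
        yield from pool.map(parse_episode, paths)


if __name__ == "__main__":
    for key, example in generate_examples(["ep0", "ep1", "ep2"], max_workers=4):
        print(key, len(example["steps"]))
```

Threads mainly help when parsing is dominated by file I/O or by library calls that release the GIL. If the per-episode work is CPU-bound pure Python, swapping in `concurrent.futures.ProcessPoolExecutor` (with a picklable, module-level `parse_episode`) is the usual alternative.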