talmolab / sleap

A deep learning framework for multi-animal pose tracking.
https://sleap.ai

Error during inference #628

Closed owadhwa closed 2 years ago

owadhwa commented 2 years ago

Hi,

I have been trying to use SLEAP to estimate poses in videos of single Drosophila flies. However, I get the following error message after starting to train the model:

Screenshot (937)

The terminal says the following:

Traceback (most recent call last):
  File "C:\Users\Omika\anaconda3\envs\sleap\Scripts\sleap-track-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.1.5', 'console_scripts', 'sleap-track')())
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 3034, in main
    predictor = _make_predictor_from_cli(args)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 2969, in _make_predictor_from_cli
    progress_reporting=args.verbosity,
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 2676, in load_model
    batch_size=batch_size,
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 154, in from_model_paths
    batch_size=batch_size,
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 1170, in from_trained_models
    confmap_keras_model_path, compile=False
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\saving\save.py", line 186, in load_model
    loader_impl.parse_saved_model(filepath)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\saved_model\loader_impl.py", line 113, in parse_saved_model
    constants.SAVED_MODEL_FILENAME_PB))
OSError: SavedModel file does not exist at: D:/CuSO4/Standardising arena size/Medium size arena\models\singleanimal_trial220113_151510.single_instance.n=20\best_model.h5/{saved_model.pbtxt|saved_model.pb}

Args:
data_path: D:/CuSO4/Standardising arena size/Medium size arena/medium size arena_camera at top_no zoom.mp4
models: ['D:/CuSO4/Standardising arena size/Medium size arena\models\singleanimal_trial220113_151510.single_instance.n=20']
frames: 1227-58541 (20)
only_labeled_frames: False
only_suggested_frames: False
output: D:/CuSO4/Standardising arena size/Medium size arena\predictions\medium size arena_camera at top_no zoom.mp4.220113_154240.predictions.slp
Process return code: 1

Could you please help me resolve this issue?

Also, just for more information, the validation of the model is also quite off: (though I would expect this because the model is not fully trained yet)

Screenshot (938)

Thanks in advance,

Omika Wadhwa

talmo commented 2 years ago

Hi @om-git216,

From the error it looks like the model doesn't even exist (D:/CuSO4/Standardising arena size/Medium size arena\models\singleanimal_trial220113_151510.single_instance.n=20) -- this is probably because the training failed altogether.
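A quick way to confirm this before launching inference is to check whether the weights file named in the error (`best_model.h5`) is actually present in the model folder. The sketch below is only an illustration: `find_missing_model_files` is a hypothetical helper, not part of SLEAP, and it assumes that a finished training run leaves `best_model.h5` directly inside the run folder, as the error message suggests.

```python
from pathlib import Path


def find_missing_model_files(model_dirs, filename="best_model.h5"):
    """Return the model folders that lack the expected weights file.

    Hypothetical helper for illustration; not part of the SLEAP API.
    """
    missing = []
    for model_dir in map(Path, model_dirs):
        # The run folder can exist even when the weights were never saved,
        # so test for the file itself rather than for the folder.
        if not (model_dir / filename).is_file():
            missing.append(str(model_dir))
    return missing


if __name__ == "__main__":
    # Folder name copied from the error message in this thread.
    folders = [
        r"D:/CuSO4/Standardising arena size/Medium size arena\models"
        r"\singleanimal_trial220113_151510.single_instance.n=20"
    ]
    for folder in find_missing_model_files(folders):
        print(f"No best_model.h5 in {folder}; training probably did not finish.")
```

If the folder is reported as missing the file, the inference error is a symptom and the training log is the place to look.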

We can find some more troubleshooting hints if you happen to still have the command line logs from the training (should be just before the logs you pasted).

Another quick tip: even though this is a single animal, it might be a bit easier to actually use a top-down multi-animal model. The top-down model will first detect the animal and then crop it before estimating the other landmarks. This will be helpful for your case because the fly is a lot smaller than the full size of the image. When your model is ready for tracking, we can then filter it down to a single instance to get rid of any extraneous detections.
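To make the detect-then-crop idea concrete, here is a minimal NumPy sketch of the cropping step only. It is not SLEAP's implementation: `crop_around_centroid` is a hypothetical name, the frame is assumed to be a 2D grayscale array, and the centroid is assumed to come from a separate detection stage.

```python
import numpy as np


def crop_around_centroid(image, centroid_xy, crop_size):
    """Cut a crop_size x crop_size window centered on (x, y).

    Hypothetical illustration of the cropping stage of a top-down pipeline.
    The image is zero-padded so crops near the border keep the same shape.
    """
    half = crop_size // 2
    padded = np.pad(image, half, mode="constant")
    x, y = (int(round(c)) for c in centroid_xy)
    # After padding by `half`, original pixel (y, x) sits at (y + half, x + half),
    # so a window starting at (y, x) in padded coordinates is centered on it.
    return padded[y : y + crop_size, x : x + crop_size]


if __name__ == "__main__":
    # Frame size taken from the "Input shape: (368, 640, 1)" line in the training log.
    frame = np.zeros((368, 640), dtype=np.float32)
    frame[100, 200] = 1.0  # pretend the fly's centroid is at x=200, y=100
    crop = crop_around_centroid(frame, (200, 100), 64)
    print(crop.shape, crop[32, 32])
```

The second-stage model then only has to look at the small crop, in which the fly fills most of the field of view.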

owadhwa commented 2 years ago

Hi @talmo,

I did run the training again, this time using a top-down multi-animal model. However, I still get an error while training the model. This is what the Anaconda prompt says:

2022-01-14 15:27:09.189512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0 INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: C:/Users/Omika/Documents/SLEAP/Medium Size arena/labels.v000.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 19 / Validation = 2. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2022-01-14 15:27:09.334850: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2022-01-14 15:27:09.346333: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x209b63f22e0 initialized for platform Host (this does not guarantee that XLA will be used). 
Devices: 2022-01-14 15:27:09.346601: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version 2022-01-14 15:27:09.347695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6 coreClock: 1.702GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s 2022-01-14 15:27:09.349159: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll 2022-01-14 15:27:09.349607: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll 2022-01-14 15:27:09.350067: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll 2022-01-14 15:27:09.350481: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll 2022-01-14 15:27:09.350916: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll 2022-01-14 15:27:09.351366: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll 2022-01-14 15:27:09.351800: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll 2022-01-14 15:27:09.352467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0 2022-01-14 15:33:01.839862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix: 2022-01-14 15:33:01.840074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0 2022-01-14 15:33:01.841336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N 2022-01-14 15:33:01.842587: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device 
(/job:localhost/replica:0/task:0/device:GPU:0 with 4733 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6) 2022-01-14 15:33:01.848993: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x209c60d0e80 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices: 2022-01-14 15:33:01.849166: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3060 Laptop GPU, Compute Capability 8.6 INFO:sleap.nn.training:Loaded test example. [354.110s] INFO:sleap.nn.training: Input shape: (368, 640, 1) INFO:sleap.nn.training:Created Keras model. INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 1,953,105 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = CentroidConfmapsHead(anchor_part=None, sigma=5.0, output_stride=2, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = Tensor("CentroidConfmapsHead_0/BiasAdd:0", shape=(None, 184, 320, 1), dtype=float32) INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 19 INFO:sleap.nn.training:Validation set: n = 2 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-06, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... 
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21 INFO:sleap.nn.training:Setting up visualization... INFO:sleap.nn.training:Finished trainer set up. [357.6s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... INFO:sleap.nn.training:Finished creating training datasets. [2.3s] INFO:sleap.nn.training:Starting training loop... Epoch 1/200 2022-01-14 15:33:10.378154: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll 2022-01-14 15:50:27.261154: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once. 2022-01-14 15:50:27.347297: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll 2022-01-14 15:51:42.322187: W tensorflow/core/common_runtime/bfc_allocator.cc:312] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. 
WARNING:tensorflow:Callbacks method on_train_batch_end is slow compared to the batch time (batch time: 0.0090s vs on_train_batch_end time: 0.0543s). Check your callbacks. C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta) 2022-01-14 15:51:58.048224: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-14 15:51:58.139629: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-14 15:51:58.140876: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-14 15:51:58.284899: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-14 15:51:58.856039: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 
2022-01-14 15:51:58.970525: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-14 15:51:59.674594: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-14 15:52:01.051817: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.03GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-14 15:52:01.262828: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.03GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): 200/200 - 19s - loss: nan - val_loss: nan Epoch 2/200 Ignored NaN, Inf, or -Inf value. Ignored NaN, Inf, or -Inf value. 
Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation..png C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta) C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): 200/200 - 14s - loss: nan - val_loss: nan Epoch 3/200 Ignored NaN, Inf, or -Inf value. Ignored NaN, Inf, or -Inf value. Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation..png C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta) C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): 200/200 - 14s - loss: nan - val_loss: nan Epoch 4/200 Ignored NaN, Inf, or -Inf value. Ignored NaN, Inf, or -Inf value. 
Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation..png C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta) C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): 200/200 - 14s - loss: nan - val_loss: nan Epoch 5/200 Ignored NaN, Inf, or -Inf value. Ignored NaN, Inf, or -Inf value. Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation..png C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta)

Epoch 00005: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05. C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): 200/200 - 14s - loss: nan - val_loss: nan Epoch 6/200 Ignored NaN, Inf, or -Inf value. Ignored NaN, Inf, or -Inf value. Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation..png C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta) C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): 200/200 - 14s - loss: nan - val_loss: nan Epoch 7/200 Ignored NaN, Inf, or -Inf value. Ignored NaN, Inf, or -Inf value. 
Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation..png C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta) C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): 200/200 - 14s - loss: nan - val_loss: nan Epoch 8/200 Ignored NaN, Inf, or -Inf value. Ignored NaN, Inf, or -Inf value. Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation..png C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta) C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): Ignored NaN, Inf, or -Inf value. 200/200 - 14s - loss: nan - val_loss: nan Ignored NaN, Inf, or -Inf value. 
Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation..png Epoch 9/200 C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta) C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): 200/200 - 14s - loss: nan - val_loss: nan Epoch 10/200 Ignored NaN, Inf, or -Inf value. Ignored NaN, Inf, or -Inf value. Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation..png C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:2347: RuntimeWarning: invalid value encountered in less self.monitor_op = lambda a, b: np.less(a, b - self.min_delta) C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1291: RuntimeWarning: invalid value encountered in less if self.monitor_op(current, self.best): C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\keras\callbacks.py:1664: RuntimeWarning: invalid value encountered in less if self.monitor_op(current - self.min_delta, self.best): 200/200 - 14s - loss: nan - val_loss: nan Epoch 00010: early stopping INFO:sleap.nn.training:Finished training loop. [21.0 min] INFO:sleap.nn.training:Deleting visualization directory: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz INFO:sleap.nn.training:Saving evaluation metrics to model folder... Predicting... 
---------------------------------------- 0% ETA: -:--:-- ?
Ignored NaN, Inf, or -Inf value.
Ignored NaN, Inf, or -Inf value.
Polling: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21\viz\validation.*.png
Predicting...
---------------------------------------- 0% ETA: -:--:-- ?
Traceback (most recent call last):
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1296, in truediv
    return _truediv_python3(x, y, name)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1221, in _truediv_python3
    x = ops.convert_to_tensor(x, name="x")
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py", line 1499, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\constant_op.py", line 338, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\constant_op.py", line 264, in constant
    allow_broadcast=True)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\constant_op.py", line 275, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\constant_op.py", line 300, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\constant_op.py", line 98, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: TypeError: object of type 'RaggedTensor' has no len()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Omika\anaconda3\envs\sleap\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.1.5', 'console_scripts', 'sleap-train')())
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1617, in main
    trainer.train()
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 904, in train
    self.evaluate()
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 917, in evaluate
    split_name="train",
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\evals.py", line 699, in evaluate_model
    labels_pr = predictor.predict(labels_reader, make_labels=True)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 390, in predict
    self._make_labeled_frames_from_generator(generator, data)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 2011, in _make_labeled_frames_from_generator
    for ex in generator:
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 300, in _predict_generator
    ex = process_batch(ex)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py", line 279, in process_batch
    np.expand_dims(ex["scale"], axis=1), axis=1
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\util\dispatch.py", line 205, in wrapper
    result = dispatch(wrapper, args, kwargs)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\util\dispatch.py", line 118, in dispatch
    result = dispatcher.handle(args, kwargs)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\ragged\ragged_dispatch.py", line 219, in handle
    ragged_tensor_shape.RaggedTensorDynamicShape.from_tensor(y))
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\ragged\ragged_tensor_shape.py", line 470, in broadcast_dynamic_shape
    shape_x = shape_x.broadcast_dimension(axis, shape_y.dimension_size(axis))
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\ragged\ragged_tensor_shape.py", line 351, in broadcast_dimension
    condition, data=broadcast_err, summarize=10)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\util\tf_should_use.py", line 247, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs),
  File "C:\Users\Omika\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 158, in Assert
    (condition, "\n".join(data_str)))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected 'tf.Tensor(False, shape=(), dtype=bool)' to be true. Summarized data: b'Unable to broadcast: dimension size mismatch in dimension' 0 b'lengths=' 4 b'dim_size=' 0
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
Run Path: C:/Users/Omika/Documents/SLEAP/Medium Size arena\models\poseestimation_trial220114_152641.centroid.n=21

Thank you for your help,

Omika Wadhwa

talmo commented 2 years ago

Hi @om-git216,

Thanks for posting the detailed training logs. It looks like the training is failing entirely right from the start -- which is highly unusual!
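The giveaway in the log is the per-epoch summary `loss: nan - val_loss: nan`, which already appears at epoch 1. For long logs, a small script can find the first such epoch. This is a rough sketch that assumes the Keras-style summary format seen in this thread; `first_nan_epoch` is a hypothetical helper, not something SLEAP provides.

```python
import math
import re

# Matches Keras epoch summaries such as "200/200 - 19s - loss: nan - val_loss: nan".
# The \b keeps the first group from matching inside "val_loss".
EPOCH_SUMMARY = re.compile(r"\bloss: ([\w.+-]+) - val_loss: ([\w.+-]+)")


def first_nan_epoch(log_text):
    """Return the 1-based index of the first epoch summary containing NaN, or None."""
    for index, match in enumerate(EPOCH_SUMMARY.finditer(log_text), start=1):
        if any(math.isnan(float(value)) for value in match.groups()):
            return index
    return None


if __name__ == "__main__":
    # Summary line copied from the log posted above.
    log = "Epoch 1/200 ... 200/200 - 19s - loss: nan - val_loss: nan Epoch 2/200"
    print(first_nan_epoch(log))
```

A NaN in the very first epoch points away from the labels or hyperparameters drifting over time and toward something wrong in the setup itself.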

Would you mind sharing your training package (Predict -> Export Labels Package... -> Labeled frames)? This will package the images with the SLEAP project file. It might be large, so you can stick it on Google Drive/Dropbox/etc. and share the link if necessary.

If you prefer, you can share it privately by emailing it to talmo@salk.edu.

We'll figure out what's going on here! :)

Cheers,

Talmo

owadhwa commented 2 years ago

Hi Talmo,

Thank you for your reply. I have sent you a Google Drive link to the training package from my other email address: @.***

Please let me know if you do not receive it or cannot access it for any reason.

Thank you and with regards,

Omika Wadhwa

talmo commented 2 years ago

Hi @om-git216,

I'm not sure what went wrong that led to the failed training, but I just gave it a spin with the training package you sent me and it seems to have worked fine.

I'm emailing you the trained models if you want to try them out on your end.

I'll close this for now, but please feel free to reply and I'll re-open if you're still having issues.

Cheers,

Talmo

owadhwa commented 2 years ago

Hi @talmo,

Thank you for the trained models.

I tried retraining the same models using the same labeled data package that I sent you, but got a similar error - it says that there is an error while training the centroid. Maybe I'm doing something wrong or missing a package or so?

I have posted the parameters that I have set for training the model:

Screenshot (944)

Screenshot (942)

Screenshot (943)

Best,

Omika

talmo commented 2 years ago

Hi @om-git216,

I don't think it should make such a difference that it would lead to those errors, but can you try setting the sigma to 2.5 for both models, and the anchor part to thorax?
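For context on what sigma does: the training targets are confidence maps, and sigma sets how wide the blob around each landmark is. The sketch below draws a plain 2D Gaussian to show the effect of going from 5.0 (the value in the posted log) to 2.5. It is an illustration under that assumption only; `make_confmap` is a hypothetical function and SLEAP's own map generation may differ in detail.

```python
import numpy as np


def make_confmap(height, width, center_xy, sigma):
    """Unnormalized 2D Gaussian that peaks at 1.0 on center_xy.

    Hypothetical illustration of a confidence-map target.
    """
    xs = np.arange(width, dtype=np.float32)                   # shape (width,)
    ys = np.arange(height, dtype=np.float32).reshape(-1, 1)   # shape (height, 1)
    cx, cy = center_xy
    # Broadcasting xs against ys yields a (height, width) grid of squared distances.
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Grid size taken from the centroid head output shape (184, 320) in the log.
    wide = make_confmap(184, 320, (160, 92), sigma=5.0)
    narrow = make_confmap(184, 320, (160, 92), sigma=2.5)
    print((wide > 0.5).sum(), (narrow > 0.5).sum())
```

A smaller sigma gives a tighter target, which tends to suit a small animal in a large frame.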

jfrie commented 2 years ago

Hello, I just wanted to add that I downloaded SLEAP for the first time and I'm getting the same error. It presents the same way too (i.e., NaN loss values, validations in the corner of the image, and if I try to run inference afterward it says no trained model exists to run). It may also be helpful to note that I have a 3070 Ti and it takes several minutes to run a single iteration.

talmo commented 2 years ago

Hi @jfrie,

Super helpful, thanks! I should've noticed it in @om-git216's logs, but you're both on 30xx series cards.

Do you mind trying out the prerelease of the next version of SLEAP and letting me know if you're still getting NaNs?
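The relevant details are visible in the TensorFlow startup log: the GPU reports `computeCapability: 8.6` while the CUDA libraries being loaded are 10.x builds (`cudart64_101.dll`). Compute capability 8.x GPUs are only targeted natively from CUDA 11 onward. Whether that mismatch is what produces the NaNs is an assumption here, not something confirmed in this thread. The sketch below extracts both values from a pasted log; `gpu_cuda_summary` is a hypothetical helper and it assumes the three-digit DLL naming seen above.

```python
import re


def gpu_cuda_summary(log_text):
    """Pull GPU compute capability and CUDA runtime version out of a TensorFlow log.

    Hypothetical helper for illustration; relies on the log format seen in this thread.
    """
    cap = re.search(r"computeCapability: (\d+)\.(\d+)", log_text)
    # Assumes three-digit naming, e.g. cudart64_101.dll -> CUDA 10.1.
    dll = re.search(r"cudart64_(\d{2})(\d)\.dll", log_text)
    capability = (int(cap.group(1)), int(cap.group(2))) if cap else None
    runtime = (int(dll.group(1)), int(dll.group(2))) if dll else None
    # Compute capability 8.x is natively supported starting with CUDA 11.
    runtime_predates_gpu = (
        capability is not None
        and runtime is not None
        and capability[0] >= 8
        and runtime[0] < 11
    )
    return {
        "compute_capability": capability,
        "cuda_runtime": runtime,
        "runtime_predates_gpu": runtime_predates_gpu,
    }


if __name__ == "__main__":
    # Fragments copied from the training log posted earlier in this thread.
    log = (
        "name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6 "
        "Successfully opened dynamic library cudart64_101.dll"
    )
    print(gpu_cuda_summary(log))
```

If `runtime_predates_gpu` comes back true, trying a build that ships with a newer CUDA runtime is a reasonable first experiment.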

jfrie commented 2 years ago

Hey Talmo,

Thanks for helping out so quickly! I'm now getting this error: image

Here's my terminal output, I apologize in advance for the giant wall of text, I'm not really sure what's important here:
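A note for readers skimming the output: the allocation that finally fails is 765.00MiB (rounded to 802160640 bytes), requested by `model/stack0_dec2_s4_to_s2_skip_concat/concat`. The arithmetic below is a sketch showing that this size is consistent with a single float32 activation tensor at output stride 2. Batch size 4 and the 544 x 960 spatial size come from the log; the channel count of 96 is an assumption, chosen because it reproduces the logged byte count (it would match a 64-channel upsampled tensor concatenated with a 32-channel skip connection, which the log does not confirm).

```python
def tensor_bytes(batch, height, width, channels, bytes_per_element=4):
    """Size in bytes of a dense activation tensor (float32 by default)."""
    return batch * height * width * channels * bytes_per_element


batch_size = 4            # "batch_size": 4 in the training job
height, width = 544, 960  # output shape (None, 544, 960, 3) at output stride 2
channels = 96             # ASSUMPTION: not stated in the log

n_bytes = tensor_bytes(batch_size, height, width, channels)
print(n_bytes)            # 802160640
print(n_bytes / 2**20)    # 765.0 (MiB)

# The same tensor with a smaller batch, for comparison.
print(tensor_bytes(1, height, width, channels) / 2**20)  # 191.25 (MiB)
```

Because the size scales linearly with batch size and with the number of pixels, a smaller batch size or a lower `input_scaling` would shrink this tensor proportionally. Whether that is enough to avoid the out-of-memory error on this card is not something the log alone can confirm.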

(base) C:\Users\Jude>conda activate sleap

(sleap) C:\Users\Jude>sleap-label
Saving config: C:\Users\Jude/.sleap/1.2.0a2/preferences.yaml
Restoring GUI state...

Software versions:
SLEAP: 1.2.0a2
TensorFlow: 2.7.0
Numpy: 1.19.5
Python: 3.7.11
OS: Windows-10-10.0.19041-SP0

Happy SLEAPing! :) Resetting monitor window. Polling: C:/Users/Jude/Desktop\models\test220119_113316.single_instance.n=25\viz\validation.*.png Start training single_instance... ['sleap-train', 'C:\Users\Jude\AppData\Local\Temp\tmp_yg4vb_x\220119_113316_training_job.json', 'C:/Users/Jude/Desktop/labels_test.slp', '--zmq', '--save_viz', '--video-paths', 'C:/Users/Jude/Desktop/Test Video for SLEAP.mp4'] INFO:sleap.nn.training:Versions: SLEAP: 1.2.0a2 TensorFlow: 2.7.0 Numpy: 1.19.5 Python: 3.7.11 OS: Windows-10-10.0.19041-SP0 INFO:sleap.nn.training:Training labels file: C:/Users/Jude/Desktop/labels_test.slp INFO:sleap.nn.training:Training profile: C:\Users\Jude\AppData\Local\Temp\tmp_yg4vb_x\220119_113316_training_job.json INFO:sleap.nn.training: INFO:sleap.nn.training:Arguments: INFO:sleap.nn.training:{ "training_job_path": "C:\Users\Jude\AppData\Local\Temp\tmp_yg4vb_x\220119_113316_training_job.json", "labels_path": "C:/Users/Jude/Desktop/labels_test.slp", "video_paths": "C:/Users/Jude/Desktop/Test Video for SLEAP.mp4", "val_labels": null, "test_labels": null, "tensorboard": false, "save_viz": true, "zmq": true, "run_name": "", "prefix": "", "suffix": "", "cpu": false, "first_gpu": false, "last_gpu": false, "gpu": 0 } INFO:sleap.nn.training: INFO:sleap.nn.training:Training job: INFO:sleap.nn.training:{ "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": true, "imagenet_mode": null, "input_scaling": 1.0, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": null, "crop_size": null, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, 
"output_stride": 2, "filters": 16, "filters_rate": 2.0, "middle_block": true, "up_interpolate": true, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": { "part_names": null, "sigma": 2.5, "output_stride": 2, "offset_refinement": false }, "centroid": null, "centered_instance": null, "multi_instance": null } }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": true, "rotation_min_angle": -15.0, "rotation_max_angle": 15.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": false, "flip_horizontal": true }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 4, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 200, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-08, "plateau_patience": 10 } }, "outputs": { "save_outputs": true, "run_name": "220119_113316.single_instance.n=25", "run_name_prefix": "test", "run_name_suffix": "", "runs_folder": "C:/Users/Jude/Desktop\models", "tags": [ "" ], 
"save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://127.0.0.1:9000", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://127.0.0.1:9001" } }, "name": "", "description": "", "sleap_version": "1.2.0a2", "filename": "C:\Users\Jude\AppData\Local\Temp\tmp_yg4vb_x\220119_113316_training_job.json" } INFO:sleap.nn.training: INFO:sleap.nn.training:Using GPU 0 for acceleration. INFO:sleap.nn.training:Disabled GPU memory pre-allocation. INFO:sleap.nn.training:System: GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True INFO:sleap.nn.training: INFO:sleap.nn.training:Initializing trainer... INFO:sleap.nn.training:Loading training labels from: C:/Users/Jude/Desktop/labels_test.slp INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1 INFO:sleap.nn.training: Splits: Training = 23 / Validation = 2. INFO:sleap.nn.training:Setting up for training... INFO:sleap.nn.training:Setting up pipeline builders... INFO:sleap.nn.training:Setting up model... INFO:sleap.nn.training:Building test pipeline... 2022-01-19 11:33:20.681597: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
2022-01-19 11:33:20.998668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5461 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3070 Ti, pci bus id: 0000:09:00.0, compute capability: 8.6 INFO:sleap.nn.training:Loaded test example. [1.404s] INFO:sleap.nn.training: Input shape: (1088, 1920, 1) INFO:sleap.nn.training:Created Keras model. INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) INFO:sleap.nn.training: Max stride: 16 INFO:sleap.nn.training: Parameters: 1,953,171 INFO:sleap.nn.training: Heads: INFO:sleap.nn.training: [0] = SingleInstanceConfmapsHead(part_names=['nose', 'left_ear', 'right_ear'], sigma=2.5, output_stride=2, loss_weight=1.0) INFO:sleap.nn.training: Outputs: INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 544, 960, 3), dtype=tf.float32, name=None), name='SingleInstanceConfmapsHead_0/BiasAdd:0', description="created by layer 'SingleInstanceConfmapsHead_0'") INFO:sleap.nn.training:Setting up data pipelines... INFO:sleap.nn.training:Training set: n = 23 INFO:sleap.nn.training:Validation set: n = 2 INFO:sleap.nn.training:Setting up optimization... INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) INFO:sleap.nn.training:Setting up outputs... 
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: ) INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000 INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001 INFO:sleap.nn.training:Created run path: C:/Users/Jude/Desktop\models\test220119_113316.single_instance.n=25 INFO:sleap.nn.training:Setting up visualization... INFO:sleap.nn.training:Finished trainer set up. [1.9s] INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation... INFO:sleap.nn.training:Finished creating training datasets. [2.6s] INFO:sleap.nn.training:Starting training loop... Epoch 1/200 2022-01-19 11:33:26.896011: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8201 2022-01-19 11:33:29.533661: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.38GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-19 11:33:29.533802: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.38GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-19 11:33:29.572528: W tensorflow/core/common_runtime/bfc_allocator.cc:343] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. 
Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. 2022-01-19 11:33:31.161271: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.38GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-19 11:33:31.161439: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.38GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2022-01-19 11:33:41.221072: W tensorflow/core/common_runtime/bfc_allocator.cc:462] Allocator (GPU_0_bfc) ran out of memory trying to allocate 765.00MiB (rounded to 802160640)requested by op model/stack0_dec2_s4_to_s2_skip_concat/concat If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. Current allocation summary follows. Current allocation summary follows. 2022-01-19 11:33:41.221195: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] BFCAllocator dump for GPU_0_bfc 2022-01-19 11:33:41.221965: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (256): Total Chunks: 94, Chunks in use: 93. 23.5KiB allocated for chunks. 23.2KiB in use in bin. 6.8KiB client-requested in use in bin. 2022-01-19 11:33:41.222264: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (512): Total Chunks: 24, Chunks in use: 24. 13.2KiB allocated for chunks. 13.2KiB in use in bin. 11.8KiB client-requested in use in bin. 2022-01-19 11:33:41.222575: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (1024): Total Chunks: 9, Chunks in use: 9. 9.2KiB allocated for chunks. 9.2KiB in use in bin. 9.0KiB client-requested in use in bin. 
2022-01-19 11:33:41.222876: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (2048): Total Chunks: 4, Chunks in use: 4. 13.5KiB allocated for chunks. 13.5KiB in use in bin. 11.8KiB client-requested in use in bin. 2022-01-19 11:33:41.223233: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2022-01-19 11:33:41.223511: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (8192): Total Chunks: 4, Chunks in use: 4. 36.0KiB allocated for chunks. 36.0KiB in use in bin. 36.0KiB client-requested in use in bin. 2022-01-19 11:33:41.223818: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (16384): Total Chunks: 4, Chunks in use: 4. 81.0KiB allocated for chunks. 81.0KiB in use in bin. 72.0KiB client-requested in use in bin. 2022-01-19 11:33:41.224104: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (32768): Total Chunks: 8, Chunks in use: 8. 288.0KiB allocated for chunks. 288.0KiB in use in bin. 288.0KiB client-requested in use in bin. 2022-01-19 11:33:41.224400: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (65536): Total Chunks: 6, Chunks in use: 6. 540.0KiB allocated for chunks. 540.0KiB in use in bin. 504.0KiB client-requested in use in bin. 2022-01-19 11:33:41.224702: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (131072): Total Chunks: 10, Chunks in use: 10. 1.44MiB allocated for chunks. 1.44MiB in use in bin. 1.34MiB client-requested in use in bin. 2022-01-19 11:33:41.224992: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (262144): Total Chunks: 8, Chunks in use: 8. 2.81MiB allocated for chunks. 2.81MiB in use in bin. 2.81MiB client-requested in use in bin. 2022-01-19 11:33:41.225274: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (524288): Total Chunks: 8, Chunks in use: 8. 4.64MiB allocated for chunks. 4.64MiB in use in bin. 
4.50MiB client-requested in use in bin. 2022-01-19 11:33:41.225558: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (1048576): Total Chunks: 10, Chunks in use: 10. 15.64MiB allocated for chunks. 15.64MiB in use in bin. 15.50MiB client-requested in use in bin. 2022-01-19 11:33:41.225839: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (2097152): Total Chunks: 6, Chunks in use: 6. 14.90MiB allocated for chunks. 14.90MiB in use in bin. 12.67MiB client-requested in use in bin. 2022-01-19 11:33:41.226122: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (4194304): Total Chunks: 1, Chunks in use: 0. 7.97MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2022-01-19 11:33:41.226402: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (8388608): Total Chunks: 1, Chunks in use: 0. 13.62MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2022-01-19 11:33:41.226685: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (16777216): Total Chunks: 7, Chunks in use: 7. 207.19MiB allocated for chunks. 207.19MiB in use in bin. 199.22MiB client-requested in use in bin. 2022-01-19 11:33:41.226999: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (33554432): Total Chunks: 6, Chunks in use: 6. 358.84MiB allocated for chunks. 358.84MiB in use in bin. 342.66MiB client-requested in use in bin. 2022-01-19 11:33:41.227287: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (67108864): Total Chunks: 5, Chunks in use: 5. 582.22MiB allocated for chunks. 582.22MiB in use in bin. 573.75MiB client-requested in use in bin. 2022-01-19 11:33:41.227570: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (134217728): Total Chunks: 6, Chunks in use: 5. 1.25GiB allocated for chunks. 1.00GiB in use in bin. 956.25MiB client-requested in use in bin. 
2022-01-19 11:33:41.227849: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (268435456): Total Chunks: 6, Chunks in use: 4. 2.90GiB allocated for chunks. 1.87GiB in use in bin. 1.87GiB client-requested in use in bin. 2022-01-19 11:33:41.228137: I tensorflow/core/common_runtime/bfc_allocator.cc:1033] Bin for 765.00MiB was 256.00MiB, Chunk State: 2022-01-19 11:33:41.228423: I tensorflow/core/common_runtime/bfc_allocator.cc:1039] Size: 510.00MiB | Requested Size: 510.00MiB | in_use: 0 | bin_num: 20, prev: Size: 382.50MiB | Requested Size: 382.50MiB | in_use: 1 | bin_num: -1, next: Size: 510.00MiB | Requested Size: 510.00MiB | in_use: 1 | bin_num: -1 2022-01-19 11:33:41.228705: I tensorflow/core/common_runtime/bfc_allocator.cc:1039] Size: 543.25MiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 510.00MiB | Requested Size: 510.00MiB | in_use: 1 | bin_num: -1 2022-01-19 11:33:41.229044: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 2097152 2022-01-19 11:33:41.229287: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600000 of size 256 next 4 2022-01-19 11:33:41.229604: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600100 of size 256 next 5 2022-01-19 11:33:41.229904: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600200 of size 256 next 6 2022-01-19 11:33:41.230194: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600300 of size 256 next 7 2022-01-19 11:33:41.231549: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600400 of size 256 next 10 2022-01-19 11:33:41.231828: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600500 of size 256 next 11 2022-01-19 11:33:41.232117: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600600 of size 256 next 12 2022-01-19 11:33:41.232396: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600700 of size 256 next 15 2022-01-19 
11:33:41.232678: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600800 of size 256 next 8 2022-01-19 11:33:41.233005: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600900 of size 768 next 9 2022-01-19 11:33:41.233289: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600c00 of size 256 next 16 2022-01-19 11:33:41.233582: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600d00 of size 256 next 19 2022-01-19 11:33:41.233859: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600e00 of size 256 next 20 2022-01-19 11:33:41.234141: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11600f00 of size 256 next 21 2022-01-19 11:33:41.234422: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601000 of size 256 next 24 2022-01-19 11:33:41.234717: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601100 of size 256 next 25 2022-01-19 11:33:41.235096: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601200 of size 256 next 28 2022-01-19 11:33:41.235290: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601300 of size 256 next 29 2022-01-19 11:33:41.235577: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601400 of size 256 next 30 2022-01-19 11:33:41.235867: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601500 of size 256 next 31 2022-01-19 11:33:41.236148: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601600 of size 256 next 34 2022-01-19 11:33:41.236426: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601700 of size 256 next 35 2022-01-19 11:33:41.236706: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601800 of size 512 next 38 2022-01-19 11:33:41.236985: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601a00 of size 256 next 39 2022-01-19 11:33:41.237274: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601b00 of size 256 next 40 2022-01-19 11:33:41.237552: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601c00 of size 512 next 43 2022-01-19 11:33:41.237865: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601e00 of size 256 next 44 2022-01-19 11:33:41.238145: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11601f00 of size 256 next 45 2022-01-19 11:33:41.238418: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11602000 of size 1024 next 46 2022-01-19 11:33:41.238710: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11602400 of size 256 next 49 2022-01-19 11:33:41.238990: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11602500 of size 256 next 50 2022-01-19 11:33:41.239270: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11602600 of size 1024 next 53 2022-01-19 11:33:41.239548: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11602a00 of size 512 next 54 2022-01-19 11:33:41.240885: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11602c00 of size 512 next 56 2022-01-19 11:33:41.241237: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11602e00 of size 256 next 58 2022-01-19 11:33:41.241519: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11602f00 of size 256 next 59 2022-01-19 11:33:41.241826: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603000 of size 256 next 61 2022-01-19 11:33:41.242108: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603100 of size 256 next 63 2022-01-19 11:33:41.242393: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603200 of size 256 next 65 2022-01-19 11:33:41.242670: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603300 of size 256 next 66 2022-01-19 11:33:41.243009: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603400 of size 256 next 67 2022-01-19 11:33:41.243301: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603500 of size 256 next 68 2022-01-19 11:33:41.243581: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603600 of size 256 next 71 2022-01-19 11:33:41.243858: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603700 of size 256 next 72 2022-01-19 11:33:41.244150: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603800 of size 256 next 69 2022-01-19 11:33:41.244423: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603900 of size 512 next 70 2022-01-19 11:33:41.244708: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603b00 of size 256 next 76 2022-01-19 11:33:41.244985: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603c00 of size 256 next 79 2022-01-19 11:33:41.245271: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603d00 of size 256 next 80 2022-01-19 11:33:41.245550: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603e00 of size 256 next 85 2022-01-19 11:33:41.245871: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11603f00 of size 256 next 86 2022-01-19 11:33:41.246151: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11604000 of size 256 next 87 2022-01-19 11:33:41.246428: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11604100 of size 256 next 88 2022-01-19 11:33:41.246713: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11604200 of size 256 next 89 2022-01-19 11:33:41.247003: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11604300 of size 256 next 90 2022-01-19 11:33:41.247293: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11604400 of size 256 next 78 2022-01-19 11:33:41.247579: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11604500 of size 3840 next 14 2022-01-19 11:33:41.247859: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11605400 of size 9216 next 13 2022-01-19 11:33:41.248144: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11607800 of size 3840 next 77 2022-01-19 11:33:41.248430: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11608700 of size 768 next 91 2022-01-19 11:33:41.248709: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11608a00 of size 256 next 93 2022-01-19 11:33:41.248989: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11608b00 of size 256 next 94 2022-01-19 11:33:41.249268: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11608c00 of size 256 next 96 2022-01-19 11:33:41.250504: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11608d00 of size 256 next 97 2022-01-19 11:33:41.250841: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11608e00 of size 256 next 98 2022-01-19 11:33:41.251158: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11608f00 of size 512 next 100 2022-01-19 11:33:41.251438: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11609100 of size 512 next 101 2022-01-19 11:33:41.251725: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11609300 of size 768 next 82 2022-01-19 11:33:41.252004: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11609600 of size 3840 next 83 2022-01-19 11:33:41.252303: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160a500 of size 2304 next 84 2022-01-19 11:33:41.252591: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160ae00 of size 1024 next 102 2022-01-19 11:33:41.252877: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160b200 of size 1024 next 104 2022-01-19 11:33:41.253221: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160b600 of size 512 next 107 2022-01-19 11:33:41.253499: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160b800 of size 256 next 109 2022-01-19 11:33:41.253792: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160b900 of size 256 next 111 2022-01-19 11:33:41.254074: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160ba00 of size 256 next 112 2022-01-19 11:33:41.254362: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160bb00 of size 256 next 113 2022-01-19 11:33:41.254651: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160bc00 of size 512 next 114 2022-01-19 11:33:41.254990: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160be00 of size 256 next 115 2022-01-19 11:33:41.255286: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160bf00 of size 256 next 117 2022-01-19 11:33:41.255561: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160c000 of size 256 next 17 2022-01-19 11:33:41.255947: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1160c100 of size 18432 next 18 2022-01-19 11:33:41.256144: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11610900 of size 36864 next 95 2022-01-19 11:33:41.256413: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11619900 of size 36864 next 23 2022-01-19 11:33:41.256701: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11622900 of size 36864 next 22 2022-01-19 11:33:41.256991: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1162b900 of size 36864 next 64 2022-01-19 11:33:41.257270: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11634900 of size 9216 next 92 2022-01-19 11:33:41.257549: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11636d00 of size 27648 next 27 2022-01-19 11:33:41.257844: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1163d900 of size 73728 next 26 2022-01-19 11:33:41.258121: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1164f900 of size 110592 next 62 2022-01-19 11:33:41.258398: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1166a900 of size 184320 next 33 2022-01-19 11:33:41.258684: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11697900 of size 147456 next 32 2022-01-19 11:33:41.259919: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b116bb900 of size 147456 next 60 2022-01-19 11:33:41.260209: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b116df900 of size 147456 next 37 2022-01-19 11:33:41.260487: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11703900 of size 294912 next 36 2022-01-19 11:33:41.260769: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1174b900 of size 739072 next 18446744073709551615 2022-01-19 11:33:41.261097: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 4194304 2022-01-19 11:33:41.261364: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11800000 of size 1280 next 2 2022-01-19 11:33:41.261653: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11800500 of size 256 next 3 2022-01-19 11:33:41.261930: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11800600 of size 294912 next 99 2022-01-19 11:33:41.262212: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11848600 of size 147456 next 110 2022-01-19 11:33:41.262493: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1186c600 of size 147456 next 42 2022-01-19 11:33:41.262777: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11890600 of size 589824 next 41 2022-01-19 11:33:41.263102: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11920600 of size 3013120 next 
18446744073709551615 2022-01-19 11:33:41.263396: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 8388608 2022-01-19 11:33:41.263656: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11c00000 of size 1179648 next 48 2022-01-19 11:33:41.263938: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11d20000 of size 589824 next 55 2022-01-19 11:33:41.264223: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11db0000 of size 442368 next 57 2022-01-19 11:33:41.264511: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11e1c000 of size 1327104 next 52 2022-01-19 11:33:41.264803: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b11f60000 of size 2359296 next 51 2022-01-19 11:33:41.265147: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b121a0000 of size 2490368 next 18446744073709551615 2022-01-19 11:33:41.265438: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 16777216 2022-01-19 11:33:41.265708: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12400000 of size 2073600 next 74 2022-01-19 11:33:41.265993: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b125fa400 of size 2073600 next 75 2022-01-19 11:33:41.266272: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b127f4800 of size 2073600 next 81 2022-01-19 11:33:41.266549: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b129eec00 of size 2359296 next 103 2022-01-19 11:33:41.266844: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12c2ec00 of size 1769472 next 105 2022-01-19 11:33:41.267203: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12ddec00 of size 589824 next 106 2022-01-19 11:33:41.267481: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12e6ec00 of size 442368 next 108 2022-01-19 11:33:41.267775: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] 
InUse at b12edac00 of size 768 next 116 2022-01-19 11:33:41.268056: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12edaf00 of size 9216 next 118 2022-01-19 11:33:41.268329: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12edd300 of size 18432 next 119 2022-01-19 11:33:41.269561: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12ee1b00 of size 256 next 120 2022-01-19 11:33:41.269851: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12ee1c00 of size 36864 next 121 2022-01-19 11:33:41.270133: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12eeac00 of size 256 next 122 2022-01-19 11:33:41.270417: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12eead00 of size 73728 next 123 2022-01-19 11:33:41.270704: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12efcd00 of size 256 next 124 2022-01-19 11:33:41.270991: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12efce00 of size 147456 next 125 2022-01-19 11:33:41.271283: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12f20e00 of size 256 next 126 2022-01-19 11:33:41.271559: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12f20f00 of size 294912 next 127 2022-01-19 11:33:41.271859: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12f68f00 of size 512 next 128 2022-01-19 11:33:41.272148: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12f69100 of size 589824 next 129 2022-01-19 11:33:41.272436: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12ff9100 of size 512 next 130 2022-01-19 11:33:41.272718: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b12ff9300 of size 1179648 next 131 2022-01-19 11:33:41.273006: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13119300 of size 1024 next 132 2022-01-19 11:33:41.273287: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13119700 of size 3041536 next 18446744073709551615 2022-01-19 11:33:41.273574: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 33554432 2022-01-19 11:33:41.273842: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13400000 of size 1024 next 134 2022-01-19 11:33:41.274130: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13400400 of size 1769472 next 135 2022-01-19 11:33:41.274406: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b135b0400 of size 512 next 136 2022-01-19 11:33:41.274686: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b135b0600 of size 589824 next 137 2022-01-19 11:33:41.274986: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13640600 of size 512 next 138 2022-01-19 11:33:41.275267: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13640800 of size 442368 next 139 2022-01-19 11:33:41.275544: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136ac800 of size 256 next 140 2022-01-19 11:33:41.275824: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136ac900 of size 147456 next 141 2022-01-19 11:33:41.276109: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136d0900 of size 256 next 142 2022-01-19 11:33:41.276392: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136d0a00 of size 110592 next 143 2022-01-19 11:33:41.276679: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136eba00 of size 256 next 144 2022-01-19 11:33:41.277022: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136ebb00 of size 36864 next 145 2022-01-19 11:33:41.277320: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136f4b00 of size 256 next 146 2022-01-19 11:33:41.277602: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136f4c00 of size 512 next 147 2022-01-19 
11:33:41.278840: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136f4e00 of size 256 next 148 2022-01-19 11:33:41.279205: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136f4f00 of size 768 next 149 2022-01-19 11:33:41.279483: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136f5200 of size 256 next 150 2022-01-19 11:33:41.279790: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136f5300 of size 9216 next 151 2022-01-19 11:33:41.280041: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136f7700 of size 256 next 152 2022-01-19 11:33:41.280322: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136f7800 of size 18432 next 153 2022-01-19 11:33:41.280613: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136fc000 of size 256 next 154 2022-01-19 11:33:41.281003: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b136fc100 of size 36864 next 155 2022-01-19 11:33:41.281280: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13705100 of size 256 next 156 2022-01-19 11:33:41.281557: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13705200 of size 73728 next 157 2022-01-19 11:33:41.281845: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13717200 of size 256 next 158 2022-01-19 11:33:41.282125: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13717300 of size 147456 next 159 2022-01-19 11:33:41.282404: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1373b300 of size 256 next 160 2022-01-19 11:33:41.282685: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1373b400 of size 294912 next 161 2022-01-19 11:33:41.282988: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13783400 of size 512 next 162 2022-01-19 11:33:41.283268: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13783600 of size 589824 next 163 2022-01-19 
11:33:41.283551: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13813600 of size 512 next 164 2022-01-19 11:33:41.283837: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13813800 of size 1179648 next 165 2022-01-19 11:33:41.284134: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13933800 of size 1024 next 166 2022-01-19 11:33:41.284413: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13933c00 of size 2359296 next 167 2022-01-19 11:33:41.284704: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13b73c00 of size 1024 next 168 2022-01-19 11:33:41.285008: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13b74000 of size 1769472 next 169 2022-01-19 11:33:41.285296: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13d24000 of size 512 next 170 2022-01-19 11:33:41.285583: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13d24200 of size 589824 next 171 2022-01-19 11:33:41.285859: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13db4200 of size 512 next 172 2022-01-19 11:33:41.286141: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13db4400 of size 442368 next 173 2022-01-19 11:33:41.286429: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e20400 of size 256 next 174 2022-01-19 11:33:41.286710: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e20500 of size 147456 next 175 2022-01-19 11:33:41.287928: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e44500 of size 256 next 176 2022-01-19 11:33:41.288216: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e44600 of size 110592 next 177 2022-01-19 11:33:41.288501: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e5f600 of size 256 next 178 2022-01-19 11:33:41.288800: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e5f700 of size 36864 next 
179 2022-01-19 11:33:41.289144: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e68700 of size 256 next 180 2022-01-19 11:33:41.289428: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e68800 of size 512 next 181 2022-01-19 11:33:41.289715: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e68a00 of size 256 next 182 2022-01-19 11:33:41.289991: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e68b00 of size 256 next 183 2022-01-19 11:33:41.290277: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e68c00 of size 256 next 184 2022-01-19 11:33:41.290558: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e68d00 of size 256 next 185 2022-01-19 11:33:41.290853: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e68e00 of size 256 next 186 2022-01-19 11:33:41.291220: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e68f00 of size 256 next 187 2022-01-19 11:33:41.291500: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e69000 of size 256 next 188 2022-01-19 11:33:41.291785: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e69100 of size 256 next 189 2022-01-19 11:33:41.292061: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e69200 of size 256 next 190 2022-01-19 11:33:41.292347: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e69300 of size 256 next 191 2022-01-19 11:33:41.292658: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e69400 of size 256 next 192 2022-01-19 11:33:41.293004: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at b13e69500 of size 256 next 199 2022-01-19 11:33:41.293343: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b13e69600 of size 256 next 202 2022-01-19 11:33:41.293592: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at b13e69700 of size 8355328 next 193 
2022-01-19 11:33:41.293875: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b14661500 of size 256 next 194 2022-01-19 11:33:41.294154: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at b14661600 of size 14281216 next 18446744073709551615 2022-01-19 11:33:41.294438: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 67108864 2022-01-19 11:33:41.294709: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b15400000 of size 25067520 next 196 2022-01-19 11:33:41.294995: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b16be8000 of size 42041344 next 18446744073709551615 2022-01-19 11:33:41.295282: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 134217728 2022-01-19 11:33:41.295555: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b19400000 of size 33423360 next 198 2022-01-19 11:33:41.295843: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1b3e0000 of size 25067520 next 201 2022-01-19 11:33:41.296129: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b1cbc8000 of size 75726848 next 18446744073709551615 2022-01-19 11:33:41.297374: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 536870912 2022-01-19 11:33:41.297668: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b21400000 of size 536870912 next 18446744073709551615 2022-01-19 11:33:41.297966: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 1073741824 2022-01-19 11:33:41.298238: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b6ae00000 of size 534773760 next 203 2022-01-19 11:33:41.298567: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b8ac00000 of size 133693440 next 205 2022-01-19 11:33:41.298812: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at b92b80000 of size 267386880 next 207 2022-01-19 11:33:41.299171: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at ba2a80000 of size 137887744 next 18446744073709551615 2022-01-19 11:33:41.299457: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 1073741824 2022-01-19 11:33:41.299718: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at bbe600000 of size 267386880 next 208 2022-01-19 11:33:41.300003: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at bce500000 of size 133693440 next 209 2022-01-19 11:33:41.300292: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at bd6480000 of size 33423360 next 211 2022-01-19 11:33:41.300582: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at bd8460000 of size 66846720 next 212 2022-01-19 11:33:41.300879: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at bdc420000 of size 66846720 next 213 2022-01-19 11:33:41.301230: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at be03e0000 of size 33423360 next 214 2022-01-19 11:33:41.301508: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at be23c0000 of size 33423360 next 215 2022-01-19 11:33:41.301806: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at be43a0000 of size 33423360 next 216 2022-01-19 11:33:41.302084: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at be6380000 of size 66846720 next 217 2022-01-19 11:33:41.302377: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at bea340000 of size 66846720 next 218 2022-01-19 11:33:41.302694: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at bee300000 of size 66846720 next 220 2022-01-19 11:33:41.302979: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at bf22c0000 of size 204734464 next 18446744073709551615 2022-01-19 11:33:41.303263: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 2775580672 2022-01-19 11:33:41.303523: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] 
InUse at bfe600000 of size 200540160 next 219 2022-01-19 11:33:41.303812: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at c0a540000 of size 133693440 next 221 2022-01-19 11:33:41.304090: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at c124c0000 of size 133693440 next 224 2022-01-19 11:33:41.304377: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at c1a440000 of size 267386880 next 222 2022-01-19 11:33:41.304686: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at c2a340000 of size 401080320 next 223 2022-01-19 11:33:41.305015: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at c421c0000 of size 534773760 next 225 2022-01-19 11:33:41.305301: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at c61fc0000 of size 534773760 next 226 2022-01-19 11:33:41.306533: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at c81dc0000 of size 569638912 next 18446744073709551615 2022-01-19 11:33:41.306840: I tensorflow/core/common_runtime/bfc_allocator.cc:1071] Summary of in-use Chunks by size: 2022-01-19 11:33:41.307220: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 93 Chunks of size 256 totalling 23.2KiB 2022-01-19 11:33:41.307497: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 19 Chunks of size 512 totalling 9.5KiB 2022-01-19 11:33:41.307797: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 5 Chunks of size 768 totalling 3.8KiB 2022-01-19 11:33:41.308095: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 8 Chunks of size 1024 totalling 8.0KiB 2022-01-19 11:33:41.308396: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 1280 totalling 1.2KiB 2022-01-19 11:33:41.308712: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 2304 totalling 2.2KiB 2022-01-19 11:33:41.308985: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 3840 totalling 11.2KiB 2022-01-19 11:33:41.309284: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1074] 4 Chunks of size 9216 totalling 36.0KiB 2022-01-19 11:33:41.309564: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 18432 totalling 54.0KiB 2022-01-19 11:33:41.309846: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 27648 totalling 27.0KiB 2022-01-19 11:33:41.310124: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 8 Chunks of size 36864 totalling 288.0KiB 2022-01-19 11:33:41.310399: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 73728 totalling 216.0KiB 2022-01-19 11:33:41.310696: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 110592 totalling 324.0KiB 2022-01-19 11:33:41.310986: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 9 Chunks of size 147456 totalling 1.27MiB 2022-01-19 11:33:41.311263: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 184320 totalling 180.0KiB 2022-01-19 11:33:41.311546: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 4 Chunks of size 294912 totalling 1.12MiB 2022-01-19 11:33:41.311859: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 4 Chunks of size 442368 totalling 1.69MiB 2022-01-19 11:33:41.312141: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 7 Chunks of size 589824 totalling 3.94MiB 2022-01-19 11:33:41.312425: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 739072 totalling 721.8KiB 2022-01-19 11:33:41.312718: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 1179648 totalling 3.38MiB 2022-01-19 11:33:41.313000: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 1327104 totalling 1.27MiB 2022-01-19 11:33:41.313364: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 1769472 totalling 5.06MiB 2022-01-19 11:33:41.313567: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 2073600 totalling 5.93MiB 2022-01-19 
11:33:41.313860: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 2359296 totalling 6.75MiB 2022-01-19 11:33:41.314143: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 2490368 totalling 2.38MiB 2022-01-19 11:33:41.314423: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 3013120 totalling 2.87MiB 2022-01-19 11:33:41.314707: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 3041536 totalling 2.90MiB 2022-01-19 11:33:41.315940: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 2 Chunks of size 25067520 totalling 47.81MiB 2022-01-19 11:33:41.316229: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 5 Chunks of size 33423360 totalling 159.38MiB 2022-01-19 11:33:41.316513: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 42041344 totalling 40.09MiB 2022-01-19 11:33:41.316815: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 5 Chunks of size 66846720 totalling 318.75MiB 2022-01-19 11:33:41.317183: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 75726848 totalling 72.22MiB 2022-01-19 11:33:41.317471: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 4 Chunks of size 133693440 totalling 510.00MiB 2022-01-19 11:33:41.317755: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 137887744 totalling 131.50MiB 2022-01-19 11:33:41.318042: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 200540160 totalling 191.25MiB 2022-01-19 11:33:41.318336: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 204734464 totalling 195.25MiB 2022-01-19 11:33:41.318620: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 2 Chunks of size 267386880 totalling 510.00MiB 2022-01-19 11:33:41.318959: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 401080320 totalling 382.50MiB 2022-01-19 11:33:41.319274: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1074] 2 Chunks of size 534773760 totalling 1020.00MiB 2022-01-19 11:33:41.319566: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 536870912 totalling 512.00MiB 2022-01-19 11:33:41.319850: I tensorflow/core/common_runtime/bfc_allocator.cc:1078] Sum Total of in-use chunks: 4.03GiB 2022-01-19 11:33:41.320127: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] total_region_allocated_bytes_: 5726273536 memory_limit_: 5726273536 available bytes: 0 curr_region_allocation_bytes_: 8589934592 2022-01-19 11:33:41.320415: I tensorflow/core/common_runtime/bfc_allocator.cc:1086] Stats: Limit: 5726273536 InUse: 4331837184 MaxInUse: 5305898752 NumAllocs: 811 MaxAllocSize: 2430337024 Reserved: 0 PeakReserved: 0 LargestFreeBlock: 0

2022-01-19 11:33:41.320762: W tensorflow/core/common_runtime/bfc_allocator.cc:474] ********____***____
2022-01-19 11:33:41.321093: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at concat_op.cc:158 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[4,96,544,960] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "C:\Users\Jude\anaconda3\envs\sleap\Scripts\sleap-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap==1.2.0a2', 'console_scripts', 'sleap-train')())
  File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1618, in main
    trainer.train()
  File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 892, in train
    verbose=2,
  File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4,96,544,960] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model/stack0_dec2_s4_to_s2_skip_concat/concat (defined at C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\backend.py:3224) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_6476]

Errors may have originated from an input operation.
Input Source operations connected to node model/stack0_dec2_s4_to_s2_skip_concat/concat:
In[0] model/stack0_enc1_act1_relu/Relu (defined at C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\backend.py:4867)
In[1] model/stack0_dec2_s4_to_s2_interp_bilinear/resize/ResizeBilinear (defined at C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\backend.py:3334)
In[2] model/stack0_dec2_s4_to_s2_skip_concat/concat/axis:

Operation defined at: (most recent call last)

File "C:\Users\Jude\anaconda3\envs\sleap\Scripts\sleap-train-script.py", line 33, in <module> sys.exit(load_entry_point('sleap==1.2.0a2', 'console_scripts', 'sleap-train')())

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 1618, in main trainer.train()

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 892, in train verbose=2,

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler return fn(*args, **kwargs)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\engine\training.py", line 1216, in fit tmp_logs = self.train_function(iterator)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\engine\training.py", line 878, in train_function return step_function(self, iterator)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\engine\training.py", line 867, in step_function outputs = model.distribute_strategy.run(run_step, args=(data,))

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\engine\training.py", line 860, in run_step outputs = model.train_step(data)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\engine\training.py", line 808, in train_step y_pred = self(x, training=True)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler return fn(*args, **kwargs)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__ outputs = call_fn(inputs, *args, **kwargs)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler return fn(*args, **kwargs)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\engine\functional.py", line 452, in call inputs, training=training, mask=mask)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\engine\functional.py", line 589, in _run_internal_graph outputs = node.layer(*args, **kwargs)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler return fn(*args, **kwargs)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__ outputs = call_fn(inputs, *args, **kwargs)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler return fn(*args, **kwargs)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\layers\merge.py", line 183, in call return self._merge_function(inputs)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\layers\merge.py", line 528, in _merge_function return backend.concatenate(inputs, axis=self.axis)

File "C:\Users\Jude\anaconda3\envs\sleap\lib\site-packages\keras\backend.py", line 3224, in concatenate return tf.concat([to_dense(x) for x in tensors], axis)

INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
Run Path: C:/Users/Jude/Desktop\models\test220119_113316.single_instance.n=25

talmo commented 2 years ago

Hi @jfrie,

This now seems to be a memory issue. Your video is 1920 x 1088, which is quite large! Here are some solutions:
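
For a rough sense of why the frame size matters, the tensor named in the OOM message can be sized by hand. The sketch below assumes 4 bytes per element (the log says `type float`) and reads the shape as batch, channels, height, width; the byte count does not depend on that order. The free-chunk sizes are the three largest `Free at ...` entries copied from the allocator dump above, and the half-resolution shape is a hypothetical comparison, not a setting taken from this thread.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the log means float32


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the number of bytes needed to hold a dense tensor of this shape."""
    return prod(shape) * bytes_per_element


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / 2**20


# Shape copied from the OOM message: shape[4,96,544,960].
oom_shape = (4, 96, 544, 960)
requested = tensor_bytes(oom_shape)

# Largest "Free at ..." chunks copied from the allocator dump in the log.
free_chunks = [267386880, 534773760, 569638912]
largest_free = max(free_chunks)

# Hypothetical comparison: the same tensor if height and width were halved.
half_shape = (4, 96, 272, 480)
requested_half = tensor_bytes(half_shape)

print(f"requested:          {to_mib(requested):.2f} MiB")
print(f"largest free chunk: {to_mib(largest_free):.2f} MiB")
print(f"at half resolution: {to_mib(requested_half):.2f} MiB")
print("fits in largest free chunk:", requested <= largest_free)
print("fits at half resolution:   ", requested_half <= largest_free)
```

Under these assumptions the single allocation is larger than any one free chunk, even though the free chunks add up to more than that. Halving both spatial dimensions cuts the request to a quarter.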

jfrie commented 2 years ago

Reduced the video size and seems to be working! Thank you so much! Sounds like the new graphics cards may need the pre-release version.
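
A minimal sketch of one way to reduce the video size, assuming `ffmpeg` is installed and on the PATH. The file names and the 0.5 scale factor are hypothetical, and the thread does not say which tool was actually used. The helper only builds the command; running it is left as a commented-out line.

```python
import subprocess  # only needed if the command is actually executed


def scaled_size(width, height, factor):
    """Scale a frame size by `factor`, rounding each side to an even number.

    Even dimensions are a common requirement for H.264 encoding.
    """
    new_width = int(round(width * factor / 2)) * 2
    new_height = int(round(height * factor / 2)) * 2
    return new_width, new_height


def build_downscale_command(src, dst, width, height):
    """Build an ffmpeg command that resizes `src` to width x height."""
    return ["ffmpeg", "-i", src, "-vf", f"scale={width}:{height}", dst]


# 1920 x 1088 is the frame size mentioned in the thread; 0.5 is hypothetical.
new_w, new_h = scaled_size(1920, 1088, 0.5)

# "input.mp4" and "input_small.mp4" are placeholder file names.
cmd = build_downscale_command("input.mp4", "input_small.mp4", new_w, new_h)
print(" ".join(cmd))

# To actually run the conversion (requires ffmpeg and a real input file):
# subprocess.run(cmd, check=True)
```

Labels made on the original video would not line up with the resized frames, so this fits best before labeling or when starting a new project.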

talmo commented 2 years ago

@om-git216: any luck on your end?

talmo commented 2 years ago

Closing this issue due to inactivity, but please feel free to comment again or open a new issue if you're still having problems.

Thanks!

Talmo