microsoft / CameraTraps

PyTorch Wildlife: a Collaborative Deep Learning Framework for Conservation.
https://cameratraps.readthedocs.io/en/latest/
MIT License

Does this error mean my GPU memory_size is too small? #283

Closed: VYRION-Ai closed this issue 2 years ago

VYRION-Ai commented 2 years ago

Does this error mean my GPU memory_size is too small?

[00:00<?, ?it/s]2022-04-25 01:31:03.333175: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op:
op: "CropAndResize"
attr { key: "T" value { type: DT_FLOAT } }
attr { key: "extrapolation_value" value { f: 0 } }
attr { key: "method" value { s: "bilinear" } }
inputs { dtype: DT_FLOAT shape { dim { size: -1204 } dim { size: -1205 } dim { size: -1206 } dim { size: 1088 } } }
inputs { dtype: DT_FLOAT shape { dim { size: -25 } dim { size: 4 } } }
inputs { dtype: DT_INT32 shape { dim { size: -25 } } }
inputs { dtype: DT_INT32 shape { dim { size: 2 } } value { dtype: DT_INT32 tensor_shape { dim { size: 2 } } int_val: 17 } }
device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce GTX 1050 Ti" frequency: 1392 num_cores: 6 environment { key: "architecture" value: "6.1" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 1048576 shared_memory_size_per_multiprocessor: 98304 memory_size: 2917584078 bandwidth: 112128000 }
outputs { dtype: DT_FLOAT shape { dim { size: -25 } dim { size: 17 } dim { size: 17 } dim { size: 1088 } } }

agentmorris commented 2 years ago

Sorry, I've never seen that error before, but FWIW I'm about 96% sure it's not a memory size issue. The best recommendation I have is the one I suggested on the previous issue, where I recommended following our GitHub instructions (rather than the Colab) for non-Colab execution.

In particular, based on the code you posted in the other issue, the approach you were using imports TensorFlow in a parent process like this:

import tensorflow as tf

...then (by calling our code) imports TensorFlow in a child process like this:

import tensorflow.compat.v1 as tf

...and I have no idea what the implications of that are, but it's an unusual setup that will make debugging difficult.

Maybe try running run_detector_batch.py directly, rather than using the approach from the Colab?
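If I remember the argument order correctly, a direct invocation looks roughly like this (the model filename and paths here are placeholders for your own):

python run_detector_batch.py md_v4.1.0.pb "c:\my_images" "c:\my_images\output.json"

That keeps everything in one process with a single TensorFlow import, which removes the parent/child import ambiguity entirely.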

Actually, one other suggestion: this also feels a little like somehow the MegaDetector .pb file got corrupted. This would also be addressed by following the instructions for offline execution, rather than downloading from within code as the Colab does, but just in case, can you confirm that your .pb file is exactly 245,590,501 bytes?
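A quick way to check (a minimal sketch; replace the filename with wherever your MegaDetector .pb file actually lives):

import os
print(os.path.getsize('md_v4.1.0.pb'))  # should print 245590501 if the download is intact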

Sorry we don't have a quick fix!

-Dan

VYRION-Ai commented 2 years ago

@agentmorris The code was working on the CPU, but after I installed TensorFlow GPU I got this error. How can I run the code on the CPU now?

agentmorris commented 2 years ago

On line 75 of run_detector_batch.py, change this:

force_cpu = False

...to:

force_cpu = True

Again I'm not 100% sure how this will behave in the context where you do stuff in TensorFlow in the parent process (which will be using TF2 and may be using the GPU), so I still recommend against this approach, even if you just want to use the CPU (the Colab code was really written for Colab).

But I'm... 65% sure this will allow you to run on the CPU.
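As an alternative that doesn't require editing the script, you can usually hide the GPU from TensorFlow entirely by setting an environment variable before TensorFlow is first imported (this is a general TensorFlow/CUDA pattern, not something specific to our code):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # make no CUDA devices visible, so TF falls back to the CPU
import tensorflow.compat.v1 as tf

Note that this only works if it runs before the first TensorFlow import anywhere in the process, so in your parent-process setup it would need to go at the very top of your script.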

Hope that helps!

VYRION-Ai commented 2 years ago

@agentmorris Thank you, the CPU works well now. I will look into why the GPU causes the error. Thank you.

matobler commented 2 years ago

Just as a note, I have had memory overflow issues with a GPU with 4GB of memory for high-resolution images. The same setup worked fine for smaller images. If your images are very large you might want to resize them and try again, just to rule out GPU memory as a possible issue.
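If you want to test this quickly, something like the following works (a minimal sketch using Pillow; the folder names and target size are just examples, not values from this project):

import os
from PIL import Image

src_dir = 'images'           # hypothetical folder of original camera-trap images
dst_dir = 'images_resized'   # hypothetical output folder
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    if name.lower().endswith(('.jpg', '.jpeg', '.png')):
        im = Image.open(os.path.join(src_dir, name))
        im.thumbnail((1600, 1600))  # shrinks in place, preserving aspect ratio
        im.save(os.path.join(dst_dir, name))

Running the detector on the resized copies and seeing whether the error disappears would tell you whether GPU memory is the culprit.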

VYRION-Ai commented 2 years ago

@matobler Thank you very much, I will try that and report back what happens.

VYRION-Ai commented 2 years ago

@matobler
Sorry, reducing the image size didn't solve the problem. I resized the images to 160*160 (nothing changed).