Closed momo1986 closed 4 years ago
Thank you for reporting the problem!
The program runs very slowly.
This is because some behavior changed between tensorflow-1.15.2 and tensorflow-1.15.4. The current code uses PIL to load images, which does not play well with the newer version of tensorflow. The current code is tested under tensorflow-1.15.2. I bumped the version recently because of a security problem in older versions of tensorflow. Sorry for the inconvenience.
You could downgrade your tensorflow to 1.15.2, or wait for a simple fix from me soon.
What else should I do when I run a GPU program based on ImageNet?
Some ImageNet models are quite large, so you might need a GPU with enough memory (~10G).
Hello, @Fugoes Fu.
Since "Loading tf Imagenet-pretrained model" needs a lot of time and memory, is there a way to see the progress of the loading, and also to see whether the loading is blocked or has failed?
I work on a 2080 Ti with 10.8G of memory, so perhaps efficiency is important when running the program with realsafe.
Could you give some tips?
Thanks & Regards! Momo
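One way to tell a slow load from a blocked one is to wrap the slow step in a small watchdog. The sketch below is only an illustration and not part of realsafe: `timed_step` is a hypothetical helper name, and it assumes the loading runs in ordinary Python code. It uses the standard `faulthandler` module to dump the Python stack every `hang_after` seconds while the step is still running, so the log shows where the program is waiting.

```python
import contextlib
import faulthandler
import sys
import time


@contextlib.contextmanager
def timed_step(name, hang_after=60.0, file=None):
    """Log the start and end of a step; dump stacks while it keeps running.

    `file` must be a real file object (it needs a file descriptor),
    for example sys.stderr or an opened log file.
    """
    file = file or sys.stderr
    print("[start] " + name, file=file, flush=True)
    # Every `hang_after` seconds, write the stack of every thread to `file`.
    faulthandler.dump_traceback_later(hang_after, repeat=True, file=file)
    start = time.monotonic()
    try:
        yield
    finally:
        faulthandler.cancel_dump_traceback_later()
        elapsed = time.monotonic() - start
        print("[done] %s in %.1fs" % (name, elapsed), file=file, flush=True)


# Usage (hypothetical step name):
# with timed_step("load ImageNet model", hang_after=120):
#     ...  # call the slow loading code here
```

If the stack dumps keep pointing at the same line, the step is probably blocked there rather than just slow.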
I ask because the program sometimes stops at:
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-10-13 03:34:25.210964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-10-13 03:34:25.211126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-13 03:34:25.211159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-10-13 03:34:25.211179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-10-13 03:34:25.983325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9929 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:08:00.0, compute capability: 7.5)
Without further log output, it is not convenient to debug.
Thanks & Regards!
Without further log output, it is not convenient to debug.
This behavior is quite strange. Does the NVIDIA driver work fine? Does sudo dmesg print errors about NVIDIA?
totalMemory: 10.76GiB freeMemory: 10.21GiB
This is the computing power I have available. Is it too demanding to load tensorflow's InceptionV3 and Ensemble InceptionV3 on ImageNet?
Here is the NVIDIA log:
2020-10-13 04:02:41.126940: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-10-13 04:02:41.142369: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2099940000 Hz
2020-10-13 04:02:41.146588: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x555a0826bdc0 executing computations on platform Host. Devices:
2020-10-13 04:02:41.146661: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2020-10-13 04:02:42.281625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:08:00.0
totalMemory: 10.76GiB freeMemory: 10.21GiB
2020-10-13 04:02:42.281688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-10-13 04:02:42.282785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-13 04:02:42.282812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-10-13 04:02:42.282823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-10-13 04:02:42.282930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9929 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:08:00.0, compute capability: 7.5)
2020-10-13 04:02:42.286556: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x555a08acead0 executing computations on platform CUDA. Devices:
2020-10-13 04:02:42.286614: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
WARNING:tensorflow:From /usr/local/miniconda3/envs/dl10/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/miniconda3/envs/dl10/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-10-13 04:02:55.418225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-10-13 04:02:55.418367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-13 04:02:55.418398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-10-13 04:02:55.418418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-10-13 04:02:56.162556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9929 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:08:00.0, compute capability: 7.5)
totalMemory: 10.76GiB freeMemory: 10.21GiB
This is the computing power I have available. Is it too demanding to load tensorflow's InceptionV3 and Ensemble InceptionV3 on ImageNet?
Should work fine with a reasonable batch size (~50 or 100).
The current batch size is 10. Is that not suitable? @Fugoes
The current batch size is 10. Is that not suitable? @Fugoes
Should work fine for all ImageNet pre-trained models in RealSafe.
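For a rough sense of scale, the arithmetic below estimates how much memory the input batch alone takes. It is a back-of-the-envelope sketch under stated assumptions: 299×299×3 float32 inputs (the usual InceptionV3 input size), and a hypothetical helper name `input_batch_mib`. Model weights and intermediate activations are not counted, and they normally dominate GPU memory use.

```python
BYTES_PER_FLOAT32 = 4


def input_batch_mib(batch_size, height=299, width=299, channels=3):
    """Memory of one float32 input batch in MiB (inputs only, no activations)."""
    n_bytes = batch_size * height * width * channels * BYTES_PER_FLOAT32
    return n_bytes / (1024 ** 2)


for batch_size in (10, 50, 100):
    print("batch %3d: %.1f MiB" % (batch_size, input_batch_mib(batch_size)))
```

Under these assumptions a batch of 10 is about 10 MiB of input data and a batch of 100 is about 102 MiB, so the input batch itself is small next to the ~10G of GPU memory discussed in this thread.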
I fixed this performance degradation when loading the dataset in https://github.com/thu-ml/realsafe/commit/39f632e950562fa00ac26d34d13b2691c9c5f013. Check the commit message for why it happens :).
Thanks, I also found a clue: it works fine when I run "python test_imagenet_models.py" alone, but it blocks when I run "python test_imagenet_models.py | tee debug.log". It looks like the pipe operation is restricted. Is there any workaround to store the running log when running big models in realsafe?
Thanks & Regards!
Thanks, I also found a clue: it works fine when I run "python test_imagenet_models.py" alone, but it blocks when I run "python test_imagenet_models.py | tee debug.log". It looks like the pipe operation is restricted. Is there any workaround to store the running log when running big models in realsafe?
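When Python's output goes to a pipe instead of a terminal, Python switches stdout to block buffering (and, before Python 3.9, does the same for a redirected stderr), so log lines can show up long after they were written. To disable this behaviour, use python -u instead.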
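If changing the command line is not convenient, a similar effect can be had from inside the script. The sketch below is an assumption-laden illustration, not something realsafe provides: `Tee` is a hypothetical class that forwards every write to several streams and flushes right away, so the terminal and the log file stay in step.

```python
import sys


class Tee:
    """Forward writes to several streams and flush after each write."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
            stream.flush()  # do not wait for a buffer to fill up
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


# Usage (the file name is only an example):
# log_file = open("debug.log", "a")
# sys.stdout = Tee(sys.__stdout__, log_file)
# print("this line goes to the terminal and to debug.log")
```

This only captures text written through Python's `sys.stdout`. The tensorflow log lines shown in this thread are written by native code directly to the process's stderr, so they would still need a shell redirect.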
Hello, @Fugoes Fu.
Thanks for your tips.
Also, which tensorflow version is recommended for realsafe? It might be better to give an official version range.
Thanks & Regards!
Thanks for your tips.
You are welcome.
Also, which tensorflow version is recommended for realsafe? It might be better to give an official version range.
Thanks for your advice. We suggest tensorflow>=1.13 (in the README.md).
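A script can check the installed version against the numbers mentioned in this thread before doing any heavy work. This is a minimal sketch under assumptions: the helper names `version_tuple` and `check_tensorflow` are hypothetical, the minimum (1.13) and the tested version (1.15.2) are taken from the comments above, and only the first three numeric components of the version string are compared.

```python
import re

MINIMUM = "1.13"    # suggested lower bound from the README, per this thread
TESTED = "1.15.2"   # version the code was tested under, per this thread


def version_tuple(version):
    """Turn '1.15.4' (or '1.15.0rc1') into a comparable tuple like (1, 15, 4)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_tensorflow(installed):
    """Classify an installed version string as 'too old', 'tested' or 'untested'."""
    if version_tuple(installed) < version_tuple(MINIMUM):
        return "too old"
    if version_tuple(installed) == version_tuple(TESTED):
        return "tested"
    return "untested"


# Usage:
# import tensorflow as tf
# print(check_tensorflow(tf.__version__))
```

An "untested" result does not mean the version is broken, only that it differs from the one the maintainer reported testing.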
I will close the issue.
Hello, dear guys from thu-ml.
Thanks for your program.
I tried to run test_imagenet_model.py for the ImageNet dataset with an L∞ attack.
The program runs very slowly.
I noticed that requirement.txt sets "tensorflow=1.15.4", not "tensorflow-gpu=1.15.4".
I installed tensorflow-gpu 1.15.4 and made sure that tensorflow.test.is_gpu_available() returns True.
I also notice:
My questions are:
1) What else should I do when I run a GPU program based on ImageNet?
2) I added adversarial attack code in: https://github.com/thu-ml/realsafe/blob/master/realsafe/dataset/imagenet.py
My modification sits before "if clip" and after " img = img.convert(mode='RGB')". Would this modification, made before the tensor conversion and placeholder filling, slow down the operation?
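Whether an extra per-image step slows loading noticeably is easiest to settle by measuring it. The helper below is a generic sketch, not realsafe code: `mean_seconds_per_call` is a hypothetical name, and `fn` stands for whatever function holds the added modification, called on one sample input.

```python
import time


def mean_seconds_per_call(fn, sample, repeats=20):
    """Average wall-clock time of fn(sample) over `repeats` calls."""
    fn(sample)  # warm-up call, so one-time setup cost is not counted
    start = time.perf_counter()
    for _ in range(repeats):
        fn(sample)
    return (time.perf_counter() - start) / repeats


# Usage (my_modification and one_image are placeholders):
# cost = mean_seconds_per_call(my_modification, one_image)
# print("added cost per image: %.4f s" % cost)
```

Multiplying the measured cost by the number of images per batch gives a rough idea of how much the modification adds to each loading step.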
Thanks & Regards! Momo