kittles opened this issue 6 years ago (status: Open)
So you don't see a line like this during training:
1/1000 [=>.................................................] - ETA: 12:45 - loss: 1.3515 - rpn_class_loss: 0.1585 - rpn_bbox_loss: 0.4085 - mrcnn_class_loss: 0.1654 - mrcnn_bbox_loss: 0.2975 - mrcnn_mask_loss: 0.3216
If not, it means training never actually starts.
No, I do see lines like that. It does train, just very slowly.
Download TechPowerUp GPU-Z, go to settings, and set the update time to 0.1. Watch how the GPU is utilized over time, take a screenshot after 3 iterations, and add the screenshot here.
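If you'd rather not install GPU-Z, a rough command-line equivalent is to poll nvidia-smi (a small sketch; it assumes the NVIDIA driver's nvidia-smi tool is on PATH):

```python
import subprocess
import time

# Print GPU utilization and memory use a few times while training runs in another terminal.
for _ in range(5):
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used", "--format=csv,noheader"]
    )
    print(out.decode().strip())
    time.sleep(1)
```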
I hit the same problem. My config sets the GPU count to 8, but during training the GPU isn't used at all while the CPU is heavily loaded. How can I solve this?
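For context, the setting being referred to is presumably GPU_COUNT in the repo's config. A minimal sketch, assuming the standard mrcnn.config.Config (the class and dataset names below are placeholders, not from this thread); GPU_COUNT should match the GPUs actually present on the machine:

```python
from mrcnn.config import Config

class MyTrainingConfig(Config):
    """Hypothetical example config, for illustration only."""
    NAME = "my_dataset"      # placeholder name
    GPU_COUNT = 1            # must match the number of physical GPUs available
    IMAGES_PER_GPU = 2       # effective batch size = GPU_COUNT * IMAGES_PER_GPU
    NUM_CLASSES = 1 + 1      # background + one custom class
```

Setting GPU_COUNT to 8 on a single-GPU machine will not make training faster; it only changes how the batch is split across devices that must actually exist.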
Could you post the results of running pip list?
I would assume you have tensorflow installed instead of tensorflow-gpu, which is an error in the requirements.txt of this repo.
You need to install tensorflow-gpu instead of tensorflow.
I know it is a bit late, but if someone is still struggling with the problem, you can check it this way:

```python
from keras import backend as K
K.tensorflow_backend._get_available_gpus()
```
or
```python
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
```
If no GPU is detected, then you have to pip3 install tensorflow-gpu.
Then run the lines above again and you should see something like:

```
[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality { }
 incarnation: 13043076236058885011,
 name: "/device:XLA_GPU:0"
 device_type: "XLA_GPU"
 memory_limit: 17179869184
 locality { }
 incarnation: 15906157833526132886
 physical_device_desc: "device: XLA_GPU device",
 name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality { }
 incarnation: 9907085518476589959
 physical_device_desc: "device: XLA_CPU device",
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 15702143796
 locality { bus_id: 1 links { } }
 incarnation: 15516049900956998810
 physical_device_desc: "device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:03:00.0, compute capability: 6.0"]
```
Hi, below are the details:

```
[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality { }
 incarnation: 3810402909033317137,
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 9121682555
 locality { bus_id: 1 links { } }
 incarnation: 7365749146294383826
 physical_device_desc: "device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5"]
```

Still, it is running slowly.
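When the GPU shows up in the device list but training still seems CPU-bound, one way to see where ops actually run is to enable device-placement logging in a throwaway session. A quick sketch, assuming TF 1.x (which this repo targets):

```python
import tensorflow as tf

# TF 1.x: log_device_placement prints the device each op is assigned to,
# which shows whether work is really landing on /device:GPU:0 or staying on the CPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name="a")
    b = tf.constant([4.0, 5.0, 6.0], name="b")
    print(sess.run(a * b))
```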
Are there still no solutions? I'm also having the same problem.
I think there is an issue if you run training on an RTX 2080 Ti with CUDA 9. I am facing the same issue.

TensorFlow = 1.9.0, CUDA = 9.0, cuDNN = 7.0.5, Python = 3.5.2, GPU = RTX 2080 Ti

I think TensorFlow does not support GPU utilization when running CUDA 9 on RTX GPUs. Can someone please help?
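As a first sanity check before blaming the CUDA version, a small sketch (again assuming TF 1.x) that shows whether the installed wheel was built with CUDA support at all:

```python
import tensorflow as tf

# If is_built_with_cuda() prints False, the installed wheel is CPU-only and
# no CUDA toolkit on the machine will make it use the GPU.
print("TensorFlow version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
```

For what it's worth, Turing cards like the RTX 2080 Ti (compute capability 7.5) are only supported from CUDA 10.0 onward, so a TensorFlow build compiled against CUDA 9.0 would be expected to fall back to the CPU here.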
I ended up switching to YOLO, as my application did not require Mask R-CNN.
Hello! Thanks for the libraries; I'd be stuffed if I tried doing any of this without them. I am trying to train COCO to recognize a new single class, so I made a dataset with polygon annotations and set up a script like the balloon.py example. After a bunch of twiddling I can get it to start actually training, but it's really slow and I think it's only using the CPU.

I know this question comes up a lot, so I want to assure people that it's not because I'm using tensorflow instead of tensorflow-gpu. When I run the balloon.py example, I can see the GPU working hard. When I run my own script, I see TensorFlow grab all the memory, and it logs that it's using the GPU, but the GPU never ends up doing any work during training. I've included some of the relevant details below.
The training file:

And some of the output that might be relevant; this is all the output before it starts actually training:

Finally, once it's training, it takes about a minute per step, and here is what Task Manager shows for hardware usage:
Thank you for making it this far! I hope I've included all the relevant info; any advice, or even shots in the dark, would be appreciated!
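For readers landing here later: the author's actual training file is not reproduced above, but the balloon.py-style single-class setup being described looks roughly like the sketch below. Every name in it is a placeholder, not taken from the issue.

```python
from mrcnn.config import Config
from mrcnn import model as modellib, utils

class SingleClassConfig(Config):
    """Placeholder config: one custom class on top of the background class."""
    NAME = "custom"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 1     # background + the new class
    STEPS_PER_EPOCH = 100

class SingleClassDataset(utils.Dataset):
    """Placeholder dataset: register the class and add images with polygon annotations."""
    def load_custom(self, dataset_dir, subset):
        self.add_class("custom", 1, "custom")
        # ... parse the polygon annotations and call self.add_image(...) for each image

    def load_mask(self, image_id):
        # ... rasterize the stored polygons into one boolean mask per instance
        return super(SingleClassDataset, self).load_mask(image_id)

# Training then follows the same shape as balloon.py:
# model = modellib.MaskRCNN(mode="training", config=SingleClassConfig(), model_dir="logs")
# model.load_weights(COCO_WEIGHTS_PATH, by_name=True,
#                    exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
# model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE,
#             epochs=30, layers="heads")
```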