matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Mask RCNN demo output is nowhere near accurate and completely random #2670

Open AydinGokce opened 3 years ago

AydinGokce commented 3 years ago

Hi Everyone,

I cloned the repo and ran demo.ipynb (https://github.com/matterport/Mask_RCNN/blob/master/samples/demo.ipynb) in the samples folder. Everything worked until the last cell, which stalled for an unusually long time (about ten minutes) and then produced this image:

[image]

I can't figure out what's wrong, since I didn't touch the code. I just cloned the repository and ran the demo to check that everything was working. TensorFlow is definitely using my GPU (RTX 3070). My only guess is that the problem comes from incompatible package versions.

My most relevant package versions are as follows:

tensorflow 1.15
keras 2.1.6
cudatoolkit 10.0.130
cudnn 7.6.5
proprietary NVIDIA driver 460

Here is a list of all my packages if you want to see more: packages.txt

If anyone has an idea of what could be causing this, or notices a package incompatibility that could explain it, I would appreciate suggestions.

Thanks!

PandaPandaChen commented 3 years ago

I ran into the same issue.

Nietram commented 3 years ago

Same here. I was getting good results until yesterday. I'm using Colab, so I suspect a package version problem.

[two attached images]

Marium-E-Jannat commented 3 years ago

I am also using Colab and have been facing exactly the same problem as @Nietram since the day before yesterday. I have no idea why this is happening.

If anyone could provide any solutions or suggestions, that would be really helpful.

Marium-E-Jannat commented 3 years ago

Hey @Nietram, Colab is currently using TensorFlow 2.6. I think 2.6 has a version compatibility problem with keras.model.predict(). I downgraded to TensorFlow 2.4 on Colab, and now it works fine, just as before.
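A minimal sketch of a guard you could put at the top of a notebook so that a known-problematic TensorFlow version is flagged before model.detect is called. The helper names (major_minor, detect_is_known_good, warn_if_bad) are hypothetical, and the version sets only reflect what commenters in this thread reported (2.4 and 2.5 gave correct results, 2.6 and 2.7 did not); they are not an official compatibility table.

```python
import re
import warnings

# Hypothetical guard: these sets only mirror reports in this thread for
# Mask_RCNN's model.detect, not an official TensorFlow compatibility matrix.
KNOWN_GOOD = {(2, 4), (2, 5)}
KNOWN_BAD = {(2, 6), (2, 7)}


def major_minor(version):
    """Extract (major, minor) from a version string such as '2.6.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def detect_is_known_good(version):
    """True/False for versions reported in the thread, None if unreported."""
    mm = major_minor(version)
    if mm in KNOWN_GOOD:
        return True
    if mm in KNOWN_BAD:
        return False
    return None


def warn_if_bad(version):
    """Emit a warning if this TF version was reported to break model.detect."""
    status = detect_is_known_good(version)
    if status is False:
        warnings.warn(
            f"TensorFlow {version} was reported to give wrong model.detect "
            "results with Mask_RCNN; TF 2.4 or 2.5 reportedly works."
        )
    return status
```

For example, calling warn_if_bad(tf.__version__) right after import tensorflow as tf in demo.ipynb would warn on 2.6 or 2.7 and return None for versions nobody in the thread reported on.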

nomurakeiya commented 3 years ago

@Marium-E-Jannat Hi, I also hit the same error. It started after the TensorFlow version changed from 2.4 to 2.6. As you say, I think the TensorFlow version is the problem. Can anyone solve it and tell me what to change?

[image]

I'm now trying to downgrade TensorFlow. However, when I was using TensorFlow 2.4.1, I got a training error (https://github.com/tensorflow/tensorflow/issues/37543), so I'll switch to TensorFlow 2.5 instead.

nomurakeiya commented 3 years ago

@Marium-E-Jannat I don't know why, but it's fixed on TensorFlow 2.5!

[image]

Nietram commented 3 years ago

Hi, thanks a lot @Marium-E-Jannat. I had already tried TF 2.5 before your message, but I don't remember which package versions I had then. On TF 2.4 it works perfectly again. On TF 2.5, one step took 6000% more time than on 2.4.

Thanks again

devdimit93 commented 3 years ago

The problem is in the model.detect function. You can train your model correctly with TF 2.6.0, but model.detect returns incorrect results. model.detect is used only for final prediction and visualisation; it is not used during training.

A model trained with TF 2.6.0 will work properly with earlier TF versions.

I trained the model on TF 2.6.0 in Colab, reset the environment, installed TF 2.4.0 and loaded the trained weights into my model. The results were displayed correctly.

Commands for Google Colaboratory:

!pip install tensorflow==2.4.0
!pip install tensorflow-gpu==2.4.0

Restart the runtime after installation.

Then check the result with this code:

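import tensorflow as tf
from tensorflow.python.client import device_lib

# A bare tf.__version__ is only echoed when it is the last line of a cell
print(tf.__version__)

def get_available_gpus():
    # Despite the name, this returns every local device (CPU and GPU)
    local_device_protos = device_lib.list_local_devices()
    return local_device_protos

print(get_available_gpus())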

You should see something like the output below. If no GPU is listed, try another TF version. In my case, a manually installed TF 2.5.0 did not detect the GPU correctly.


2.4.0
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 16755997775038254441
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 14646682624
locality {
  bus_id: 1
  links {
  }
}
incarnation: 1885398783467547466
physical_device_desc: "device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5"
]
mcthesw commented 2 years ago

I had the same problem with TF 2.6.0. I switched to TF 2.5 and it works.

hurueilin commented 2 years ago

After days of trying, I can finally run demo.ipynb successfully. I hope this helps others. I'm using Windows 10 and a GTX 1060.

Installation steps:

conda create --name TF1.15 python=3.7
conda activate TF1.15

conda install cython
pip install opencv-python
conda install -c anaconda pillow
conda install -c anaconda scikit-image
conda install imgaug

Install CUDA 10.0 (cuda_10.0.130_411.31_win10) and cuDNN (v7.4.2) from the NVIDIA website
conda install cudatoolkit=10.0 
conda install cudnn=7.6.5 

pip install h5py==2.10.0
pip install tensorflow==1.15.0
pip install tensorflow-gpu==1.15.0
pip install keras==2.1.6

(Install pycocotools)
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI

(cd into the Mask_RCNN folder)
python setup.py install

conda install -c anaconda ipykernel
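Because pip and conda can silently pull in different versions than the ones requested, it may be worth confirming that the environment actually ended up with the pins from the steps above. This is a sketch, not part of Mask_RCNN: the PINNED dictionary just copies the pip pins listed above, and installed_version and find_mismatches are hypothetical helpers. On Python 3.7 (as in the TF1.15 environment) it assumes the importlib-metadata backport is installed (pip install importlib-metadata); on Python 3.8+ the standard-library module is used.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: pip install importlib-metadata
    import importlib_metadata as metadata

# Pins copied from the pip install lines above (hypothetical check).
PINNED = {
    "h5py": "2.10.0",
    "tensorflow": "1.15.0",
    "tensorflow-gpu": "1.15.0",
    "keras": "2.1.6",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, get_version=installed_version):
    """Map each package whose installed version differs from its pin
    to an (expected, found) tuple; found is None when not installed."""
    mismatches = {}
    for name, expected in pinned.items():
        found = get_version(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated environment; an empty report means the pins held. The get_version parameter is there so the comparison can be checked without touching the real environment.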
KatharinaSchmidt1 commented 2 years ago

I am facing the same issue as described above with TF 2.7.0. Has anyone found the code inside model.detect([image], verbose=1) that causes these errors?

Moe03 commented 1 year ago

Still facing this issue. I tried downgrading to 2.5.0, but then other compatibility errors occur.

avinash-218 commented 1 year ago

‘model.load_weights’ seems to load the weights incorrectly because of version compatibility issues, which effectively means training from scratch. Neither the COCO weights nor previously trained weights were loaded properly, so during training the model started from scratch, and during evaluation the loaded model predicted poorly on the sample data.

Because of this, the losses in the early steps of the first epochs were too high, the visual results looked random and nowhere near the ground truth, and evaluation metrics such as mAP, mAR and F1 were 0. There are two ways to address this:

1. Use ‘tf.keras.Model.load_weights’ instead of ‘model.load_weights’. However, this isn't a real option, because it doesn't support the ‘exclude’ argument.
2. Downgrade TensorFlow from 2.7 to 2.5. This worked both for training (from COCO using the ‘exclude’ argument, and from previously trained weights) and for evaluation.
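One way to test the diagnosis above (weights silently not loaded) is to snapshot each layer's weights on a freshly built model, call load_weights, and see which layers did not change. A minimal sketch, assuming you build the snapshots yourself, for example as {layer.name: [w.ravel().tolist() for w in layer.get_weights()] for layer in model.keras_model.layers}, where keras_model is the Keras model that Mask_RCNN's MaskRCNN class wraps. The helper unchanged_layers is hypothetical. Layers named in exclude are expected to keep their initial values; any other weighted layer that shows up points to weights that were not loaded.

```python
def unchanged_layers(before, after, exclude=()):
    """Return names of weighted layers whose values did not change.

    `before` and `after` map layer name -> list of flattened weight lists,
    e.g. snapshots taken around a load_weights() call. Layers without
    weights and layers named in `exclude` are skipped.
    """
    skipped = set(exclude)
    result = []
    for name, weights in before.items():
        if not weights or name in skipped:
            continue
        if after.get(name) == weights:
            result.append(name)
    return sorted(result)
```

An empty result after loading COCO weights with the usual exclude list suggests the load worked; a long list of backbone layers suggests the model is effectively starting from scratch.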

This worked for me. Please correct me if my understanding is wrong.

dholukeval commented 1 year ago

I faced the same problem. Creating a new conda environment with TensorFlow 2.5 (with Python 3.8) worked for me. Here is how to create a new conda environment with TensorFlow 2.5: [link]

Cheers!

ozgur-kurt commented 1 month ago

@dholukeval Which requirements.txt did you install? The original one from matterport?

dholukeval commented 1 month ago

After installing TensorFlow with Python 3.8, I installed the other dependencies manually. As far as I remember, I did not use requirements.txt.

avinash-218 commented 1 month ago

Try these:
numpy==1.19.2
scipy
Pillow
cython
matplotlib
scikit-image==0.16.2
tensorflow==2.5.0
keras==2.7.0
opencv-python
h5py==3.1.0
imgaug