zanilzanzan / FuseNet_PyTorch

Joint scene classification and semantic segmentation with FuseNet
GNU General Public License v3.0

Question about the meaning of 'Org.' #3

Closed shangweihung closed 6 years ago

shangweihung commented 6 years ago

The file "FuseNet_Class_Plots.ipynb" prints the following results:

"Best: global pixel-wise accuracy: 0.777, mean class-wise IoU accuracy: 0.273, mean accuracy: 0.479
Org.: global pixel-wise accuracy: 0.763, mean class-wise IoU accuracy: 0.373, mean accuracy: 0.483"
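For my own understanding, I assume these three numbers are the standard confusion-matrix metrics, computed roughly like this (my own sketch, not code taken from the notebook):

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute the three metrics from a (C x C) confusion matrix, where
    conf[i, j] counts pixels of true class i predicted as class j.
    Illustrative sketch only, not code from the notebook."""
    conf = conf.astype(np.float64)
    tp = np.diag(conf)                 # correctly classified pixels per class
    gt_total = conf.sum(axis=1)        # pixels of each ground-truth class
    pred_total = conf.sum(axis=0)      # pixels predicted as each class

    global_acc = tp.sum() / conf.sum()                        # global pixel-wise accuracy
    mean_acc = np.nanmean(tp / gt_total)                      # mean class-wise accuracy
    mean_iou = np.nanmean(tp / (gt_total + pred_total - tp))  # mean class-wise IoU
    return global_acc, mean_acc, mean_iou
```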

I am wondering what "Best" and "Org." mean here, especially "Org.".

Does that mean these are results from models your group trained yourselves, using your own tricks as well as the methods proposed in the paper?

hazirbas commented 6 years ago

"Best" refers to the experiments with hyper-parameter tuning, while "orig." refers to the results reported in the paper :D

shangweihung commented 6 years ago

Hi,

I am wondering about the hyper-parameters and any other training tricks that helped you obtain the orig. model that reaches 0.373 IoU. I only got about 0.27-0.28 IoU following the hyper-parameters given in the paper. By the way, could you tell me for how many iterations you trained the 'orig' model with the batch size of 4 mentioned in the paper? Moreover, could you release the ' sunrgbd1_db.h5 ' file?

I am looking forward to your kind reply, since I am working on the SUN RGB-D dataset now.

Thank you so much=D

Shang-Wei


hazirbas commented 6 years ago

Maybe @zanilzanzan can help you with these.

zanilzanzan commented 6 years ago

Hey @shangweihung,

I got around 0.262 IoU accuracy myself on the SUN RGB-D benchmark with the originally proposed baseline model SF-5 (see the table below), using the PyTorch implementation.

[Image: benchmark results table for the SF-5 baseline on SUN RGB-D]

However, I used the whole SUN RGB-D dataset for the experiments. It is mentioned in the paper that some of the images were not included in the experiments: "In the experiments we used the standard training and test split with in-painted depth images. However, we excluded 587 training images that are originally obtained with RealSense RGB-D camera. This is due to the fact that raw depth images from the aforementioned camera consist of many invalid measurements, therefore in-painted depth images have many false values." That could be the actual trick to obtaining higher IoU accuracy with the hyper-parameters provided in the paper. So I would recommend doing the same (see the rough sketch below), and if you have further issues regarding this, I'd recommend switching to the Caffe-implementation repository and posing your question there.
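The repository itself does not include such a filter, so the following is only a rough sketch of how you might drop those images, assuming you build your training list from the public SUN RGB-D directory layout, where the sensor name (kv1, kv2, realsense, xtion) appears in each sample path:

```python
def drop_realsense(image_paths):
    """Drop RealSense captures from a list of SUN RGB-D sample paths.

    Assumes the sensor name appears in each path, as in the public
    SUN RGB-D release (e.g. .../SUNRGBD/realsense/...). This is an
    illustrative helper, not code from this repository.
    """
    return [p for p in image_paths if "realsense" not in p.lower()]


# e.g. train_paths = drop_realsense(train_paths)
# removes the RealSense training images (587 of them according to the paper)
```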

Apart from that, during the weekend I'll try to provide the .h5 file for SUN RGB-D as well. I'll inform you here again.

Cheers, Anil

shangweihung commented 6 years ago

Hi Anil, thank you for the quick reply. I see, so the org. model was trained on the dataset without those 587 training images. I guess that is the key point here.

By the way, thanks again for the .h5 file.

Warm regards,

Shang-Wei


zanilzanzan commented 6 years ago

Thank you for your interest, @shangweihung! As you said, I honestly think that excluding those images is the essential point here, considering the inconsistencies mentioned in the paper. If you have any further questions, please do not hesitate to ask again.

Have a nice day! Anil

shangweihung commented 6 years ago

Hi, sorry to bother you over the weekend, and thank you again for providing the sun.h5 file. I have a small question I want to clear up: the input image size for the NYU and SUN datasets is 320x240, and I expected the training images would all be resized to 224x224, but I did not see anything about resizing in the code. Is there something I have misunderstood? Could you clarify the input size?

Sincerely, Shang-Wei



zanilzanzan commented 6 years ago

Hi @shangweihung, we do not resize the images again before feeding them to the network. Although the default input size for VGG16 is 224x224, since it consists of convolutional layers it is able to process inputs of various sizes (see this post and this one).
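Just to illustrate (this uses torchvision's stock VGG16, not the FuseNet encoder itself, so take it only as a sanity check on the output shapes):

```python
import torch
from torchvision import models

# Sketch: the VGG16 feature extractor is fully convolutional, so it accepts
# a 320x240 input. Weights do not matter for checking shapes, so no
# pretrained download is needed here.
vgg16_features = models.vgg16().features.eval()

x = torch.randn(1, 3, 240, 320)   # one RGB image, height 240, width 320 (NCHW)
with torch.no_grad():
    out = vgg16_features(x)
print(out.shape)                  # torch.Size([1, 512, 7, 10]) after five poolings
```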

shangweihung commented 6 years ago

Hi, I see. In other words, the VGG16 model can handle different input sizes even though its default input size is 224x224, and in your case you just feed the 320x240 images into the network. Am I correct?

Besides, I have one more question. I noticed that "num_classes" in the code is 10 for the NYU dataset; do you know how many classes there are for the SUN dataset?

Sincerely, Shang-Wei



Aleck16 commented 6 years ago

Hi @shangweihung. Were you able to reach the "Org" accuracy on the NYUv2 dataset? I am currently running into the same problem as you: I only get about 0.27-0.28 IoU.

Aleck16 commented 6 years ago

Hi @zanilzanzan. In the answer above, which paper does the experimental comparison table come from? I would like to reproduce the "Org" accuracy with the code.

zanilzanzan commented 6 years ago

Ok, maybe I should briefly clarify the distinction between the model referred to as the "original" and the model presented in this repository. The original FuseNet architecture was proposed in this paper and was initially implemented in Caffe; that implementation can be found in the corresponding repository, so the results you see in the paper were obtained with the Caffe implementation. This repository, however, contains the PyTorch implementation of the originally proposed model as well as an extension of it that makes use of additional scene-class information. The result comparison you came across is between these models. Does this clear up some of the question marks you had?

Aleck16 commented 6 years ago

Ok, thank you for your answer.