I cannot reproduce the results

microsoft / transductive-vos.pytorch

a transductive approach for video object segmentation

Hi, thanks for releasing the code. I ran it from start to finish, but I cannot reproduce the results at all.

I used DAVIS 2017 as my training set and evaluated the checkpoints on the DAVIS 2017 validation set.

However, my results are:

G/J/F: 67.2/65.2/69.2

But in your paper:

G/J/F: 72.3/69.9/74.7
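For reference, the DAVIS 2017 global score G is the average of the mean J and mean F scores, so both rows above are internally consistent and the gap can be stated as a single number. A minimal sketch of that arithmetic (the function and variable names are illustrative, not taken from the repository):

```python
def global_score(j_mean, f_mean):
    """DAVIS 2017 global score: the average of mean J and mean F."""
    return (j_mean + f_mean) / 2


# Scores quoted in this issue (J and F only; G is recomputed below).
reproduced = {"J": 65.2, "F": 69.2}
paper = {"J": 69.9, "F": 74.7}

g_reproduced = round(global_score(reproduced["J"], reproduced["F"]), 1)
g_paper = round(global_score(paper["J"], paper["F"]), 1)
gap = round(g_paper - g_reproduced, 1)

print(f"reproduced G = {g_reproduced}, paper G = {g_paper}, gap = {gap}")
```

The recomputed G values match the 67.2 and 72.3 quoted above, which puts the shortfall at roughly 5 points of G.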

I notice that you use 4 GPUs with 16GB of memory each, while I only have 4 GPUs with 11GB each. I don't think the hardware difference should lead to such a significant gap. Could you please explain this? Several other people have reported the same problem in the issues below.

Thanks in advance.

What's more, your paper says it is a transductive method, but your code is TOTALLY different from the equations in your paper, and the masks are not used in a semi-supervised way: training is fully supervised with a cross-entropy loss.

Please explain this, as I think it is an essential problem with your paper.

Hello,

We have contacted you via email and provided our recent training log. Our advice is as follows:

1. Make sure you are able to reproduce our paper's results using our pretrained model, which is linked in the GitHub repository.
2. With 4 11GB GPUs, if you can't train with the default batch size (16), you have to adjust the parameters accordingly, e.g. reduce the lr.
3. Check your PyTorch and CUDA versions; PyTorch 1.0 and CUDA 10.0 are recommended.
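
One common heuristic for the batch size point is the linear scaling rule: when the batch size shrinks, scale the learning rate down by the same factor. The sketch below assumes that rule, which the advice above does not specify. Only the default batch size of 16 comes from this thread; the base learning rate and the reduced batch size are hypothetical placeholders, so check the repository's actual defaults before using it.

```python
def scaled_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: keep lr / batch_size constant."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


DEFAULT_BATCH_SIZE = 16  # default batch size mentioned in the advice above
BASE_LR = 0.02           # hypothetical placeholder, NOT the repository default
SMALLER_BATCH_SIZE = 8   # hypothetical batch size that fits on 11GB GPUs

new_lr = scaled_lr(BASE_LR, DEFAULT_BATCH_SIZE, SMALLER_BATCH_SIZE)
print(f"batch size {SMALLER_BATCH_SIZE}: lr = {new_lr}")
```

Halving the batch size halves the learning rate under this rule. It is a starting point rather than a guarantee, and the result may still need tuning.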

As for your question about our paper, we suggest you visit https://davischallenge.org/challenge2017/index.html and read the related papers. During training, our method is supervised by the mask annotations from all frames in all videos. The problem is called semi-supervised because, at inference time, only the annotation of the first frame is given. All papers in the semi-supervised VOS literature follow this setting.
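
To make the inference-time setting concrete, here is a toy sketch of the protocol: only the first frame's mask is supplied, and every later mask has to be predicted. It is not the method from the paper. Scalar per-pixel features and nearest-neighbour matching against the previous frame are stand-in assumptions for learned embeddings, and all names and values are made up for illustration.

```python
def propagate_labels(prev_features, prev_labels, features):
    """Give each pixel the label of the most similar pixel in the previous frame."""
    labels = []
    for value in features:
        nearest = min(
            range(len(prev_features)),
            key=lambda i: abs(prev_features[i] - value),
        )
        labels.append(prev_labels[nearest])
    return labels


def segment_video(frames, first_frame_mask):
    """Semi-supervised inference: only the first frame's mask is given."""
    masks = [list(first_frame_mask)]
    for t in range(1, len(frames)):
        # Later frames are labelled from the previous frame's prediction,
        # never from ground-truth annotations.
        masks.append(propagate_labels(frames[t - 1], masks[t - 1], frames[t]))
    return masks


# Toy video: 3 frames of 4 "pixels", each with one scalar feature.
frames = [
    [0.10, 0.20, 0.90, 0.80],
    [0.12, 0.88, 0.95, 0.20],
    [0.90, 0.10, 0.20, 0.80],
]
first_frame_mask = [0, 0, 1, 1]  # 1 = object, 0 = background

masks = segment_video(frames, first_frame_mask)
for t, mask in enumerate(masks):
    print(f"frame {t}: {mask}")
```

The ground-truth masks of frames 1 and 2 never enter `segment_video`, which is what "semi-supervised" refers to in this benchmark, regardless of how the model was trained.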

Thank you for your interest.

I cannot reproduce the results #19