lichengunc / MAttNet

MAttNet: Modular Attention Network for Referring Expression Comprehension
http://vision2.cs.unc.edu/refer
MIT License

Replicating MAttNet Online Demo #28

Open albamhp opened 5 years ago

albamhp commented 5 years ago

Running the demo cv/example_demo.ipynb with the original configuration, using images from the DAVIS 2017 challenge and our own captions, gives significantly different results than the Online Demo.

For example, for the first frame of the girl-dog sequence, the following candidates are obtained: (image)

An example using "a black dog" as the caption is shown below:

ONLINE DEMO: (image)

Using the RefCOCO pre-trained model weights, the results from cv/example_demo.ipynb don't seem to change across different captions for the dog: they always return the girl, while the online demo works fine.

EXAMPLE DEMO: (image)

I tried the provided weights with several captions. RefCOCO+ works for dog captions but not wheelchair captions, while RefCOCOg works for wheelchair but not dog captions. The results are never the same as those in the Online Demo, which works fine with all captions.
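For reference, the checkpoint-by-checkpoint comparison above can be sketched as a small harness like the one below. This is only an illustration: `score_candidate` is a hypothetical stand-in for MAttNet's matching score (stubbed with a trivial string match so the snippet runs standalone), not the project's actual API.

```python
# Sketch of comparing which candidate each pre-trained checkpoint picks
# for a given caption. In a real run, score_candidate() would load the
# named checkpoint and score each (caption, candidate-box) pair with the
# trained model; here it is a stub for illustration only.

CHECKPOINTS = ["refcoco", "refcoco+", "refcocog"]
CANDIDATES = ["girl", "dog", "wheelchair"]  # candidate objects in the frame

def score_candidate(checkpoint, caption, candidate):
    # Stub scoring: 1.0 if the candidate's label appears in the caption.
    # A real model would produce a continuous matching score instead.
    return 1.0 if candidate in caption else 0.0

def best_candidate(checkpoint, caption):
    # Return the highest-scoring candidate for this caption.
    return max(CANDIDATES, key=lambda c: score_candidate(checkpoint, caption, c))

for ckpt in CHECKPOINTS:
    for caption in ["a black dog", "a wheelchair"]:
        print(ckpt, caption, "->", best_candidate(ckpt, caption))
```

With real checkpoints, the interesting signal is exactly the discrepancy described above: the same caption mapping to different candidates depending on which dataset the weights were trained on.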

Is there a different configuration available that replicates those results?

lichengunc commented 5 years ago

Yes, I merged several datasets and retrained the model for the demo. So its output would be different from the models released here.

carlesventura commented 5 years ago

Would it be possible to release the model used in the online demo? If this model is better than the ones already released, it would be beneficial for the community.