StyleNet: Non-Reproducible Results

tusharkr commented 4 years ago

I would like to state categorically that the paper "StyleNet: Generating Attractive Visual Captions with Styles" from Microsoft is non-reproducible. This conclusion is not based only on the code in this repo: our own extensive experiments have led us to believe that this paper is simply a work of fiction. We also contacted the lead authors, Chuang Gan and Zhe Gan, but did not get any reasonable explanation of why this architecture does not work. It is unfortunate that this paper also has a significant number of citations. How it was accepted at CVPR remains a big question. In addition, the new dataset described in the paper is not available in full. Only part of it has been released, which makes this task even more questionable. Overall, I would urge readers who stumble across this not to waste their time trying to reproduce this paper!!

WuRong1997 commented 4 years ago

Thank you for telling us. I had just decided to start my work based on StyleNet and try to reproduce it. You have saved me a lot of time.

njucckevin commented 3 years ago

Thanks for the reminder. But could you explain why you think the dataset is not available? I took a look at the dataset and it seems fine. Maybe you think it's difficult to distinguish the romantic and humorous styles?

tusharkr commented 3 years ago

According to the paper, the dataset should contain 10k images. In reality, only 7k images are available. We confronted the author about this, and he did not give any specific reason why the remaining 3k are missing. Moreover, there are 3 captions per image for the neutral captions, whereas there is only 1 caption per image for the humorous style. This makes training impossible. This is why I have categorically stated that this paper is just a work of fiction.

njucckevin commented 3 years ago

I got it, thank you.

tusharkr commented 3 years ago

You are welcome. I have spent close to 6 months trying to reproduce this paper. After I asked a couple of pointed questions, the authors stopped responding. I would suggest not wasting your time on this or any similar paper by the first author of this paper.

Doragd commented 3 years ago

First of all, I want to point out that this repo is not the official repo. Actually, there is a lot of follow-up work on this paper that addresses the limited amount of stylized paired data through unpaired training.

tusharkr commented 3 years ago

Two points. First, it does not matter whether this is the official repo or not: technically, the paper is non-reproducible and the architecture simply does not work. Since this link is where most researchers end up (in fact, I have an email in which the second author himself asked me to try this repo), it is good to tell them in advance that not only this repo but the paper itself is non-reproducible. Second, the fact that other work is inspired by this design (or that other papers cite this paper) does not guarantee that this paper is reproducible.

Doragd commented 3 years ago

Thanks for your quick and kind reply. I am devoting myself to reproducing this paper, or at least reproducing its performance on the 7k limited data that is now public. Such a result was reported in MSCap (CVPR'19).

Doragd commented 3 years ago

YOU ARE RIGHT! I also doubt the results in this paper. The following picture shows my rough result for the romantic style. By the way, I think I have fixed some bugs in this repo.

tusharkr commented 3 years ago

Good to know that you are trying to reproduce it. However, on our side, we fixed all the bugs in this repo. We also wrote the code from scratch based on the paper. In the end, we wasted 7 months trying all possible combinations, but we could not reproduce even a partial result. That is why I am stating that this paper is just fiction. It is an insult to the CVPR tradition. I still wonder how the authors were able to convince the CVPR reviewers.

njucckevin commented 3 years ago

Wait... @Doragd, so you think the FlickrStyle10K (in fact, 7K) dataset is feasible for stylized image captioning, but the results in StyleNet are exaggerated? And by the way, what is the result in your picture? I have read MSCap, but there is no similar result there.

Doragd commented 3 years ago

@njucckevin First of all, 7k images are somewhat sufficient to train a model for stylized image captioning. But in my opinion, StyleNet, which relies on only four style-specific parameter matrices, cannot learn to express style, especially given its strange training method. My result is rather rough, and I will refine it soon. Feel free to contact me to get the refined version.

njucckevin commented 3 years ago

I got it. Thanks~

Cathyttt commented 3 years ago

@Doragd Hi, I'm also trying to reproduce the StyleNet model, but after reading this issue I'm wondering whether it's worth spending time on it. May I see your code and the results of your version? Thanks.

Doragd commented 3 years ago


Please contact me by email next week.

kacky24 / stylenet

StyleNet: Non-Reproducible Results #9