Closed: tangyuhao2016 closed this issue 3 years ago.
I am trying to upload the data to Baidu drive, but it might take a long time. For your low val scores, have you checked the generations? Why are the scores so low? Maybe they are bad generations; in some cases I met, the generations were all the same even for different images.
For the discrepancy among the datasets, you are right. But I am using the smaller number here, 99946, instead of 99981. For now it does not introduce any problems. I will fix this error later.
I can't install Baidu drive on my Linux computer. If you have all the data, then I will not install it. If not, I may try other methods.
For the missing files you mentioned, they are just paths. You can create your own and put the data there. The checkpoint is only used for testing or for resuming your training.
```yaml
data_folder: /home/xuewyang/Xuewen/Research/data/FACAD/jsons
model_folder: /home/xuewyang/Xuewen/Research/model/fashion/captioning/SAT2
checkpoint: /home/xuewyang/Xuewen/Research/model/fashion/captioning/SAT/vanilla/BEST_checkpoint_2.pth.tar
```
I checked the generations. The generations are bad: they are hard to read and some of them are identical. It is very strange to me. I use code from video captioning, which has a very simple structure; the encoder and decoder are LSTMs. On video data the results are good. I only removed the encoder and concatenated the image feature with the word embeddings as input for the LSTM, and the result is really bad. When using greedy decoding, the model produces bad generations. When using beam search, the model's generations are all
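To be concrete, my setup is roughly the following sketch (the dimensions and names are placeholders, not my exact code):

```python
import torch
import torch.nn as nn

class SimpleCaptionDecoder(nn.Module):
    # No learned encoder: a precomputed image feature is concatenated with each
    # word embedding and fed to an LSTM, which predicts the next word.
    def __init__(self, vocab_size, embed_dim=512, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim + feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feat, captions):
        # image_feat: (B, feat_dim); captions: (B, T) word indices
        emb = self.embed(captions)                                   # (B, T, embed_dim)
        feat = image_feat.unsqueeze(1).expand(-1, emb.size(1), -1)   # (B, T, feat_dim)
        out, _ = self.lstm(torch.cat([emb, feat], dim=2))            # (B, T, hidden_dim)
        return self.fc(out)                                          # (B, T, vocab_size)
```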
I checked the val set; the numbers of captions and images are also different. You said you use "the smaller number here 99946 instead of 99981", but how do you keep the labels and features aligned when their numbers are not the same?
Now I have downloaded train.hdf5, val.hdf5 and test.hdf5. If the label/feature alignment problem can be solved, I will use the hdf5-format data. I have downloaded the first-color images from "meta_all_129927.json", about 725000 images, and I am missing about 300000 images of the other colors. Some images do not have links in the json, so I copied other pictures of the same product to replace them; this affects about 1200 images. And 3 products do not have any links at all. The data processing is difficult.
If it's convenient for you, could you upload all the images and the corresponding ids from "meta_all_129927.json" to Google Drive? I don't know if that is time-consuming for you. I know that uploading to Baidu cloud disk is much faster in China than uploading to Google Drive; in the United States it may be the opposite. I'll check my code, but before I do that I want to make sure the data is correct and the images are aligned with the captions.
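For reference, this is the quick sanity check I run to compare the counts (just a sketch with h5py and json; I am assuming the image array in the hdf5 file is stored under an "images" key, as in the usual SAT-style preprocessing):

```python
import json
import h5py

split = 'TEST'
with h5py.File(f'{split}_IMAGES_5.hdf5', 'r') as h:
    n_images = h['images'].shape[0]        # assumes the dataset key is 'images'
with open(f'{split}_CAPTIONS_5.json') as f:
    n_captions = len(json.load(f))
with open(f'{split}_CAPLENS_5.json') as f:
    n_caplens = len(json.load(f))

print(split, 'images:', n_images, 'captions:', n_captions, 'caplens:', n_caplens)
# For the test split this gives me 99981 images but 99946 captions.
```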
I will try to align them soon.
Strange things like these happened to me too.
Thank you very much
When I use ruotian's code, my result is really bad. I process the data in the same way as the COCO format. I use the base model, --caption_model newfc.
Val results after 2 epochs: Bleu_1: 0.104, Bleu_2: 0.023, Bleu_3: 0.005, Bleu_4: 0.002, METEOR: 0.029, ROUGE_L: 0.088, CIDEr: 0.027.
I use 650000 images for training, 20000 for validation and 30000 for testing. Some generations follow; many of them are similar or identical.
image 660030: a classic straight leg cut make these stretch denim jeans a modern casual look
image 660031: a signature crystal encrusted eagle head embellishes the geo quilted flap of a structured shoulder bag crafted from supple leather
image 660032: a signature crystal encrusted eagle head embellishes the geo quilted flap of a structured shoulder bag crafted from supple leather
image 660033: a signature crystal encrusted eagle head embellishes the geo quilted flap of a structured shoulder bag crafted from supple leather
image 660034: a classic crewneck t tee is cut from soft cotton jersey
image 660035: a classic triangle bikini top feature a flattering high cut that s perfect for your next beach getaway
image 660036: a classic triangle bikini top is cut from a soft and stretchy fabric that feel great against your skin and
image 660037: a sleek and stretchy thongs sweetens the look of these high waist bikini bottom
image 660038: a classic triangle bikini top is cut from a soft stretchy fabric that feel great against your skin and dry
image 660039: a signature crystal encrusted eagle head embellishes the geo quilted flap of a structured shoulder bag crafted from supple leather
image 660040: a signature crystal encrusted eagle head embellishes the geo quilted flap of a structured shoulder bag crafted from supple leather
image 660041: a classic triangle bikini top in a vibrant hue is cut from a soft and stretchy fabric that adapts to
image 660042: a classic triangle bikini top is cut from a soft stretchy fabric that feel great against your skin and dry
image 660043: a signature crystal encrusted eagle head embellishes the geo quilted flap of a structured shoulder bag crafted from supple leather
image 660044: a vibrant print brightens this flattering wrap dress styled with a dipped neckline and a waist defining tie
image 660045: a classic triangle bikini top feature a flattering high cut that s perfect for your next tropical getaway
image 660046: a classic triangle bikini top is designed with a reversible design that s perfect for lounging poolside
image 660047: a sleek and stretchy thongs sweetens the look of these high waist bikini bottom
image 660048: a signature logo brand the chest of a classic crewneck t tee cut from soft cotton
I do not know where the problem is. Maybe something is wrong with the features. I use ResNet-101 and resize the images to 112x112 to extract features. I want to know how you extract features: from the original images, or resized to some scale?
It's very strange to me.
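For reference, my current feature extraction is roughly the following sketch (I resize to 112x112 as mentioned; ResNet-101 is normally fed 224x224 crops, so this may be part of the problem):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# ResNet-101 pretrained on ImageNet, with the classification head removed.
resnet = models.resnet101(pretrained=True).eval()
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])  # conv feature map
pool = torch.nn.AdaptiveAvgPool2d(1)

transform = T.Compose([
    T.Resize((112, 112)),   # what I use now; 224x224 may be safer
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract(path):
    img = transform(Image.open(path).convert('RGB')).unsqueeze(0)
    att_feat = backbone(img)               # spatial 'att' features, (1, 2048, h, w)
    fc_feat = pool(att_feat).flatten(1)    # global 'fc' feature, (1, 2048)
    return fc_feat.squeeze(0), att_feat.squeeze(0)
```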
I also met the same problem when I used his repo. I tried for several days to figure it out, but to no avail. As you said, maybe the image features are wrong.
I'm re-extracting features.

I find your result is pretty good. After epoch 1: Bleu_1: 0.433, Bleu_2: 0.270, Bleu_3: 0.181, Bleu_4: 0.131, METEOR: 0.173, ROUGE_L: 0.404, CIDEr: 1.222.
I think the reason might be: 1. I am using ruotian's evaluation code. 2. I am using more examples (the same number as for the ECCV paper, but for ECCV I actually reported numbers on only a subset because of the time limit).
For the first reason, is it that using ruotian's evaluation code makes your result better? Did you make any changes? For the second reason, what does using more examples mean? More validation examples?
I'm actually using the code in this repo, instead of ruotian's raw code. I've also tried ruotian's code before, but failed with some strange problems.
In this repo, the author uses the hdf5-format data, which stores the images, and finetunes a ResNet. Could you use ruotian's code to process the data?
I'm not using the hdf5 files but the images I downloaded with wget. And I didn't do much preprocessing. I simply divided the data into 3 splits and got 3 files: FACAD_train.json, FACAD_val.json and FACAD_test.json respectively. The format of each json file is exactly the same as the raw COCO dataset. Then I used these json files and images to train and validate. No extra processing is involved, I think.
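Roughly, producing one split file was nothing more than something like this sketch (the field names of the FACAD meta json, such as 'id', 'images' and 'caption', and the train_ids.json file are placeholders, not the exact keys):

```python
import json

# Sketch: build one COCO-style annotation file for a split.
with open('meta_all_129927.json') as f:
    items = {item['id']: item for item in json.load(f)}   # field names assumed
with open('train_ids.json') as f:                          # hypothetical list of item ids
    train_ids = json.load(f)

images, annotations = [], []
for item_id in train_ids:
    for img_path in items[item_id]['images']:
        images.append({'id': len(images), 'file_name': img_path})
        annotations.append({'id': len(annotations),
                            'image_id': images[-1]['id'],
                            'caption': items[item_id]['caption']})

with open('FACAD_train.json', 'w') as f:
    json.dump({'images': images, 'annotations': annotations}, f)
```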
I have uploaded the evalcap folder for evaluation. See the Codes section in the README.
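If you want to score your own generations outside the training loop, the scorers follow the usual coco-caption interface; a rough usage sketch, assuming the standard pycocoevalcap-style layout (adjust the imports to wherever you put the evalcap folder):

```python
from evalcap.bleu.bleu import Bleu      # import paths assume the coco-caption layout
from evalcap.cider.cider import Cider
from evalcap.rouge.rouge import Rouge

# Both dicts map an image id to a list of tokenized, lowercased caption strings.
gts = {0: ['a classic crewneck tee cut from soft cotton jersey']}   # references
res = {0: ['a classic crewneck t tee is cut from soft cotton']}     # one generation each

for name, scorer in [('Bleu', Bleu(4)), ('CIDEr', Cider()), ('ROUGE_L', Rouge())]:
    score, _ = scorer.compute_score(gts, res)
    print(name, score)
```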
Excuse me, I want to ask you a question. When I use ruotian's code with the att2in model, I try two ways. First, I extract the fc and att features and then feed them into the model. The results are as follows: Bleu_1: 23.1, Bleu_2: 10.7, Bleu_3: 5.8, Bleu_4: 3.7, METEOR: 9.1, ROUGE: 19.8, CIDEr: 35.1. The result is lower than yours; I see that you use raw images and finetune the CNN without extracting features. Maybe a different data split causes different problems. I follow the author's split: I downloaded the json from the author's Google Drive and uploaded it to Baidu cloud disk. Could you use the author's split to divide the data into train, val and test?
link:https://pan.baidu.com/s/1oph4CZNsOgPTFLv8chQNIQ code:sr21
But I do not think a different data split is the core reason for the low performance.
So then I finetune the CNN, and the result is too low: the CIDEr is only 0.5 and almost all generations are the same. I use the author's finetuning code; it is very strange to me.
How do you revise the finetuning code based on ruotian's code? Could you open-source the finetuning code? A unified baseline may be vital for this task. Thank you.
Sir, when you use ruotian's code, which way of data processing do you choose: extracting fc and att features, or finetuning the CNN on raw images? When I extract features and feed them into the 'att2in' model based on ruotian's code, the val result (CIDEr) is only 35.1, and adaatt's performance is even lower. Some generations are the same, but some of them are different. But when I revise ruotian's dataloader based on your finetuning code, the result is too low and almost all generations are the same. It is very strange to me.
I think I made a mistake in testing. I didn't use sampling, but just used forward. See here. So I am actually using teacher forcing. Thus, I think having low performance is normal; having a high one is wrong.
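Conceptually, the difference between the two test modes is something like this sketch (not the repo's exact code; start_id/end_id and the decoder signature are placeholders):

```python
import torch

@torch.no_grad()
def generate(decoder, image_feat, gt_caption=None, max_len=20, start_id=1, end_id=2):
    # Assumed signature: decoder(image_feat, tokens) -> logits of shape (1, T, vocab).
    if gt_caption is not None:
        # Teacher forcing: the ground-truth words are fed back in at every step,
        # so the resulting scores are inflated and don't reflect real generation.
        return decoder(image_feat, gt_caption).argmax(dim=-1)
    # Proper evaluation: greedy decoding, feeding back the model's own predictions.
    tokens = torch.tensor([[start_id]])
    for _ in range(max_len):
        logits = decoder(image_feat, tokens)
        next_word = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_word], dim=1)
        if next_word.item() == end_id:
            break
    return tokens
```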
I also tried his code; I extracted features first and didn't do finetuning. I had bad results too. As you mentioned before, there is a mis-alignment problem in the dataset, so I am re-saving the hdf5 dataset. I will finish it and let you know how I did it, maybe in two days. FYI, if you split the data yourself, make sure that the split is done by image id (the key 'id' in the json file), not by the total number of images. Remember, there are about 130K different fashion items, but about 1M different images. Make sure the data in the test set is unseen in the training phase.
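In code, splitting on the item id rather than on the image count looks roughly like this sketch (the only assumption is that every entry of meta_all_129927.json has an 'id' key; the split ratios are placeholders):

```python
import json
import random

with open('meta_all_129927.json') as f:
    item_ids = [item['id'] for item in json.load(f)]   # ~130K items, not ~1M images

random.seed(123)
random.shuffle(item_ids)
n = len(item_ids)
split = {'train': item_ids[:int(0.9 * n)],
         'val':   item_ids[int(0.9 * n):int(0.95 * n)],
         'test':  item_ids[int(0.95 * n):]}

# All images (all colors/views) of one item land in the same split,
# so nothing in the test set has been seen during training.
for name, ids in split.items():
    with open(f'{name}_ids.json', 'w') as f:
        json.dump(ids, f)
```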
Thank you. In VAL_IMAGE_5.json, shown below, 116112, 121024, 117737, ... are the image ids. To be fair, I strictly follow the ids to split the data.
['/home/xuewyang/Xuewen/Research/data/FACAD/images/116112/Black_-_Only_1left/0.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/116112/Black-_Only_1left/1.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/116112/Black-_Only_1left/2.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/116112/Black-_Only_1_left/3.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/121024/Yellow_Vermeil/0.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/Black/0.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/Black/1.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/Black/2.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/Black/3.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/Black/4.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/Black/5.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/Black/6.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/BrownAztec-_Only_2_left/0.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/BrownAztec-_Only_2_left/1.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/BrownAztec-_Only_2_left/2.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/BrownAztec-_Only_2_left/3.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/BrownAztec-_Only_2_left/4.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/BrownAztec-_Only_2_left/5.jpeg',
'/home/xuewyang/Xuewen/Research/data/FACAD/images/117737/BrownAztec-_Only_2_left/6.jpeg',
...

I think another reason may be the mis-alignment problem. I cannot check whether the hdf5-format image data are aligned with the captions. A unified data partition like COCO is important for this meaningful task, I think. Looking forward to your reply.
I will make a unified partition soon and manually check some of them to make sure they are well aligned.
I split the data exactly according to TRAIN_IMAGES_5.hdf5, VAL_IMAGES_5.hdf5 and TEST_IMAGES_5.hdf5. Maybe you just need to train for more epochs. I trained the model for about 20 epochs; the CIDEr tends to increase with more epochs of training.
There might be a mis-alignment problem in the dataset, i.e., the number of images != the number of captions. I have fixed the problem, but it might take two or three days to upload to the drive.
Sorry, I mixed up the filenames. Actually, I split the data according to TRAIN_IMGPATH_5.hdf5, VAL_IMGPATH_5.hdf5 and TEST_IMGPATH_5.hdf5. Since I downloaded the images from the URLs, I don't use XXX_IMAGES_5.hdf5 in my experiments.
Sir, thank you very much for publishing the code.
Recently I downloaded all items with the first color from FACAD. The json file of the dataset is named "meta_all_129927.json".
When I check the downloaded data, I get 126753 items, which are the first-color data in FACAD. There are about 1200 images that lack a link (basically, one product lacks one link), and 3 items do not have any links at all. Data processing is very troublesome.
I use the data of 100000 images; the result is too low, so the data processing must have some problems.
Results on the val set (using the training data to evaluate): B4: 0.09, M: 0.07, R: 0.11, C: 0.23.
So I downloaded the data you provide, named 'TRAIN_IMAGES_5.hdf5', 'VAL_IMAGES_5.hdf5' and 'TEST_IMAGES_5.hdf5'.
For example, the number of images in TEST_IMAGES_5.hdf5 and TEST_IMAGEPATH_5.json is 99981; however, the other json files such as 'TEST_CAPLENS_5.json' and 'TEST_CAPTIONS_5.json' contain 99946 entries. I only checked the test data.
Could you check the uploaded data? I use TEST_IMAGEPATH_5.json to find the image id and get the description, attributes, etc. I do not know if this way is right.
Today I downloaded the code and find that some files referenced in the yml files, such as sat.yml, are missing:
```yaml
data_folder: /home/xuewyang/Xuewen/Research/data/FACAD/jsons
model_folder: /home/xuewyang/Xuewen/Research/model/fashion/captioning/SAT2
checkpoint: /home/xuewyang/Xuewen/Research/model/fashion/captioning/SAT/vanilla/BEST_checkpoint_2.pth.tar
```
I sent a few emails and got no reply. I am very, very interested in fashion captioning and hope to get your help.