guanghuixu / AnchorCaptioner

Questions about feature extraction #10

Closed Caroline0728 closed 2 years ago

Caroline0728 commented 2 years ago

Dear author, I am still studying the code you provided, and I would appreciate your help with the following questions. Thank you very much!

  1. Is the code for the feature extraction method used in the paper available? The download link you provided contains features that have already been extracted and saved as .npy files, and each image seems to correspond to two files, xxx.npy and xxx_info.npy. What is the xxx_info.npy file?
  2. Do all the modules used in this paper come from the Pythia package?

guanghuixu commented 2 years ago

1) The download link I provided is for the imdb_files, which are different from the files you mention (xxx.npy and xxx_info.npy). Note that xxx.npy and xxx_info.npy hold the image/OCR feature embeddings and their annotations; you can download the original files to check their content. The download links can be found in the M4C repo. Finally, if you want to extract features with another method, you can refer to the original extraction process provided by Pythia/M4C. You can also prepare the files yourself by checking the key-values of xxx.npy and xxx_info.npy.

2) Sorry, I don't quite understand the second question. One statement in it is inaccurate: Pythia can be understood as the early version of the MMF framework. In that sense, our model is indeed based on Pythia or MMF.
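Checking the key-values could look roughly like the sketch below. It assumes that xxx.npy is a plain NumPy array and that xxx_info.npy is a Python dict saved with `np.save` (which is why `allow_pickle=True` is needed; only use it on files you trust). `describe_npy` is a hypothetical helper written for this illustration, not part of the AnchorCaptioner or M4C code.

```python
import numpy as np


def describe_npy(path):
    """Summarize a .npy file: keys and value shapes for a dict, dtype and shape for an array."""
    data = np.load(path, allow_pickle=True)
    # A dict saved with np.save comes back as a 0-d object array; .item() unwraps it.
    if data.dtype == object and data.shape == ():
        data = data.item()
    if isinstance(data, dict):
        return {
            key: (type(value).__name__, getattr(value, "shape", None))
            for key, value in data.items()
        }
    return {"<array>": (str(data.dtype), data.shape)}
```

Calling `describe_npy("xxx.npy")` and `describe_npy("xxx_info.npy")` on one downloaded pair should show which keys and shapes a replacement extractor has to reproduce.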

Caroline0728 commented 2 years ago

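It seems I can just write in Chinese. Thank you for your reply. Yes, I plan to use a different method to extract the features. Does the ImDB folder contain the image information for the TextCaps dataset? Also, what does this sentence mean: "Also, you can prepare the files by checking the key-values of xxx.npy and xxx_info.npy yourself."?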

guanghuixu commented 2 years ago

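1) The ImDB folder holds the information needed for training (for example, the caption that belongs to each image). I am not sure whether that matches what you mean by "image information".

2) Taking the OCR features as an example, xxx.npy stores the OCR feature embeddings, and xxx_info.npy stores the bbox, the recognized characters, and so on for each OCR token in the image.

Option 1: Follow the feature extraction link above and swap in your own extraction model to generate the required xxx.npy and xxx_info.npy. This is how M4C does it, and it is stated explicitly in its README.

Option 2: I would rather recommend downloading the original dataset, looking at the key-values in the files, and then generating the required npy files directly. This is the more convenient and simpler approach.

The original xxx.npy and xxx_info.npy files were provided directly by M4C and the TextVQA/TextCaps challenge. I have never tried switching datasets or extraction methods, so I am not sure Option 1 is feasible, which is why I recommend Option 2.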
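For Option 2, writing the file pair yourself might look like the sketch below. It assumes the info file is a dict saved with `np.save`. The key names `ocr_tokens` and `ocr_boxes` in the usage note are placeholders only; the real key names and array shapes should be copied from the downloaded files. `save_feature_pair` is a hypothetical helper.

```python
import os

import numpy as np


def save_feature_pair(out_dir, image_id, features, info):
    """Write <image_id>.npy (feature array) and <image_id>_info.npy (annotation dict)."""
    os.makedirs(out_dir, exist_ok=True)
    feat_path = os.path.join(out_dir, f"{image_id}.npy")
    info_path = os.path.join(out_dir, f"{image_id}_info.npy")
    # Features are stored as a plain float32 array.
    np.save(feat_path, np.asarray(features, dtype=np.float32))
    # np.save pickles a dict into a 0-d object array; read it back with
    # np.load(info_path, allow_pickle=True).item().
    np.save(info_path, info)
    return feat_path, info_path
```

For example, `save_feature_pair("my_ocr_feats", "abc123", feats, {"ocr_tokens": tokens, "ocr_boxes": boxes})` would write both files for one image, where `feats`, `tokens`, and `boxes` come from your own extractor.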

Caroline0728 commented 2 years ago

Thank you very much for your reply; it helped me a lot.

Caroline0728 commented 2 years ago

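Hello, sorry to bother you again. How can we visualize the JSON file produced by evaluation or testing together with the original images? The current JSON contains the image id and the caption id, but our data is in .npy format after feature extraction. How can we view each original image next to its generated caption? Thanks.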

guanghuixu commented 2 years ago

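Just find the original image by its image id and write a visualization function yourself? So the real question is where the original images are, right? Here is the original dataset: https://textvqa.org/textcaps/dataset/ Both TextCaps and TextVQA use images from the Open Images dataset. If an image id cannot be found through the link above (which is very unlikely), you can download all of Open Images and look there.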
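Such a visualization function could be as simple as the sketch below, which writes an HTML page pairing each image with its caption. It assumes the prediction JSON is a list of objects with `image_id` and `caption` fields and that the images are stored as `<image_id>.jpg` in one folder; the real field names and file layout may differ, so all three are parameters. `build_caption_gallery` is a hypothetical helper that uses only the Python standard library.

```python
import html
import json
import os
from pathlib import Path


def build_caption_gallery(pred_json, image_dir, out_html,
                          image_key="image_id", caption_key="caption", ext=".jpg"):
    """Write an HTML page of image/caption pairs; return the ids whose image was not found."""
    with open(pred_json, encoding="utf-8") as f:
        predictions = json.load(f)

    figures, missing = [], []
    for pred in predictions:
        image_id = str(pred[image_key])
        image_path = Path(image_dir) / (image_id + ext)
        if not image_path.exists():
            missing.append(image_id)
            continue
        # A file:// URI lets the page display the image from any location on disk.
        src = html.escape(image_path.resolve().as_uri(), quote=True)
        caption = html.escape(str(pred[caption_key]))
        figures.append(
            f'<figure><img src="{src}" width="400">'
            f"<figcaption>{html.escape(image_id)}: {caption}</figcaption></figure>"
        )

    with open(out_html, "w", encoding="utf-8") as f:
        f.write("<!DOCTYPE html>\n<html><body>\n" + "\n".join(figures) + "\n</body></html>\n")
    return missing
```

Opening the resulting HTML file in a browser shows each image above its caption. The returned list gives the ids that still need to be looked up in Open Images.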