1) For the first question, the download link I provided corresponds to the imdb_files, which are different from the files you mention above (xxx.npy and xxx_info.npy).
Note that xxx.npy and xxx_info.npy correspond to the image/OCR feature embeddings and their annotations; you can download the original files to check their contents. The download links can be found in the M4C repo.
Finally, if you want to use another method to extract features, you can refer to the original processing pipeline provided by Pythia/M4C. Alternatively, you can prepare the files yourself by checking the key-value structure of xxx.npy and xxx_info.npy.
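In case it is useful, here is a minimal sketch of how one might inspect those two files with NumPy. It assumes the usual M4C convention that xxx.npy holds a plain feature array while xxx_info.npy holds a pickled dict; the exact key names depend on the files you download, so print them rather than rely on this sketch.

```python
import numpy as np

# xxx.npy: feature embeddings, typically a (num_boxes, feature_dim) float array.
feat = np.load("xxx.npy")
print("feature shape:", feat.shape, "dtype:", feat.dtype)

# xxx_info.npy: a pickled dict with the annotations (bboxes, OCR tokens, image size, ...).
# allow_pickle=True and .item() are needed because a dict, not a bare array, was saved.
info = np.load("xxx_info.npy", allow_pickle=True).item()
print("info keys:", list(info.keys()))
for key, value in info.items():
    shape = getattr(value, "shape", None)
    print(f"  {key}: {type(value).__name__}", shape if shape is not None else value)
```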
2) Sorry, I don't quite understand what the second question means. One statement was imprecise: Pythia can be understood as the early version of the MMF framework. In this sense, our model is indeed based on Pythia, i.e., MMF.
It seems I can write in Chinese directly; thank you for your reply. Yes, I plan to use a different method to extract features. Does the ImDB folder contain the image information of the TextCaps dataset? Also, what exactly does the sentence "Also, you can prepare the files by checking the key-values of xxx.npy and xxx_info.npy yourself" mean?
1) The ImDB folder contains the information needed for training (for example, the caption corresponding to each image); I'm not sure whether that is the same as the "image information" you mentioned.
2) Taking the OCR features as an example: xxx.npy stores the OCR feature embeddings, while xxx_info.npy stores, for each OCR token in the image, its bounding box, the corresponding characters, and so on.
Option 1: follow the feature-extraction link above and swap in your own extraction model to generate the required xxx.npy and xxx_info.npy. This is M4C's procedure, and it is clearly documented in its README.
Option 2: I recommend instead that you download the original dataset, check the key-value structure inside the files, and then generate the required .npy files directly; this is the more convenient and simpler approach (see the sketch after this reply).
The original xxx.npy and xxx_info.npy were provided directly by M4C and the TextVQA/TextCaps challenge. Since I have never switched datasets or changed the extraction method, I am not sure Option 1 is feasible, which is why I recommend Option 2.
Thank you very much for your reply; it has been a great help to me.
Hello, sorry to bother you again. After evaluation or testing, the experiment produces a JSON file; how can we visualize the results on the original images? The current JSON contains the image id and the caption id, but since the data on our side is in the extracted-feature .npy format, how can we view each image directly together with its generated caption? Thank you.
Find the original image by its image id and then write a small visualization function yourself. So the real question is probably where the original images are? Here is the original dataset: https://textvqa.org/textcaps/dataset/ Both TextCaps and TextVQA use the Open Images dataset, so if some image id cannot be found through the first link (which is very unlikely), you can download the full Open Images dataset and look for it there.
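In case a concrete starting point helps, here is a hedged sketch of such a visualization function. It assumes the generated JSON is a list of entries with image_id and caption fields and that the original TextCaps/Open Images files sit in a local folder as <image_id>.jpg; adjust the field names and paths to whatever your JSON and download actually contain.

```python
import json
import os

import matplotlib.pyplot as plt
from PIL import Image

def show_predictions(pred_json, image_dir, limit=5):
    """Show original images together with their generated captions.

    pred_json: path to the JSON produced by evaluation/testing
    image_dir: folder holding the original images, named <image_id>.jpg
    """
    with open(pred_json) as f:
        preds = json.load(f)  # assumed format: [{"image_id": ..., "caption": ...}, ...]

    for entry in preds[:limit]:
        img_path = os.path.join(image_dir, f"{entry['image_id']}.jpg")
        if not os.path.exists(img_path):
            print("missing image:", img_path)
            continue
        plt.figure()
        plt.imshow(Image.open(img_path))
        plt.title(str(entry["caption"]), wrap=True)
        plt.axis("off")
    plt.show()

# Example call (paths are placeholders for wherever you keep the data):
# show_predictions("val_predictions.json", "data/textcaps/train_images")
```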
Dear author, I have recently been studying the code you provided. I would like to ask you to answer the following questions. Thank you very much!