hhyhhyhy opened this issue 5 years ago
1: There is no practical way to share the features, since they take up about 2 TB. Do you have any suggestions? You can download the image dataset and extract the features yourself.
2: Why do you think the two models are not comparable? What would a comparable setup look like? Thanks.
Hi, could you please release the features for a small sample of Conceptual Captions images? That would be very helpful for checking the correctness of the features we compute ourselves. Thank you!
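If a reference sample were released, checking locally extracted features against it could look roughly like the sketch below. It is a hypothetical helper, not part of the repository: it assumes both sets of features are loaded as NumPy arrays keyed by image id, with shape `(num_regions, feature_dim)` and regions in the same order, and the tolerances `rtol`/`atol` are arbitrary choices.

```python
import numpy as np


def compare_features(reference, computed, rtol=1e-3, atol=1e-4):
    """Compare per-image feature arrays from two extraction runs.

    Hypothetical helper: assumes each dict maps an image id to a NumPy array
    of shape (num_regions, feature_dim) with regions in the same order.
    Returns image_id -> (shapes_match, max_abs_diff, allclose).
    """
    report = {}
    for image_id, ref in reference.items():
        if image_id not in computed:
            # Image missing from our own extraction run.
            report[image_id] = (False, float("inf"), False)
            continue
        cur = computed[image_id]
        if ref.shape != cur.shape:
            # Different region counts or feature sizes cannot be compared elementwise.
            report[image_id] = (False, float("inf"), False)
            continue
        max_diff = float(np.max(np.abs(ref - cur))) if ref.size else 0.0
        close = bool(np.allclose(ref, cur, rtol=rtol, atol=atol))
        report[image_id] = (True, max_diff, close)
    return report


if __name__ == "__main__":
    # Illustrative shapes only (36 regions x 2048 dims is an assumption).
    rng = np.random.default_rng(0)
    ref = {"img_0": rng.standard_normal((36, 2048)).astype(np.float32)}
    mine = {"img_0": ref["img_0"] + np.float32(1e-6)}
    print(compare_features(ref, mine))
```

Region-based features can also differ in box order or count between runs, in which case an elementwise check like this would flag a mismatch even if the extractor is correct.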
Could you release the Conceptual Captions features? I understand they may be too large to upload, but I would really like to retrain the model with your code.
By the way, I have a question about your study on the number of streams. In your two-stream version, the text stream uses 12 BERT layers and the image stream uses 6 image BERT layers, and the two streams then pass through a connection module with 6 layers. In the single-stream version, the two modalities share 12 BERT layers for encoding. I don't think these two models are comparable.
Thanks a lot!
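To make the capacity gap in the question concrete, here is a rough layer and parameter count. It is only a sketch under loudly stated assumptions: every layer (text, image, and connection) is treated as a BERT-base-sized encoder layer (hidden size 768, feed-forward size 3072), the 6-layer connection module is counted as 6 such layers, and embeddings and output heads are ignored. The real image-stream and connection layers may be sized differently.

```python
def transformer_layer_params(hidden=768, ffn=3072):
    """Weights + biases of one standard transformer encoder layer."""
    attention = 4 * hidden * hidden + 4 * hidden    # Q, K, V and output projections
    feed_forward = 2 * hidden * ffn + ffn + hidden  # two dense layers with biases
    layer_norms = 2 * 2 * hidden                    # two LayerNorms (gain + bias)
    return attention + feed_forward + layer_norms


def encoder_summary(layer_counts, hidden=768, ffn=3072):
    """Total layers and approximate encoder parameters.

    Assumption: every layer group uses identically sized layers.
    """
    total_layers = sum(layer_counts.values())
    return total_layers, total_layers * transformer_layer_params(hidden, ffn)


if __name__ == "__main__":
    two_stream = {"text": 12, "image": 6, "connection": 6}
    single_stream = {"shared": 12}
    for name, groups in [("two-stream", two_stream), ("single-stream", single_stream)]:
        layers, params = encoder_summary(groups)
        print(f"{name}: {layers} layers, ~{params / 1e6:.1f}M encoder parameters")
    # two-stream: 24 layers, ~170.1M encoder parameters
    # single-stream: 12 layers, ~85.1M encoder parameters
```

Under these assumptions the two-stream variant has twice the layers and encoder parameters of the single-stream one, which is the kind of mismatch the question points at; a capacity-matched comparison would equalize depth or parameter count between the variants.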