You mentioned "we use an instance discrimination task as a pretext task" in your paper. I'm really confused about which task should be used. As I understand it, you feed text images to the encoder and train MoCo with only the ResNet, then freeze the ResNet and train the BiLSTM and attention. Besides that, I'm confused by the idea that you train the full TRBA with MoCo, and I don't know the next step of the two-stage MoCo training method. Feel free to correct me if I misunderstood.
1. Pretrain the ResNet part of TRBA with MoCo (use pretrain.py). You can check the instance discrimination task in Algorithm 1 of the MoCo paper, or just check our code in self_supervised.py.
2. Train TRBA, initialized with the model pretrained in step 1 (only the ResNet part; use train.py).
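
For intuition about the instance discrimination task in step 1, here is a minimal sketch of the contrastive (InfoNCE) loss from Algorithm 1 of the MoCo paper, written in plain Python for a single query. It is not the code in self_supervised.py; the function name `info_nce_loss` and the toy vectors are made up for illustration, and the temperature of 0.07 is the MoCo paper's default rather than a value taken from this repository.

```python
import math


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query (illustrative sketch, not the repo's code).

    query:         feature of one augmented view of an image
    positive_key:  feature of another augmented view of the SAME image
    negative_keys: features of OTHER images (the queue in MoCo)
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    q = normalize(query)
    # The positive logit goes first, so the "correct class" is index 0.
    logits = [dot(q, normalize(positive_key))]
    logits += [dot(q, normalize(k)) for k in negative_keys]
    logits = [value / temperature for value in logits]

    # Cross-entropy with target index 0, using a stable log-sum-exp.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    return log_sum_exp - logits[0]


# Toy 2-D features: the query matches its positive and not the negatives.
good = info_nce_loss([1.0, 0.0], [1.0, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Same query, but now the "positive" is dissimilar and a negative is similar.
bad = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.1], [-1.0, 0.0]])
```

The loss is small when the two views of the same image are close and the other images are far away, which is all that "instance discrimination" asks the encoder to learn. No text labels are involved at this stage.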