salesforce / PCL

PyTorch code for "Prototypical Contrastive Learning of Unsupervised Representations"
MIT License

About Loss of InfoNCE and Cluster_results #22

Open lsyysl9711 opened 1 year ago

lsyysl9711 commented 1 year ago

Hi,

  1. I notice that the labels created for the InfoNCE loss are always a zero vector (https://github.com/salesforce/PCL/blob/964da1fb7c0546e8ce55627fa3c0debde4b7e456/pcl/builder.py#L163). I think this is wrong, since then the loss will always be zero. Did I misunderstand the code?

  2. When creating the cluster_result dictionary, I found that only the eval dataset is taken into consideration (https://github.com/salesforce/PCL/blob/964da1fb7c0546e8ce55627fa3c0debde4b7e456/main_pcl.py#L299). What is the motivation behind this? I think it should be run on the training set.

lerogo commented 10 months ago

Same question.

Volibear1234 commented 6 months ago

For the first question, you can refer to the MoCo v1 code: the logits are arranged so that the positive pair sits in column 0, and cross-entropy is applied directly to those logits to implement InfoNCE. An all-zero label vector therefore just says "class 0 is the positive", so the loss is not zero in general (see the first sketch at the end of this comment). As for the second question, the eval dataset (the same training images, loaded with eval-time transforms) is only used to compute the cluster prototypes, including the negative prototypes, and in this line of code:

output, target, output_proto, target_proto = model(im_q=images[0], im_k=images[1],
                                                   cluster_result=cluster_result, index=index)

the index that is passed in comes from the train_loader, so the loss is still computed with respect to the training samples (see the second sketch below).
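
For reference, here is a minimal sketch of the MoCo-style InfoNCE computation, illustrative only and not the exact PCL code (the function name, argument names, and temperature value are placeholders). The positive logit is placed in column 0, so an all-zero label vector marks column 0 as the correct class, and cross-entropy over the logits is the InfoNCE loss:

import torch
import torch.nn.functional as F

def info_nce(q, k, queue, temperature=0.07):
    # q, k: L2-normalized query/key embeddings of shape [N, C]
    # queue: memory bank of negative keys, shape [C, K]
    l_pos = torch.einsum('nc,nc->n', [q, k]).unsqueeze(-1)   # positive logit  -> [N, 1]
    l_neg = torch.einsum('nc,ck->nk', [q, queue])            # negative logits -> [N, K]
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature  # positive sits in column 0
    # label 0 for every sample means "column 0 (the positive) is the correct class"
    labels = torch.zeros(logits.shape[0], dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)                   # not zero in general

For example, with 4 normalized 128-d queries/keys and a queue of 1024 negatives, this returns a positive scalar, which confirms that the zero label vector does not make the loss zero.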
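
And a rough sketch of how the prototype part uses the index (again not the exact PCL code; proto_logits is a made-up name, and the real implementation additionally samples a subset of negative prototypes). The tensors inside cluster_result are indexed by dataset position, so the index coming from the train_loader looks up each training sample's assigned cluster:

import torch

def proto_logits(q, index, im2cluster, centroids, density):
    # q: [N, C] query features from the current training batch
    # index: [N] dataset indices provided by the train_loader
    # im2cluster: [num_images] cluster id per image, from the clustering pass
    # centroids: [num_clusters, C], density: [num_clusters] per-cluster temperature
    pos_proto_id = im2cluster[index]                 # positive prototype for each sample
    logits = torch.mm(q, centroids.t()) / density    # similarity to every prototype
    return logits, pos_proto_id                      # cross-entropy target = pos_proto_id

Because the clustering pass runs over the same images in the same order (only with eval-time transforms), indexing im2cluster with the train_loader indices gives a consistent cluster assignment for every training sample.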