singhgautam / sysbinder

Official Code for Neural Systematic Binder
MIT License
29 stars · 2 forks

Difficulty Reproducing Results on CLEVR-Easy Dataset #3

Open sanky29 opened 2 months ago

sanky29 commented 2 months ago

Hello,

After training the model for 200K steps, as mentioned in the paper, across two different runs, we were unable to reproduce the numbers reported in the paper for the CLEVR-Easy dataset. We are attaching the numbers obtained from both runs for your reference.

- DCI Disentanglement: mean 0.503, std 0.0017
- Completeness: mean 0.4482, std 0.0098
- Informativeness: mean 0.9687, std 0.0015

singhgautam commented 2 months ago

Hi Sanket,

Thank you for the query. The performance is sensitive to the training seed, and it is advisable to train the model with 4 different seed values (the more the better). In your reported results, the standard deviation seems a bit too low, and my hunch is that perhaps not enough seeds were run for training.

Hope it helps.

sanky29 commented 2 months ago

Hi Gautam,

Thank you for your timely response and the helpful suggestion. We will proceed with training the models using different random seeds, as you recommended.

sanky29 commented 1 month ago

Hi Gautam,

After training the models using different random seeds, we obtained the following results:

- DCI Disentanglement: mean 0.7673, std 0.0634
- Completeness: mean 0.5569, std 0.1137
- Informativeness: mean 0.97134, std 0.0045

While investigating further, we noticed two potential issues:

  1. The default num_slots in the evaluation script was set to 3, whereas it should have been set to 4 for all datasets.
  2. We found that the attributes of background slots were being used for training the probes, which seems incorrect since the background has no attributes.
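To make the second point concrete, here is a minimal sketch of the filtering we applied before probe training. The function name, array shapes, and the `-1` background-label convention are assumptions for illustration, not the repo's actual API:

```python
import numpy as np

def filter_background(z, y, bg_label=-1):
    """Drop slot/label pairs whose attribute labels are all bg_label.

    z: slot representations, shape (N, num_slots, d)
    y: ground-truth attribute labels, shape (N, num_slots, num_attrs);
       rows filled with bg_label mark the background slot.
    """
    z_flat = z.reshape(-1, z.shape[-1])
    y_flat = y.reshape(-1, y.shape[-1])
    keep = ~(y_flat == bg_label).all(axis=1)  # keep foreground slots only
    return z_flat[keep], y_flat[keep]

# Example: 2 images, num_slots=4 (as for CLEVR-Easy), 8-dim slots,
# 3 attributes; slot 0 of each image is the background.
rng = np.random.default_rng(0)
z = rng.normal(size=(2, 4, 8))
y = rng.integers(0, 5, size=(2, 4, 3))
y[:, 0, :] = -1  # mark background slots
z_fg, y_fg = filter_background(z, y)
print(z_fg.shape, y_fg.shape)  # (6, 8) (6, 3)
```

The probes are then fit on `(z_fg, y_fg)` only, so background slots never contribute to the DCI scores.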

After correcting these two issues, we re-evaluated the models and obtained the following updated results:

- DCI Disentanglement: mean 0.8204, std 0.0291
- Completeness: mean 0.57096, std 0.1079
- Informativeness: mean 0.9646, std 0.0038

We would appreciate any further feedback or thoughts you might have on these observations.