HHenryD opened this issue 2 years ago

Hi, I followed your instructions in A.11 to tune the hyperparameters to train STEGO on a custom dataset, but I'm finding it difficult to land on hyperparameters that give balanced positive and negative signals. Could you share some more tips for tuning them? A screenshot of the distributions of inter_cd, intra_cd, and neg_cd is shown below.

Thanks in advance!
Hey @HHenryD thanks for reaching out! The intra_cd looks a little on the high side so that might be one thing to push apart a little more. In some sense the intra-, inter-, and negative CDs measure the average cosine similarity between an image's points and themselves, an image's points and the points of its KNNs, and an image's points and the points of a negative image, respectively. You'll want these to roughly look like what you would expect if you were comparing actual labels instead of features: most pairs in the negative term should be "different labels" and some pairs in the positive terms should be the "same labels". That being said, it can vary by dataset; for example, every image in Cityscapes has street pixels, so the negative term still has some features coupling together. Also, definitely take a look at the cluster visualizations and the histograms on the clusters to make sure none are dying during training. These are in the summaries in TensorBoard.
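To make that concrete, here's a rough sketch of what those three quantities measure (illustrative only, not the repo's exact code; the shapes and variable names are made up):

```python
import torch
import torch.nn.functional as F

def mean_cosine_similarity(f1, f2):
    """Average cosine similarity between every pair of spatial locations
    in two (B, C, H, W) feature maps."""
    f1 = F.normalize(f1.flatten(2), dim=1)      # (B, C, H*W), unit-norm channels
    f2 = F.normalize(f2.flatten(2), dim=1)
    sim = torch.einsum("bci,bcj->bij", f1, f2)  # all-pairs cosine similarity
    return sim.mean().item()

feats     = torch.randn(2, 64, 14, 14)  # backbone features for a small batch
knn_feats = torch.randn(2, 64, 14, 14)  # features of each image's KNN match
neg_feats = torch.randn(2, 64, 14, 14)  # features of random negative images

intra_cd = mean_cosine_similarity(feats, feats)      # image vs. itself
inter_cd = mean_cosine_similarity(feats, knn_feats)  # image vs. its KNN
neg_cd   = mean_cosine_similarity(feats, neg_feats)  # image vs. a negative
print(intra_cd, inter_cd, neg_cd)
```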
Thanks for your reply! This helps my understanding a lot. Just to confirm, is the distribution you showed in Figure 14 in A.11 the distribution of inter_cd? In other words, do we want to see negative signals in the neg_cd distribution, positive signals in the intra_cd distribution, and balanced signals in the inter_cd distribution, like Figure 14 in A.11? Also, when we want to adjust the parameters (the pos and neg inter_weight, intra_weight, and shifts), what is the best way to tune them? In the paper, you mentioned that given a reasonable balance of the lambdas, tuning b can achieve the desired balance. How can we assess whether the lambdas are reasonably balanced?
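(For reference, my understanding from the paper is that each correspondence term has roughly the form

$$\mathcal{L}_{\text{corr}}(x, y, b) = -\sum_{hwij} \left(F_{hwij} - b\right)\,\max(S_{hwij}, 0),$$

where $F$ is the cosine similarity between backbone features and $S$ the cosine similarity between segmentation codes, so $b$ sets the feature-similarity level above which pairs are pulled together and below which they are pushed apart. Please correct me if I've misread it.)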
For the histograms on the clusters, what do you mean by dying? The images attached in my last question are found under "Histograms" in TensorBoard. "Distributions" in TensorBoard gives the following plots:
Thanks again!
What kind of dataset are you working with? A little human intuition can sometimes help when trying to golf these things lol.
In general I think it's fine to keep the lambdas as balanced as in the paper and focus on the b's. They're the most direct knob you have on the balance of positive and negative signal.
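Concretely, those are the weight and shift entries in train_config.yml. The values below are placeholders just to show the shape of the config, not recommendations (double-check the key names against your copy of the repo):

```yaml
pos_intra_weight: 1.0   # lambda for the self (same-image) term
pos_inter_weight: 1.0   # lambda for the KNN term
neg_inter_weight: 1.0   # lambda for the random-negative term
pos_intra_shift: 0.18   # b_self: raising it makes the self term more selective
pos_inter_shift: 0.12   # b_knn: raising it makes the KNN term more selective
neg_inter_shift: 0.46   # b_rand: raising it pushes negatives apart harder
```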
Also, could you share plots in the visual style you first posted? The second set of visuals isn't great for bimodal distributions, which are a definite possibility with these score distributions.
I'm having trouble reproducing the Potsdam results (https://github.com/mhamilton723/STEGO/issues/39), so I thought I'd do a successful cocostuff27 run to see what the logs and graphs of a successful training run look like. I thought this would be useful for others as well, to compare against their own training plots.
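If you'd rather compare numbers than eyeball screenshots, you can read the scalars back out of the TensorBoard event files. The tag names below are guesses; print the available tags to find the ones your run actually logged:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("logs/my_run")   # path to your run directory (assumed)
acc.Reload()
print(acc.Tags()["scalars"])            # discover the real scalar tag names
for tag in ["loss/pos_intra", "loss/pos_inter", "loss/neg_inter"]:  # assumed tags
    if tag in acc.Tags()["scalars"]:
        print(tag, [(e.step, round(e.value, 4)) for e in acc.Scalars(tag)[-3:]])
```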
Here are the histograms of the successful cocostuff27 run. You can see they follow the general patterns that Mark identifies above and in Appendix A.11 of the paper:
Here's what the CD plots look like; neg_inter mostly takes a random walk while the others curve upwards:
Here's what the losses look like; you can see that all loss terms are falling:
Finally, here's what successful 'test' plots look like, with both the cluster and linear probe numbers rising:
For reference, here are the images from the Cocostuff27 training run above from TensorBoard. It's interesting to note that the class colors of the cluster probe can sometimes be quite different from the linear probe and ground-truth labels; I've noticed that myself in training runs on my own data:
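That color mismatch is expected: the cluster probe's cluster IDs are arbitrary until they're matched to ground-truth classes, which STEGO does with Hungarian matching before scoring. A rough sketch of that step (illustrative, not the repo's actual code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_probe_accuracy(preds, labels, n_classes):
    """Match arbitrary cluster IDs to ground-truth classes, then score."""
    conf = np.zeros((n_classes, n_classes), dtype=np.int64)
    for p, t in zip(preds.ravel(), labels.ravel()):
        conf[p, t] += 1                 # confusion between clusters and labels
    rows, cols = linear_sum_assignment(conf, maximize=True)  # best 1-to-1 match
    return conf[rows, cols].sum() / conf.sum()

# Toy example: clusters 0/1 actually correspond to labels 1/0.
preds  = np.array([0, 0, 1, 1, 1, 0])
labels = np.array([1, 1, 0, 0, 0, 1])
print(cluster_probe_accuracy(preds, labels, n_classes=2))  # -> 1.0
```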
@BradNeuberg Heyy, thanks for sharing the output of your runs, but I'm still having issues with the cocostuff27 data. Would you mind dropping in your train_config.yml? Also, are you running with 1 GPU or multi?
Just one GPU, I don’t have the config anymore unfortunately